Re: [ceph-users] Uniquely identifying a Ceph client
On Tue, Nov 1, 2016 at 11:45 AM, Sage Weil <s...@newdream.net> wrote:
> On Tue, 1 Nov 2016, Travis Rhoden wrote:
>> Hello,
>> Is there a consistent, reliable way to identify a Ceph client? I'm looking
>> for a string/ID (a UUID, for example) that can be traced back to a client
>> doing RBD maps.
>>
>> There are a couple of possibilities out there, but they aren't quite what
>> I'm looking for. When checking "rbd status", for example, the output is the
>> following:
>>
>> # rbd status travis2
>> Watchers:
>> watcher=172.21.12.10:0/1492902152 client.4100 cookie=1
>> # rbd status travis3
>> Watchers:
>> watcher=172.21.12.10:0/1492902152 client.4100 cookie=2
>>
>> The IP:port/nonce string is an option, and so is the "client.<id>" string,
>> but neither of these is actually that helpful because they don't show the
>> same strings when an advisory lock is added to the RBD images. For example:
>
> Both are sufficient. The <id> in client.<id> is the most concise and is
> unique per client instance.
>
> I think the problem you're seeing is actually that qemu is using two
> different librbd/librados instances, one for each mapped device?

Not using qemu in this scenario. Just rbd map && rbd lock. It's more that I can't match the output from "rbd lock" against the output from "rbd status", because they are using different librados instances. I'm just trying to capture who has an image mapped and locked, and to those not in the know, it would be a surprise that client.<X> and client.<Y> are actually the same host. :) I understand why it is; I was checking to see if there was another field or indicator that I should use instead. I think I'm just going to have to use the IP address, because that's the value that will have real meaning to people. Thanks!

>> # rbd lock list travis2
>> There is 1 exclusive lock on this image.
>> Locker      ID    Address
>> client.4201 test  172.21.12.100:0/967432549
>> # rbd lock list travis3
>> There is 1 exclusive lock on this image.
>> Locker      ID    Address
>> client.4240 test  172.21.12.10:0/2888955091
>>
>> Note that neither the nonce nor the client ID match -- so by looking at the
>> rbd lock output, you can't match that information against the output from
>> "rbd status". I believe this is because the nonce and the client identifier
>> reflect the CephX session between client and cluster, and while this is
>> persistent across "rbd map" calls (because the rbd kmod has a shared session
>> by default, though that can be changed as well), each call to "rbd lock"
>> initiates a new session. Hence a new nonce and client ID.
>>
>> That pretty much leaves the IP address. That would seem to be problematic
>> as an identifier if the client happened to be behind NAT.
>>
>> I am trying to definitively determine which client has an RBD mapped
>> and locked, but I'm not seeing a way to guarantee that you've uniquely
>> identified a client. Am I missing something obvious?
>>
>> Perhaps my concern about NAT is overblown -- I've never mounted an RBD from
>> a client that is behind NAT, and I'm not sure how common that would be
>> (though I think it would work).
>
> It should work, but it's untested. :)
>
> sage

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
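The reply above settles on the IP address as the only field common to the "rbd status" and "rbd lock list" output. A minimal sketch of extracting that IP from an entity address of the form IP:port/nonce (a hypothetical helper, not part of any Ceph tooling; assumes IPv4 addresses as shown in the thread):

```python
def client_ip(addr):
    """Extract the IP from a Ceph entity address like "172.21.12.10:0/1492902152".

    Hypothetical helper: the nonce (after '/') and port differ per librados
    session, so only the IP is stable across "rbd status" and "rbd lock list".
    """
    host_port = addr.split('/')[0]      # drop the per-session nonce
    return host_port.rsplit(':', 1)[0]  # drop the port

# The watcher and the locker resolve to the same host:
watcher = client_ip("172.21.12.10:0/1492902152")
locker = client_ip("172.21.12.10:0/2888955091")
```

With NAT in play even the IP is ambiguous, which is the residual caveat the thread ends on.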
[ceph-users] Uniquely identifying a Ceph client
Hello,

Is there a consistent, reliable way to identify a Ceph client? I'm looking for a string/ID (a UUID, for example) that can be traced back to a client doing RBD maps.

There are a couple of possibilities out there, but they aren't quite what I'm looking for. When checking "rbd status", for example, the output is the following:

# rbd status travis2
Watchers:
watcher=172.21.12.10:0/1492902152 client.4100 cookie=1
# rbd status travis3
Watchers:
watcher=172.21.12.10:0/1492902152 client.4100 cookie=2

The IP:port/nonce string is an option, and so is the "client.<id>" string, but neither of these is actually that helpful because they don't show the same strings when an advisory lock is added to the RBD images. For example:

# rbd lock list travis2
There is 1 exclusive lock on this image.
Locker      ID    Address
client.4201 test  172.21.12.100:0/967432549
# rbd lock list travis3
There is 1 exclusive lock on this image.
Locker      ID    Address
client.4240 test  172.21.12.10:0/2888955091

Note that neither the nonce nor the client ID match -- so by looking at the rbd lock output, you can't match that information against the output from "rbd status". I believe this is because the nonce and the client identifier reflect the CephX session between client and cluster, and while this is persistent across "rbd map" calls (because the rbd kmod has a shared session by default, though that can be changed as well), each call to "rbd lock" initiates a new session. Hence a new nonce and client ID.

That pretty much leaves the IP address. That would seem to be problematic as an identifier if the client happened to be behind NAT.

I am trying to definitively determine which client has an RBD mapped and locked, but I'm not seeing a way to guarantee that you've uniquely identified a client. Am I missing something obvious?

Perhaps my concern about NAT is overblown -- I've never mounted an RBD from a client that is behind NAT, and I'm not sure how common that would be (though I think it would work).
- Travis
Re: [ceph-users] ceph-deploy: too many argument: --setgroup 10
Hi Noah,

What is the ownership on /var/lib/ceph? ceph-deploy should only be trying to use --setgroup if /var/lib/ceph is owned by non-root. On a fresh install of Hammer, this should be root:root. The --setgroup flag was added to ceph-deploy in 1.5.26.

- Travis

On Wed, Sep 2, 2015 at 1:59 PM, Noah Watkins wrote:
> I'm getting the following error using ceph-deploy to setup a cluster.
> It's CentOS 6.6 and I'm using Hammer and the latest ceph-deploy. It
> looks like setgroup wasn't an option in Hammer, but ceph-deploy adds
> it. Is there a trick or older version of ceph-deploy I should try?
>
> - Noah
>
> [cn67][INFO ] Running command: sudo ceph-mon --cluster ceph --mkfs -i
> cn67 --keyring /var/lib/ceph/tmp/ceph-cn67.mon.keyring --setgroup 10
> [cn67][WARNIN] too many arguments: [--setgroup,10]
> [cn67][DEBUG ] --conf/-c FILE  read configuration from the given
> configuration file
> [cn67][WARNIN] usage: ceph-mon -i monid [flags]
> [cn67][DEBUG ] --id/-i ID  set ID portion of my name
> [cn67][WARNIN] --debug_mon n
> [cn67][DEBUG ] --name/-n TYPE.ID  set name
> [cn67][WARNIN] debug monitor level (e.g. 10)
> [cn67][DEBUG ] --cluster NAME  set cluster name (default: ceph)
> [cn67][WARNIN] --mkfs
> [cn67][DEBUG ] --version  show version and quit
> [cn67][WARNIN] build fresh monitor fs
> [cn67][DEBUG ]
> [cn67][WARNIN] --force-sync
> [cn67][DEBUG ] -d  run in foreground, log to stderr.
> [cn67][WARNIN] force a sync from another mon by wiping local
> data (BE CAREFUL)
> [cn67][DEBUG ] -f  run in foreground, log to usual
> location.
> [cn67][WARNIN] --yes-i-really-mean-it
> [cn67][DEBUG ] --debug_ms N  set message debug level (e.g.
1)
> [cn67][WARNIN] mandatory safeguard for --force-sync
> [cn67][WARNIN] --compact
> [cn67][WARNIN] compact the monitor store
> [cn67][WARNIN] --osdmap
> [cn67][WARNIN] only used when --mkfs is provided: load the
> osdmap from
> [cn67][WARNIN] --inject-monmap
> [cn67][WARNIN] write the monmap to the local
> monitor store and exit
> [cn67][WARNIN] --extract-monmap
> [cn67][WARNIN] extract the monmap from the local monitor store and
> exit
> [cn67][WARNIN] --mon-data
> [cn67][WARNIN] where the mon store and keyring are located
> [cn67][ERROR ] RuntimeError: command returned non-zero exit status: 1
> [ceph_deploy.mon][ERROR ] Failed to execute command: ceph-mon
> --cluster ceph --mkfs -i cn67 --keyring
> /var/lib/ceph/tmp/ceph-cn67.mon.keyring --setgroup 10
> [ceph_deploy][ERROR ] GenericError: Failed to create 1 monitors
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [ceph-users] Error while installing ceph
A couple of things here... Looks like you are on RHEL. If you are on RHEL, but *not* trying to install RHCS (Red Hat Ceph Storage), a few extra flags are required. You must use --release. For example:

ceph-deploy install --release hammer

in order to get the Hammer upstream release. The docs need to make this more clear (I don't think it's mentioned anywhere -- upstream Ceph on RHEL is not a very common case, but it is supposed to work. :))

That will at least install the right packages. However, there is still one more issue you will hit: when installing upstream Ceph on RHEL, ceph-deploy knows that it needs EPEL (EPEL is not needed with RHCS), and it will try to install it by name with "yum install epel-release". But that doesn't work on RHEL. Until that is fixed, you will also have to install EPEL by hand on your nodes.

On Fri, Aug 28, 2015 at 5:02 PM, Brad Hubbard bhubb...@redhat.com wrote:

- Original Message -
From: pavana bhat pavanakrishnab...@gmail.com
To: Brad Hubbard bhubb...@redhat.com
Cc: ceph-users@lists.ceph.com
Sent: Saturday, 29 August, 2015 9:40:50 AM
Subject: Re: [ceph-users] Error while installing ceph

Yes I did follow all the preflight steps. After yum install (sudo yum update, sudo yum install ceph-deploy), it did show the following are installed:

rhel-7-ha-rpms rhel-7-optional-rpms rhel-7-server-rpms rhel-7-supplemental-rpms rhel-7-server-rpms/primary_db ceph-noarch

Installed: ceph-deploy.noarch 0:1.5.28-0

Perhaps the --repo and/or --release flags are required?

Thanks,
Pavana

On Fri, Aug 28, 2015 at 4:29 PM, Brad Hubbard bhubb...@redhat.com wrote:

Did you follow this first? http://docs.ceph.com/docs/v0.80.5/start/quick-start-preflight/

It doesn't seem to be able to locate the repos for the ceph rpms.

- Original Message -
From: pavana bhat pavanakrishnab...@gmail.com
To: ceph-users@lists.ceph.com
Sent: Saturday, 29 August, 2015 8:55:14 AM
Subject: [ceph-users] Error while installing ceph

Hi,

I'm getting an error while installing ceph.
Can you please help me? I'm exactly following the steps given in http://docs.ceph.com/docs/v0.80.5/start/quick-ceph-deploy/ to install ceph.

These are pretty old docs (see the version number in the URL). It's probably always best to start at http://docs.ceph.com/docs/master instead. How did you get to this old version? If it was from a link, we would want to check that that link still made sense.

But when I execute "ceph-deploy install {ceph-node} [{ceph-node} ...]", I'm getting the following error:

[ceph-vm-mon1][DEBUG ] Cleaning up everything
[ceph-vm-mon1][DEBUG ] Cleaning up list of fastest mirrors
[ceph-vm-mon1][INFO ] Running command: sudo yum -y install ceph-osd ceph-mds ceph-mon ceph-radosgw
[ceph-vm-mon1][DEBUG ] Loaded plugins: fastestmirror
[ceph-vm-mon1][DEBUG ] Determining fastest mirrors
[ceph-vm-mon1][DEBUG ] * rhel-7-ha-rpms: 203.36.4.124
[ceph-vm-mon1][DEBUG ] * rhel-7-optional-rpms: 203.36.4.124
[ceph-vm-mon1][DEBUG ] * rhel-7-server-rpms: 203.36.4.124
[ceph-vm-mon1][DEBUG ] * rhel-7-supplemental-rpms: 203.36.4.124
[ceph-vm-mon1][DEBUG ] No package ceph-osd available.
[ceph-vm-mon1][DEBUG ] No package ceph-mds available.
[ceph-vm-mon1][DEBUG ] No package ceph-mon available.
[ceph-vm-mon1][DEBUG ] No package ceph-radosgw available.
[ceph-vm-mon1][WARNIN] Error: Nothing to do
[ceph-vm-mon1][ERROR ] RuntimeError: command returned non-zero exit status: 1
[ceph_deploy][ERROR ] RuntimeError: Failed to execute command: yum -y install ceph-osd ceph-mds ceph-mon ceph-radosgw

I have finished the preflight steps and I'm able to connect to the internet from my nodes.

Thanks,
Pavana
[ceph-users] [ANN] ceph-deploy 1.5.28 released
Hi everyone,

A new version of ceph-deploy has been released. Version 1.5.28 includes the following:

- A fix for a regression introduced in 1.5.27 that prevented importing GPG keys on CentOS 6 only.
- Will prevent Ceph daemon deployment on nodes that don't have Ceph installed on them.
- Makes it possible to go from 1 monitor daemon to 2 without a 5 minute hang/delay.
- More systemd enablement work.

Full changelog is at [1]. Updated packages have been uploaded to {rpm,debian}-{firefly,hammer,testing} repos on ceph.com, and to PyPI.

Cheers,
- Travis

[1] http://ceph.com/ceph-deploy/docs/changelog.html#id2
Re: [ceph-users] [ANN] ceph-deploy 1.5.27 released
Hi Nigel,

On Wed, Aug 5, 2015 at 9:00 PM, Nigel Williams nigel.willi...@utas.edu.au wrote:
> On 6/08/2015 9:45 AM, Travis Rhoden wrote:
>> A new version of ceph-deploy has been released. Version 1.5.27 includes
>> the following:
>
> Has the syntax for use of --zap-disk changed? I moved it around but it is
> no longer recognised; worked around by doing a ceph-disk zap before
> running ceph-deploy.

A few things in this area changed with 1.5.26. ceph-deploy's options are much more strictly attached only to the commands where they make sense.

This worked previously:

ceph-deploy --overwrite-conf osd --zap-disk prepare ceph05:/dev/sdb:/dev/sdd

--zap-disk is an option to 'prepare', not to 'osd'. "ceph-deploy osd --zap-disk list" doesn't make any sense, for example. The help menus should make this clear:

# ceph-deploy osd --help
usage: ceph-deploy osd [-h] {list,create,prepare,activate} ...

# ceph-deploy osd prepare --help
usage: ceph-deploy osd prepare [-h] [--zap-disk] [--fs-type FS_TYPE]
                               [--dmcrypt] [--dmcrypt-key-dir KEYDIR]
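The subcommand-scoped option behaviour described above can be illustrated with a small argparse sketch (illustrative parser only, not ceph-deploy's actual code): an option registered on the 'prepare' subparser is accepted only after the subcommand name.

```python
import argparse

# Illustrative reconstruction of the 1.5.26 behaviour: --zap-disk belongs
# to the 'prepare' subparser, not to 'osd', so placement matters.
parser = argparse.ArgumentParser(prog="ceph-deploy")
sub = parser.add_subparsers(dest="cmd")

osd = sub.add_parser("osd")
osd_sub = osd.add_subparsers(dest="subcmd")

prepare = osd_sub.add_parser("prepare")
prepare.add_argument("--zap-disk", action="store_true")
prepare.add_argument("disk", nargs="+")

# Accepted: the option follows the subcommand it belongs to.
args = parser.parse_args(["osd", "prepare", "--zap-disk",
                          "ceph05:/dev/sdb:/dev/sdd"])
```

Placing --zap-disk before 'prepare' (as in the previously-working command) makes the 'osd' parser reject it, which matches the "no longer recognised" symptom Nigel hit.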
[ceph-users] [ANN] ceph-deploy 1.5.27 released
Hi everyone,

A new version of ceph-deploy has been released. Version 1.5.27 includes the following:

- A new "ceph-deploy repo" command that allows for adding and removing custom repo definitions.
- Makes commands like "ceph-deploy install --rgw" only install the RGW component of Ceph. This works for daemons/components such as --rgw, --mds, and --cli, depending on how packages are split on your distro. For example, Debian packages the Ceph MDS into a separate 'ceph-mds' package, and therefore if you use "install --mds" only the ceph-mds package will be installed. RPM packages do not do this, so it has to install ceph, which includes the MDS, MON, and OSD daemons. Further package splits are coming, but right now we do what we can.
- Some fixes around using DNF (Fedora >= 22).
- Early support for systemd (Fedora 22 and development Ceph builds only).
- Loads of internal changes.

Full changelog is at [1]. Updated packages have been uploaded to {rpm,debian}-{firefly,hammer,testing} repos on ceph.com, and to PyPI.

Cheers,
- Travis

[1] http://ceph.com/ceph-deploy/docs/changelog.html#id2
Re: [ceph-users] injectargs not working?
Hi Quentin,

It may be the specific option you are trying to tweak. osd-scrub-begin-hour was first introduced in development release v0.93, which means it would be in 0.94.x (Hammer), but your cluster is 0.87.1 (Giant).

Cheers,
- Travis

On Wed, Jul 29, 2015 at 4:28 PM, Quentin Hartman qhart...@direwolfdigital.com wrote:
> I'm running a 0.87.1 cluster, and my ceph tell seems to not be working:
>
> # ceph tell osd.0 injectargs '--osd-scrub-begin-hour 1'
> failed to parse arguments: --osd-scrub-begin-hour,1
>
> I've also tried the daemon config set variant and it also fails:
>
> # ceph daemon osd.0 config set osd_scrub_begin_hour 1
> { "error": "error setting 'osd_scrub_begin_hour' to '1': (2) No such file or directory"}
>
> I'm guessing I have something goofed in my admin socket client config:
>
> [client]
> rbd cache = true
> rbd cache writethrough until flush = true
> admin socket = /var/run/ceph/$cluster-$type.$id.asok
>
> but that seems to correlate with the structure that exists:
>
> # ls
> ceph-osd.24.asok ceph-osd.25.asok ceph-osd.26.asok
> # pwd
> /var/run/ceph
>
> I can show my configs all over the place, but changing them seems to
> always fail. It behaves the same if I'm working on a local daemon, or on
> my config node trying to make changes globally.
>
> Thanks in advance for any ideas
>
> QH
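The version mismatch Travis describes can be made mechanical. A sketch (hypothetical helper; the version numbers come from the reply above) comparing the cluster version against the release that introduced an option:

```python
def option_supported(cluster_version, introduced_in=(0, 93)):
    """Return True if a config option is available on this cluster.

    Sketch only: osd_scrub_begin_hour first appeared in v0.93 (per the
    reply above), so 0.87.x (Giant) rejects it while 0.94.x (Hammer)
    accepts it. Versions are passed as tuples, e.g. (0, 87, 1).
    """
    return tuple(cluster_version) >= tuple(introduced_in)
```

This is why injectargs fails with "failed to parse arguments" on Giant: the daemon simply has no such option to set.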
Re: [ceph-users] debugging ceph-deploy warning: could not open file descriptor -1
Hi Noah,

It does look like the two things are unrelated. But you are right, ceph-deploy stopped accepting that trailing hostname with the "ceph-deploy mon create-initial" command in 1.5.26. It was never a needed argument, and accepting it led to confusion. I tightened up the argument parsing for ceph-deploy quite a bit in 1.5.26.

Looking at your logfile, I do not know what caused the apt-get errors. It does seem like the install proceeds successfully, and that the ceph setup will proceed once the extra arg to mon create-initial is removed. Here's hoping that it is indeed nothing to worry about. :)

- Travis

On Tue, Jul 21, 2015 at 2:27 PM, Noah Watkins noahwatk...@gmail.com wrote:
> Nevermind. I see that `ceph-deploy mon create-initial` has stopped
> accepting the trailing hostname, which was causing the failure. I don't
> know if those problems I showed above are actually anything to worry
> about :)
>
> On Tue, Jul 21, 2015 at 3:17 PM, Noah Watkins noahwatk...@gmail.com wrote:
>> The docker/distribution project runs a continuous integration VM using
>> CircleCI, and part of the VM setup installs Ceph packages using
>> ceph-deploy. This has been working well for quite a while, but we are
>> seeing a failure running `ceph-deploy install --release hammer`. The
>> snippet is here where it looks like the first problem shows up.
>>
>> ...
>> [box156][DEBUG ] Get:24 http://ceph.com/debian-hammer/ precise/main ceph-mds amd64 0.94.2-1precise [10.5 MB]
>> [box156][DEBUG ] Get:25 http://ceph.com/debian-hammer/ precise/main radosgw amd64 0.94.2-1precise [3,619 kB]
>> [box156][WARNIN] E: Could not open file descriptor -1
>> [box156][WARNIN] E: Prior errors apply to /var/cache/apt/archives/parted_2.3-19ubuntu1_amd64.deb
>> ...
>>
>> On the surface it seems that the problem is coming from apt-get under
>> the hood. Any pointers here? It doesn't seem like anything has changed
>> configuration wise.
>> The full build log can be found here which starts off with the
>> ceph-deploy command that is failing:
>> https://circleci.com/gh/docker/distribution/1848
>>
>> Thanks,
>> -Noah
Re: [ceph-users] ceph-deploy on ubuntu 15.04
Hi Bernhard,

Thanks for your email. systemd support for Ceph in general is still a work in progress. It is actively being worked on, but the packages hosted on ceph.com are still using sysvinit (for RPM systems) and Upstart on Ubuntu. It is definitely a known issue.

Along those lines, ceph.com only hosts packages for precise and trusty (the LTS releases), so there is no support for 15.04 either.

- Travis

On Fri, Jul 24, 2015 at 4:01 AM, Bernhard Duebi boom...@inbox.com wrote:
> Hi,
>
> I have a problem with ceph-deploy on Ubuntu 15.04. In the file
> /usr/local/lib/python2.7/dist-packages/ceph_deploy/hosts/debian/__init__.py:
>
> def choose_init():
>     """Select an init system.
>
>     Returns the name of an init system (upstart, sysvinit ...).
>     """
>     if distro.lower() == 'ubuntu':
>         return 'upstart'
>     return 'sysvinit'
>
> This function assumes that Ubuntu is using upstart, but at least Ubuntu
> 15.04 Server is using systemd by default. I'm not a python hacker. As a
> quick fix I commented out the if statement and now it always returns
> 'sysvinit'. But for a real fix there should be something like
> "if os == ubuntu and osrelease >= 15.04".
>
> I first noticed this problem in the package that came with the
> distribution. A few days ago I removed the package and installed the
> latest ceph-deploy using pip install.
>
> Maybe this is a known problem, then I'm sorry for the spam.
>
> Regards
> Bernhard
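Bernhard's suggested "real fix" could look something like the sketch below. This is only an illustration of the version-aware check he proposes; the actual ceph_deploy module obtains the distro name and release differently, and upstream later addressed init-system detection properly.

```python
def choose_init(distro, release):
    """Select an init system name ('upstart', 'sysvinit', or 'systemd').

    Sketch of the check suggested above: Ubuntu switched to systemd
    with 15.04, so only older Ubuntu releases should get upstart.
    Assumes release strings like "14.04"; non-Ubuntu distros fall back
    to sysvinit as in the original function.
    """
    if distro.lower() == 'ubuntu':
        major, minor = (int(x) for x in release.split('.')[:2])
        if (major, minor) >= (15, 4):
            return 'systemd'
        return 'upstart'
    return 'sysvinit'
```

Comparing (major, minor) tuples avoids the string-comparison trap where "15.10" < "15.4".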
[ceph-users] [ANN] ceph-deploy 1.5.26 released
Hi everyone,

This is announcing a new release of ceph-deploy that focuses on usability improvements.

- Most of the help menus for ceph-deploy subcommands (e.g. "ceph-deploy mon" and "ceph-deploy osd") have been improved to be more context aware, such that help for "ceph-deploy osd create --help" and "ceph-deploy osd zap --help" returns different output specific to the command. Previously it would show generic help for "ceph-deploy osd". Additionally, the list of optional arguments shown for the command is always correct for the subcommand in question. Previously the options shown were the aggregate of all options.
- ceph-deploy now points to git.ceph.com for downloading GPG keys.
- ceph-deploy will now work on the Mint Linux distribution (by pointing to Ubuntu packages).
- SUSE distro users will now be pointed to SUSE packages by default, as there have not been updated SUSE packages on ceph.com in quite some time.

Full changelog is available at: http://ceph.com/ceph-deploy/docs/changelog.html#id1

New packages are available in the usual places of ceph.com hosted repos and PyPI.

Cheers,
- Travis
Re: [ceph-users] ceph-deploy for Hammer
Hi Pankaj,

While there have been times in the past where ARM binaries were hosted on ceph.com, there is not currently any ARM hardware for builds. I don't think you will see any ARM binaries in http://ceph.com/debian-hammer/pool/main/c/ceph/, for example. Combine that with the fact that ceph-deploy is not intended to work with locally compiled binaries (only packages, as it relies on paths, conventions, and service definitions from the packages), and it is a very tricky combo to use ceph-deploy and ARM together.

Your most recent error is indicative of the ceph-mon service not coming up successfully. When ceph-mon (the service, not the daemon) is started, it also calls ceph-create-keys, which waits for the monitor daemon to come up and then creates the keys that are necessary for the cluster to run when using cephx (the admin key, the bootstrap keys).

- Travis

On Wed, May 27, 2015 at 8:27 PM, Garg, Pankaj pankaj.g...@caviumnetworks.com wrote:
> Actually the ARM binaries do exist and I have been using them for previous
> releases. Somehow this library is the one that doesn't load. Anyway I did
> compile my own Ceph for ARM, and now getting the following issue:
>
> [ceph_deploy.gatherkeys][WARNIN] Unable to find /etc/ceph/ceph.client.admin.keyring on ceph1
> [ceph_deploy][ERROR ] KeyNotFoundError: Could not find keyring file: /etc/ceph/ceph.client.admin.keyring on host ceph1
>
> From: Somnath Roy [mailto:somnath@sandisk.com]
> Sent: Wednesday, May 27, 2015 4:29 PM
> To: Garg, Pankaj
> Cc: ceph-users@lists.ceph.com
> Subject: RE: ceph-deploy for Hammer
>
> If you are trying to install the ceph repo hammer binaries, I don't think
> it is built for ARM. Both the binary and the .so need to be built on ARM
> to make this work, I guess. Try to build the hammer code base on your ARM
> server and then retry.
>
> Thanks & Regards
> Somnath
>
> From: Pankaj Garg [mailto:pankaj.g...@caviumnetworks.com]
> Sent: Wednesday, May 27, 2015 4:17 PM
> To: Somnath Roy
> Cc: ceph-users@lists.ceph.com
> Subject: RE: ceph-deploy for Hammer
>
> Yes I am on ARM.
-Pankaj On May 27, 2015 3:58 PM, Somnath Roy somnath@sandisk.com wrote: Are you running this on ARM ? If not, it should not go for loading this library. Thanks Regards Somnath From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Garg, Pankaj Sent: Wednesday, May 27, 2015 2:26 PM To: Garg, Pankaj; ceph-users@lists.ceph.com Subject: Re: [ceph-users] ceph-deploy for Hammer I seem to be getting these errors in the Monitor Log : 2015-05-27 21:17:41.908839 3ff907368e0 -1 erasure_code_init(jerasure,/usr/lib/aarch64-linux-gnu/ceph/erasure-code): (5) Input/output error 2015-05-27 21:17:41.978113 3ff969168e0 0 ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff), process ceph-mon, pid 16592 2015-05-27 21:17:41.984383 3ff969168e0 -1 ErasureCodePluginSelectJerasure: load dlopen(/usr/lib/aarch64-linux-gnu/ceph/erasure-code/libec_jerasure_neon.so): /usr/lib/aarch64-linux-gnu/ceph/erasure-code/libec_jerasure_neon.so: cannot open shared object file: No such file or directory 2015-05-27 21:17:41.98 3ff969168e0 -1 erasure_code_init(jerasure,/usr/lib/aarch64-linux-gnu/ceph/erasure-code): (5) Input/output error 2015-05-27 21:17:42.052415 3ff90cf68e0 0 ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff), process ceph-mon, pid 16604 2015-05-27 21:17:42.058656 3ff90cf68e0 -1 ErasureCodePluginSelectJerasure: load dlopen(/usr/lib/aarch64-linux-gnu/ceph/erasure-code/libec_jerasure_neon.so): /usr/lib/aarch64-linux-gnu/ceph/erasure-code/libec_jerasure_neon.so: cannot open shared object file: No such file or directory 2015-05-27 21:17:42.058715 3ff90cf68e0 -1 erasure_code_init(jerasure,/usr/lib/aarch64-linux-gnu/ceph/erasure-code): (5) Input/output error 2015-05-27 21:17:42.125279 3ffac4368e0 0 ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff), process ceph-mon, pid 16616 2015-05-27 21:17:42.131666 3ffac4368e0 -1 ErasureCodePluginSelectJerasure: load dlopen(/usr/lib/aarch64-linux-gnu/ceph/erasure-code/libec_jerasure_neon.so): 
/usr/lib/aarch64-linux-gnu/ceph/erasure-code/libec_jerasure_neon.so: cannot open shared object file: No such file or directory 2015-05-27 21:17:42.131726 3ffac4368e0 -1 erasure_code_init(jerasure,/usr/lib/aarch64-linux-gnu/ceph/erasure-code): (5) Input/output error The lib file exists, so not sure why this is happening. Any help appreciated. Thanks Pankaj From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Garg, Pankaj Sent: Wednesday, May 27, 2015 1:37 PM To: ceph-users@lists.ceph.com Subject: [ceph-users] ceph-deploy for Hammer Hi, Is there a particular verion of Ceph-Deploy that should be used with Hammer release? This is a brand new cluster. I’m getting the following error when running command : ceph-deploy mon create-initial [ceph_deploy.conf][DEBUG ] found configuration file at:
[ceph-users] [ANN] ceph-deploy 1.5.25 released
Hi everyone,

This is announcing a new release of ceph-deploy that fixes a security related issue, improves SUSE support, and improves support for RGW on RPM systems. ceph-deploy can be installed from ceph.com hosted repos for Firefly, Giant, Hammer, and testing, and is also available on PyPI.

Eagle-eyed readers may notice that there was not an announcement for 1.5.24 -- this was due to package build infrastructure issues that prevented the creation of RPM and DEB packages. By the time the issues were resolved, 1.5.25 was imminent, so 1.5.24 packages were not created even though 1.5.24 was available through PyPI.

Full changelog is available at [1], but here are the highlights for both 1.5.25 and 1.5.24:

- Fix CVE where the 'ceph-deploy admin' command resulted in the admin keyring being pushed to remote nodes with world readable (0644) permissions.
- Fix reference to package name ceph-radosgw on RPM systems.
- Fix possible truncated output of "ceph-deploy disk list".
- More robust deployment of RGW on RPM systems.

Please update!

Cheers,
- Travis

[1] http://ceph.com/ceph-deploy/docs/changelog.html#id2
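The CVE fixed here (admin keyrings pushed with world-readable 0644 permissions) can be checked for on already-deployed nodes. A small sketch (hypothetical helper, not part of ceph-deploy) flagging any keyring readable by group or other:

```python
import os
import stat

def keyring_too_open(path):
    """Return True if the keyring at `path` is readable by group or other.

    The 1.5.25/1.5.23 fix concerns keyrings written as 0644; anything
    beyond owner-read exposes the cephx key to local users. Hypothetical
    checker sketching that condition.
    """
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return bool(mode & (stat.S_IRGRP | stat.S_IROTH))
```

A `chmod 600 /etc/ceph/ceph.client.admin.keyring` on affected nodes restores the expected permissions.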
Re: [ceph-users] Firefly - Giant : CentOS 7 : install failed ceph-deploy
Hi Vickey,

The easiest way I know of to get around this right now is to add the following line in the section for epel in /etc/yum.repos.d/epel.repo:

exclude=python-rados python-rbd

So this is what my epel.repo file looks like: http://fpaste.org/208681/

It is those two packages in EPEL that are causing problems. I also tried enabling epel-testing, but that didn't work either. Unfortunately you would need to add this line on each node where Ceph Giant is being installed.

- Travis

On Wed, Apr 8, 2015 at 4:11 PM, Vickey Singh vickey.singh22...@gmail.com wrote:
> Community, need help.
>
> -VS-
>
> On Wed, Apr 8, 2015 at 4:36 PM, Vickey Singh vickey.singh22...@gmail.com wrote:
>> Any suggestions, geeks?
>>
>> VS
>>
>> On Wed, Apr 8, 2015 at 2:15 PM, Vickey Singh vickey.singh22...@gmail.com wrote:
>>> Hi,
>>>
>>> The below suggestion also didn't work. Full logs here: http://paste.ubuntu.com/10771939/
>>>
>>> [root@rgw-node1 yum.repos.d]# yum --showduplicates list ceph
>>> Loaded plugins: fastestmirror, priorities
>>> Loading mirror speeds from cached hostfile
>>> * base: mirror.zetup.net
>>> * epel: ftp.fi.muni.cz
>>> * extras: mirror.zetup.net
>>> * updates: mirror.zetup.net
>>> 25 packages excluded due to repository priority protections
>>> Available Packages
>>> ceph.x86_64 0.80.6-0.el7.centos Ceph
>>> ceph.x86_64 0.80.7-0.el7.centos Ceph
>>> ceph.x86_64 0.80.8-0.el7.centos Ceph
>>> ceph.x86_64 0.80.9-0.el7.centos Ceph
>>> [root@rgw-node1 yum.repos.d]#
>>>
>>> It's not able to install the latest available package; yum is getting
>>> confused with the other DOT releases. Any other suggestion to fix this?
--> Processing Dependency: libboost_system-mt.so.1.53.0()(64bit) for package: librbd1-0.80.9-0.el7.centos.x86_64
--> Processing Dependency: libboost_thread-mt.so.1.53.0()(64bit) for package: librbd1-0.80.9-0.el7.centos.x86_64
--> Finished Dependency Resolution
Error: Package: librbd1-0.80.7-0.el7.centos.x86_64 (Ceph)
       Requires: libboost_system-mt.so.1.53.0()(64bit)
Error: Package: ceph-0.80.7-0.el7.centos.x86_64 (Ceph)
       Requires: libboost_system-mt.so.1.53.0()(64bit)
Error: Package: ceph-0.80.7-0.el7.centos.x86_64 (Ceph)
       Requires: libaio.so.1(LIBAIO_0.4)(64bit)
Error: Package: ceph-common-0.80.7-0.el7.centos.x86_64 (Ceph)
       Requires: libboost_thread-mt.so.1.53.0()(64bit)
Error: Package: ceph-common-0.80.7-0.el7.centos.x86_64 (Ceph)
       Requires: librados2 = 0.80.7-0.el7.centos
       Available: librados2-0.80.6-0.el7.centos.x86_64 (Ceph)
           librados2 = 0.80.6-0.el7.centos
       Available: librados2-0.80.7-0.el7.centos.x86_64 (Ceph)
           librados2 = 0.80.7-0.el7.centos
       Available: librados2-0.80.8-0.el7.centos.x86_64 (Ceph)
           librados2 = 0.80.8-0.el7.centos
       Installing: librados2-0.80.9-0.el7.centos.x86_64 (Ceph)
           librados2 = 0.80.9-0.el7.centos
Error: Package: libcephfs1-0.80.7-0.el7.centos.x86_64 (Ceph)
       Requires: libboost_thread-mt.so.1.53.0()(64bit)
Error: Package: ceph-common-0.80.7-0.el7.centos.x86_64 (Ceph)
       Requires: python-requests
Error: Package: ceph-common-0.80.7-0.el7.centos.x86_64 (Ceph)
       Requires: librbd1 = 0.80.7-0.el7.centos
       Available: librbd1-0.80.6-0.el7.centos.x86_64 (Ceph)
           librbd1 = 0.80.6-0.el7.centos
       Available: librbd1-0.80.7-0.el7.centos.x86_64 (Ceph)
           librbd1 = 0.80.7-0.el7.centos
       Available: librbd1-0.80.8-0.el7.centos.x86_64 (Ceph)
           librbd1 = 0.80.8-0.el7.centos
       Installing: librbd1-0.80.9-0.el7.centos.x86_64 (Ceph)
           librbd1 = 0.80.9-0.el7.centos
Error: Package: ceph-common-0.80.7-0.el7.centos.x86_64 (Ceph)
       Requires: libboost_system-mt.so.1.53.0()(64bit)
Error: Package: ceph-common-0.80.7-0.el7.centos.x86_64 (Ceph)
       Requires: python-ceph = 0.80.7-0.el7.centos
       Available: python-ceph-0.80.6-0.el7.centos.x86_64 (Ceph)
           python-ceph = 0.80.6-0.el7.centos
       Available: python-ceph-0.80.7-0.el7.centos.x86_64 (Ceph)
           python-ceph = 0.80.7-0.el7.centos
       Available: python-ceph-0.80.8-0.el7.centos.x86_64 (Ceph)
           python-ceph = 0.80.8-0.el7.centos
       Installing: python-ceph-0.80.9-0.el7.centos.x86_64 (Ceph)
           python-ceph = 0.80.9-0.el7.centos
Error: Package: libcephfs1-0.80.7-0.el7.centos.x86_64 (Ceph)
       Requires: libboost_system-mt.so.1.53.0()(64bit)
Error: Package: ceph-0.80.7-0.el7.centos.x86_64 (Ceph)
       Requires: python-requests
Error: Package: ceph-0.80.7-0.el7.centos.x86_64 (Ceph)
       Requires: librados2 = 0.80.7-0.el7.centos
       Available: librados2-0.80.6-0.el7.centos.x86_64 (Ceph)
           librados2 = 0.80.6-0.el7.centos
       Available: librados2-0.80.7-0.el7.centos.x86_64 (Ceph)
[ceph-users] [ANN] ceph-deploy 1.5.23 released
Hi All, This is a new release of ceph-deploy that includes a new feature for Hammer and bugfixes. ceph-deploy can be installed from the ceph.com hosted repos for Firefly, Giant, Hammer, or testing, and is also available on PyPI.

ceph-deploy now defaults to installing the Hammer release. If you need to install a different release, use the --release flag.

To go along with the Hammer release, ceph-deploy now includes support for a drastically simplified deployment for RGW. See further details at [1] and [2].

This release also fixes an issue where keyrings pushed to remote nodes ended up with world-readable permissions.

The full changelog can be seen at [3]. Please update! Cheers, - Travis

[1] http://ceph.com/docs/master/start/quick-ceph-deploy/#add-an-rgw-instance
[2] http://ceph.com/ceph-deploy/docs/rgw.html
[3] http://ceph.com/ceph-deploy/docs/changelog.html#id2

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
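A quick hedged illustration of the two items above (hostnames are placeholders):

```shell
ceph-deploy install node1                    # now defaults to Hammer
ceph-deploy install --release giant node1    # pin a different release
ceph-deploy rgw create node1                 # new simplified RGW deployment
```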
Re: [ceph-users] Inconsistent ceph-deploy disk list command results
Hi Frederic, Thanks for the report! Do you mind putting these details into a bug report at http://tracker.ceph.com/ ? I have seen the same thing once before, but at the time didn't have the chance to check whether the inconsistency was coming from ceph-deploy or from ceph-disk. This certainly seems to point at ceph-deploy! - Travis

On Wed, Apr 8, 2015 at 4:15 AM, f...@univ-lr.fr f...@univ-lr.fr wrote: Hi all, I want to alert on a command we've learned to avoid for its inconsistent results, on Giant 0.87.1 and Hammer 0.93.0 (ceph-deploy-1.5.22-0.noarch was used in both cases): the ceph-deploy disk list command has a problem. We should get an exhaustive list of device entries, like this one:

../..
/dev/sdk :
 /dev/sdk1 ceph data, active, cluster ceph, osd.34, journal /dev/sda9
../..

But from the admin node, when we count how many disks we have on our nodes, the results are incorrect and differ each time:

$ ceph-deploy disk list osdnode1 2>&1 | grep 'active,' | wc -l
8
$ ceph-deploy disk list osdnode1 2>&1 | grep 'active,' | wc -l
12
$ ceph-deploy disk list osdnode1 2>&1 | grep 'active,' | wc -l
10
$ ceph-deploy disk list osdnode1 2>&1 | grep 'active,' | wc -l
15
$ ceph-deploy disk list osdnode1 2>&1 | grep 'active,' | wc -l
12

From the nodes themselves, the results are correct (15) and always the same:

$ ceph-disk list 2>&1 | grep 'active,' | wc -l
15
$ ceph-disk list 2>&1 | grep 'active,' | wc -l
15
$ ceph-disk list 2>&1 | grep 'active,' | wc -l
15
$ ceph-disk list 2>&1 | grep 'active,' | wc -l
15
$ ceph-disk list 2>&1 | grep 'active,' | wc -l
15
$ ceph-disk list 2>&1 | grep 'active,' | wc -l
15

But a pretty similar 'ceph-deploy osd list' command works fine. Frederic

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] [ANN] ceph-deploy 1.5.22 released
Hi All, This is a new release of ceph-deploy that changes a couple of behaviors.

On RPM-based distros, ceph-deploy will now automatically enable check_obsoletes in the Yum priorities plugin. This resolves an issue many community members hit where package dependency resolution was breaking due to conflicts between upstream packaging (hosted on ceph.com) and downstream packaging (i.e., Fedora or EPEL).

The other important change is that when using ceph-deploy to install Ceph packages on a RHEL machine, the --release flag *must* be used if you want to install upstream packages. In other words, if you want to install Giant on a RHEL machine, you would need to use ceph-deploy install --release giant. If the --release flag is not used, ceph-deploy will expect to use downstream packages on RHEL. This is documented at [1].

The full changelog can be seen at [2]. Please update! - Travis

[1] http://ceph.com/ceph-deploy/docs/install.html#distribution-notes
[2] http://ceph.com/ceph-deploy/docs/changelog.html#id1

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
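To make the RHEL behavior concrete (the hostname is a placeholder):

```shell
# On RHEL, upstream ceph.com packages now require an explicit release:
ceph-deploy install --release giant cephnode1

# Without --release, ceph-deploy assumes downstream (RHEL-provided)
# packages should be used instead:
ceph-deploy install cephnode1
```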
Re: [ceph-users] problem with yum install ceph-deploy command
Hi Khyati, On Sat, Mar 7, 2015 at 5:18 AM, khyati joshi kpjosh...@gmail.com wrote:

Hello ceph-users, I am new to Ceph. I am using CentOS 5.11 (i386) for deploying Ceph, and epel-release-5.4.noarch.rpm is successfully installed.

Ceph (and ceph-deploy) is not packaged for CentOS 5. You'll need to use 6 or 7.

But running the yum install ceph-deploy command is giving the following error: ceph-deploy-1.5.21-0.noarch from ceph-noarch has a depsolving problem -- missing dependencies: python-distribute is needed by package ceph-deploy-1.5.21-0.noarch.

This is Yum saying it can't find a python-distribute package in your configured repos. Again, not sure where/if this is available on CentOS 5.

Then I removed python-argparse and ran the command yum install snappy leveldb gdisk python-argparse gperftools-libs and got another error: rpmlib(FileDigests) needed by python-argparse-1.2.1-2.el6.noarch; rpmlib(PayloadIsXz) needed by python-argparse-1.2.1.1-2.el6.noarch.

Not sure why you would remove argparse. It's required.

The main cause of both errors is related to Python, but I don't know how to resolve it. Does anyone know how to solve this error?

While it *may* be possible to get ceph-deploy working on a CentOS 5 box (I would install using pip, pointing to PyPI instead of using Yum/EPEL for this), it would only be useful as a place to launch installs on remote machines from. Your best bet is to run a much newer distribution. - Travis

Any help will be appreciated. Thanks, khyati joshi M.tech Student, Gujarat, India.

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
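For reference, the pip route Travis mentions looks like this (ceph-deploy is published on PyPI, so this bypasses Yum/EPEL entirely):

```shell
# Install ceph-deploy from PyPI instead of Yum/EPEL:
pip install ceph-deploy
ceph-deploy --version
```

Note this only installs the admin tool itself; the target nodes it deploys to must still run a distribution Ceph is packaged for.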
Re: [ceph-users] Ceph User Teething Problems
On Wed, Mar 4, 2015 at 4:43 PM, Lionel Bouton lionel-subscript...@bouton.name wrote: On 03/04/15 22:18, John Spray wrote: On 04/03/2015 20:27, Datatone Lists wrote: [...] [Please don't mention ceph-deploy]

This kind of comment isn't very helpful unless there is a specific issue with ceph-deploy that is preventing you from using it and causing you to resort to manual steps.

As a new maintainer of ceph-deploy, I'm happy to hear all gripes. :)

ceph-deploy is a subject I never took the time to give feedback on. We can't use it (we use Gentoo, which isn't supported by ceph-deploy), and even if we could, I probably wouldn't allow it: I believe that for important pieces of infrastructure like Ceph, you have to understand its inner workings to the point where you can hack your way out of problems and build tools to integrate it better with your environment (you can understand one of the reasons why we use Gentoo in production alongside other distributions...). I believe using ceph-deploy makes it more difficult to acquire the knowledge to do so.

For example, we have a script to replace a defective OSD (destroying an existing one and replacing it with a new one) that locks data in place as long as we can, so that CRUSH map changes don't trigger movements until the map reaches its original state again; this minimizes the total amount of data copied around. It might have been possible to achieve this with ceph-deploy, but I doubt we would have achieved it as easily (from understanding the causes of data movements, through understanding the OSD identifier allocation process, to implementing the script) if we hadn't created OSDs by hand repeatedly before scripting some of the processes.

Thanks for this feedback. I share a lot of your sentiments, especially that it is good to understand as much of the system as you can. Everyone's skill level and use-case is different, and ceph-deploy is targeted more towards PoC use-cases.
It tries to make things as easy as possible, but that necessarily abstracts most of the details away.

The last time I searched for documentation on manual configuration, it was much harder to find (manual MDS configuration was indeed something I didn't find at all). Best regards, Lionel

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Centos 7 OSD silently fail to start
Also, did you successfully start your monitor(s), and define/create the OSDs within the Ceph cluster itself? There are several steps to creating a Ceph cluster manually. I'm unsure if you have done the steps to actually create and register the OSDs with the cluster. - Travis

On Wed, Feb 25, 2015 at 9:49 AM, Leszek Master keks...@gmail.com wrote: Check firewall rules and selinux. It sometimes is a pain in the ... :)

On 25 Feb 2015 at 01:46, Barclay Jameson almightybe...@gmail.com wrote: I have tried to install Ceph using ceph-deploy, but sgdisk seems to have too many issues, so I did a manual install. After running mkfs.btrfs on the disks and journals and mounting them, I then tried to start the OSDs, which failed. The first error was:

#/etc/init.d/ceph start osd.0
/etc/init.d/ceph: osd.0 not found (/etc/ceph/ceph.conf defines , /var/lib/ceph defines )

I then manually added the OSDs to the conf file, with the following as an example:

[osd.0]
osd_host = node01

Now when I run the command:

# /etc/init.d/ceph start osd.0

there is no error or output from the command, and in fact when I do a ceph -s, no OSDs are listed as being up. Running ps aux | grep -i ceph or ps aux | grep -i osd shows there are no OSDs running. I have also run htop to see if any processes are running, and none are shown. I had this working on SL6.5 with Firefly, but Giant on CentOS 7 has been nothing but a giant pain.

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
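For anyone following along, the registration steps Travis is referring to look roughly like this. This is a hedged sketch of the Giant-era manual "short form" procedure, not a complete recipe: the id, hostname, and weight are examples, and the data directory must already be formatted and mounted.

```shell
ceph osd create                              # allocates a new OSD id, e.g. 0
mkdir -p /var/lib/ceph/osd/ceph-0            # mount the data disk here first
ceph-osd -i 0 --mkfs --mkkey                 # initialize the OSD data directory
ceph auth add osd.0 osd 'allow *' mon 'allow profile osd' \
    -i /var/lib/ceph/osd/ceph-0/keyring      # register its key with the cluster
ceph osd crush add osd.0 1.0 host=node01     # place it in the CRUSH map
/etc/init.d/ceph start osd.0                 # sysvinit, as used by Giant on CentOS 7
```

Without the create/auth/crush steps, starting the daemon does nothing visible in ceph -s, which matches the symptom described above.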
Re: [ceph-users] Ceph-deploy issues
Hi Pankaj, I can't say that it will fix the issue, but the first thing I would encourage is to use the latest ceph-deploy. you are using 1.4.0, which is quite old. The latest is 1.5.21. - Travis On Wed, Feb 25, 2015 at 3:38 PM, Garg, Pankaj pankaj.g...@caviumnetworks.com wrote: Hi, I had a successful ceph cluster that I am rebuilding. I have completely uninstalled ceph and any remnants and directories and config files. While setting up the new cluster, I follow the Ceph-deploy documentation as described before. I seem to get an error now (tried many times) : ceph-deploy mon create-initial command fails in gather keys step. This never happened before, and I’m not sure why its failing now. cephuser@ceph1:~/my-cluster$ ceph-deploy mon create-initial [ceph_deploy.cli][INFO ] Invoked (1.4.0): /usr/bin/ceph-deploy mon create-initial [ceph_deploy.mon][DEBUG ] Deploying mon, cluster ceph hosts ceph1 [ceph_deploy.mon][DEBUG ] detecting platform for host ceph1 ... [ceph1][DEBUG ] connected to host: ceph1 [ceph1][DEBUG ] detect platform information from remote host [ceph1][DEBUG ] detect machine type [ceph_deploy.mon][INFO ] distro info: Ubuntu 14.04 trusty [ceph1][DEBUG ] determining if provided host has same hostname in remote [ceph1][DEBUG ] get remote short hostname [ceph1][DEBUG ] deploying mon to ceph1 [ceph1][DEBUG ] get remote short hostname [ceph1][DEBUG ] remote hostname: ceph1 [ceph1][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf [ceph1][DEBUG ] create the mon path if it does not exist [ceph1][DEBUG ] checking for done path: /var/lib/ceph/mon/ceph-ceph1/done [ceph1][DEBUG ] done path does not exist: /var/lib/ceph/mon/ceph-ceph1/done [ceph1][INFO ] creating keyring file: /var/lib/ceph/tmp/ceph-ceph1.mon.keyring [ceph1][DEBUG ] create the monitor keyring file [ceph1][INFO ] Running command: sudo ceph-mon --cluster ceph --mkfs -i ceph1 --keyring /var/lib/ceph/tmp/ceph-ceph1.mon.keyring [ceph1][DEBUG ] ceph-mon: set fsid to 
099013d5-126d-45b4-a98e-5f0c386805a4 [ceph1][DEBUG ] ceph-mon: created monfs at /var/lib/ceph/mon/ceph-ceph1 for mon.ceph1 [ceph1][INFO ] unlinking keyring file /var/lib/ceph/tmp/ceph-ceph1.mon.keyring [ceph1][DEBUG ] create a done file to avoid re-doing the mon deployment [ceph1][DEBUG ] create the init path if it does not exist [ceph1][DEBUG ] locating the `service` executable... [ceph1][INFO ] Running command: sudo initctl emit ceph-mon cluster=ceph id=ceph1 [ceph1][INFO ] Running command: sudo ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.ceph1.asok mon_status [ceph1][DEBUG ] [ceph1][DEBUG ] status for monitor: mon.ceph1 [ceph1][DEBUG ] { [ceph1][DEBUG ] election_epoch: 2, [ceph1][DEBUG ] extra_probe_peers: [ [ceph1][DEBUG ] 192.168.240.101:6789/0 [ceph1][DEBUG ] ], [ceph1][DEBUG ] monmap: { [ceph1][DEBUG ] created: 0.00, [ceph1][DEBUG ] epoch: 1, [ceph1][DEBUG ] fsid: 099013d5-126d-45b4-a98e-5f0c386805a4, [ceph1][DEBUG ] modified: 0.00, [ceph1][DEBUG ] mons: [ [ceph1][DEBUG ] { [ceph1][DEBUG ] addr: 10.18.240.101:6789/0, [ceph1][DEBUG ] name: ceph1, [ceph1][DEBUG ] rank: 0 [ceph1][DEBUG ] } [ceph1][DEBUG ] ] [ceph1][DEBUG ] }, [ceph1][DEBUG ] name: ceph1, [ceph1][DEBUG ] outside_quorum: [], [ceph1][DEBUG ] quorum: [ [ceph1][DEBUG ] 0 [ceph1][DEBUG ] ], [ceph1][DEBUG ] rank: 0, [ceph1][DEBUG ] state: leader, [ceph1][DEBUG ] sync_provider: [] [ceph1][DEBUG ] } [ceph1][DEBUG ] [ceph1][INFO ] monitor: mon.ceph1 is running [ceph1][INFO ] Running command: sudo ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.ceph1.asok mon_status [ceph_deploy.mon][INFO ] processing monitor mon.ceph1 [ceph1][DEBUG ] connected to host: ceph1 [ceph1][INFO ] Running command: sudo ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.ceph1.asok mon_status [ceph_deploy.mon][INFO ] mon.ceph1 monitor has reached quorum! [ceph_deploy.mon][INFO ] all initial monitors are running and have formed quorum [ceph_deploy.mon][INFO ] Running gatherkeys... 
[ceph_deploy.gatherkeys][DEBUG ] Checking ceph1 for /etc/ceph/ceph.client.admin.keyring [ceph1][DEBUG ] connected to host: ceph1 [ceph1][DEBUG ] detect platform information from remote host [ceph1][DEBUG ] detect machine type [ceph1][DEBUG ] fetch remote file [ceph_deploy.gatherkeys][WARNIN] Unable to find /etc/ceph/ceph.client.admin.keyring on ['ceph1'] [ceph_deploy.gatherkeys][DEBUG ] Have ceph.mon.keyring [ceph_deploy.gatherkeys][DEBUG ] Checking ceph1 for /var/lib/ceph/bootstrap-osd/ceph.keyring [ceph1][DEBUG ] connected to host: ceph1 [ceph1][DEBUG ] detect platform
Re: [ceph-users] ceph-giant installation error on centos 6.6
Note that ceph-deploy would enable EPEL for you automatically on CentOS. When doing a manual installation, the requirement for EPEL is called out here: http://ceph.com/docs/master/install/get-packages/#id8 Though looking at that, we could probably update it to use the now much simpler yum install epel-release. :) - Travis

On Wed, Feb 18, 2015 at 12:25 PM, Wenxiao He wenx...@gmail.com wrote: Thanks Brad. That solved the problem. I had mistakenly assumed all dependencies were in http://ceph.com/rpm-giant/el6/x86_64/. Regards, Wenxiao

On Tue, Feb 17, 2015 at 10:37 PM, Brad Hubbard bhubb...@redhat.com wrote: On 02/18/2015 12:43 PM, Wenxiao He wrote: Hello, I need some help, as I am getting package dependency errors when trying to install ceph-giant on CentOS 6.6. See below for the repo files and also the yum install output.

---> Package python-imaging.x86_64 0:1.1.6-19.el6 will be installed
--> Finished Dependency Resolution
Error: Package: 1:librbd1-0.87-0.el6.x86_64 (Ceph)
       Requires: liblttng-ust.so.0()(64bit)
Error: Package: gperftools-libs-2.0-11.el6.3.x86_64 (Ceph)
       Requires: libunwind.so.8()(64bit)
Error: Package: 1:librados2-0.87-0.el6.x86_64 (Ceph)
       Requires: liblttng-ust.so.0()(64bit)
Error: Package: 1:ceph-0.87-0.el6.x86_64 (Ceph)
       Requires: liblttng-ust.so.0()(64bit)

Looks like you may need to install libunwind and lttng-ust from EPEL 6? They seem to be the packages that supply liblttng-ust.so and libunwind.so, so you could try installing those from EPEL 6 and see how that goes? Note that this should not be taken as the, or even a, authoritative answer :) Cheers, Brad

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
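Putting the two suggestions together, the manual fix on CentOS 6.6 would look something like this (a hedged sketch; the package names are the ones named in the thread):

```shell
sudo yum install -y epel-release          # enable the EPEL repo
sudo yum install -y libunwind lttng-ust   # the missing shared libraries
sudo yum install -y ceph                  # ceph.com packages should now resolve
```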
Re: [ceph-users] Installation failure
Hi Paul, Would you mind sharing/posting the contents of your .repo files for ceph, ceph-el7, and ceph-noarch repos? I see that python-rbd is getting pulled in from EPEL, which I don't think is what you want. My guess is that you need the fix documented in http://tracker.ceph.com/issues/10476, though that was specifically addressing Fedora downstream packaging of Ceph competing with current upstream packaging hosted on ceph.com repos. This may be something similar with EPEL. - Travis On Mon, Feb 16, 2015 at 7:19 AM, HEWLETT, Paul (Paul)** CTR ** paul.hewl...@alcatel-lucent.com wrote: Hi all I have been installing ceph giant quite happily for the past 3 months on various systems and use an ansible recipe to do so. The OS is RHEL7. This morning on one of my test systems installation fails with: [root@octopus ~]# yum install ceph ceph-deploy Loaded plugins: langpacks, priorities, product-id, subscription-manager Ceph-el7 | 951 B 00:00:00 ceph | 951 B 00:00:00 ceph-noarch | 951 B 00:00:00 14 packages excluded due to repository priority protections Package ceph-deploy-1.5.21-0.noarch already installed and latest version Resolving Dependencies -- Running transaction check --- Package ceph.x86_64 1:0.87-0.el7.centos will be installed -- Processing Dependency: librbd1 = 1:0.87-0.el7.centos for package: 1:ceph-0.87-0.el7.centos.x86_64 -- Processing Dependency: ceph-common = 1:0.87-0.el7.centos for package: 1:ceph-0.87-0.el7.centos.x86_64 -- Processing Dependency: libcephfs1 = 1:0.87-0.el7.centos for package: 1:ceph-0.87-0.el7.centos.x86_64 -- Processing Dependency: python-ceph = 1:0.87-0.el7.centos for package: 1:ceph-0.87-0.el7.centos.x86_64 -- Processing Dependency: librados2 = 1:0.87-0.el7.centos for package: 1:ceph-0.87-0.el7.centos.x86_64 -- Processing Dependency: python-flask for package: 1:ceph-0.87-0.el7.centos.x86_64 -- Processing Dependency: python-requests for package: 1:ceph-0.87-0.el7.centos.x86_64 -- Processing Dependency: hdparm for package: 
1:ceph-0.87-0.el7.centos.x86_64 -- Processing Dependency: libtcmalloc.so.4()(64bit) for package: 1:ceph-0.87-0.el7.centos.x86_64 -- Processing Dependency: libleveldb.so.1()(64bit) for package: 1:ceph-0.87-0.el7.centos.x86_64 -- Processing Dependency: libcephfs.so.1()(64bit) for package: 1:ceph-0.87-0.el7.centos.x86_64 -- Processing Dependency: librados.so.2()(64bit) for package: 1:ceph-0.87-0.el7.centos.x86_64 -- Processing Dependency: libboost_system-mt.so.1.53.0()(64bit) for package: 1:ceph-0.87-0.el7.centos.x86_64 -- Processing Dependency: libboost_thread-mt.so.1.53.0()(64bit) for package: 1:ceph-0.87-0.el7.centos.x86_64 -- Running transaction check --- Package boost-system.x86_64 0:1.53.0-18.el7 will be installed --- Package boost-thread.x86_64 0:1.53.0-18.el7 will be installed --- Package ceph-common.x86_64 1:0.87-0.el7.centos will be installed -- Processing Dependency: redhat-lsb-core for package: 1:ceph-common-0.87-0.el7.centos.x86_64 --- Package gperftools-libs.x86_64 0:2.1-1.el7 will be installed -- Processing Dependency: libunwind.so.8()(64bit) for package: gperftools-libs-2.1-1.el7.x86_64 --- Package hdparm.x86_64 0:9.43-5.el7 will be installed --- Package leveldb.x86_64 0:1.12.0-5.el7 will be installed --- Package libcephfs1.x86_64 1:0.87-0.el7.centos will be installed --- Package librados2.x86_64 1:0.87-0.el7.centos will be installed --- Package librbd1.x86_64 1:0.87-0.el7.centos will be installed --- Package python-ceph-compat.x86_64 1:0.80.7-0.4.el7 will be installed -- Processing Dependency: python-rbd = 1:0.80.7 for package: 1:python-ceph-compat-0.80.7-0.4.el7.x86_64 -- Processing Dependency: python-rados = 1:0.80.7 for package: 1:python-ceph-compat-0.80.7-0.4.el7.x86_64 -- Processing Dependency: python-cephfs = 1:0.80.7 for package: 1:python-ceph-compat-0.80.7-0.4.el7.x86_64 --- Package python-flask.noarch 1:0.10.1-4.el7 will be installed -- Processing Dependency: python-werkzeug for package: 1:python-flask-0.10.1-4.el7.noarch -- Processing 
Dependency: python-jinja2 for package: 1:python-flask-0.10.1-4.el7.noarch -- Processing Dependency: python-itsdangerous for package: 1:python-flask-0.10.1-4.el7.noarch --- Package python-requests.noarch 0:1.1.0-8.el7 will be installed -- Processing Dependency: python-urllib3 for package: python-requests-1.1.0-8.el7.noarch -- Running transaction check --- Package libunwind.x86_64 0:1.1-3.el7 will be installed --- Package python-cephfs.x86_64 1:0.80.7-0.4.el7 will be installed -- Processing Dependency: libcephfs1 = 1:0.80.7 for package: 1:python-cephfs-0.80.7-0.4.el7.x86_64 --- Package python-itsdangerous.noarch 0:0.23-2.el7 will be installed --- Package python-jinja2.noarch 0:2.7.2-2.el7 will be installed -- Processing Dependency: python-babel = 0.8 for package: python-jinja2-2.7.2-2.el7.noarch -- Processing Dependency: python-markupsafe for package: python-jinja2-2.7.2-2.el7.noarch --- Package
Re: [ceph-users] Installation failure
Hi Paul, Looking a bit closer, I do believe it is the same issue. It looks like python-rbd in EPEL (and others like python-rados) were updated in EPEL on January 21st, 2015. This update included some changes to how dependencies were handled between EPEL and RHEL for Ceph. See http://pkgs.fedoraproject.org/cgit/ceph.git/commit/?h=epel7 Fedora and EPEL both split out the older python-ceph package into smaller subsets (python-{rados,cephfs,rbd}), but these changes are not upstream yet (from the ceph.com hosted packages). So if repos enable both ceph.com and EPEL, the EPEL packages will override the ceph.com packages because the RPMs have obsoletes: python-ceph in them, even though the EPEL packages are older. It's a bit of a problematic transition period until the upstream packaging splits in the same way. I do believe that using check_obsoletes=1 in /etc/yum/pluginconf.d/priorities.conf will take care of the problem for you. However, it may be the case that you would need to make your ceph .repo files that point to rpm-giant be priority=1. That's my best advice of something to try for now. Thanks, - Travis On Mon, Feb 16, 2015 at 10:16 AM, HEWLETT, Paul (Paul)** CTR ** paul.hewl...@alcatel-lucent.com wrote: Hi Travis Thanks for the reply. My only doubt is that this was all working until this morning. Has anything changed in the Ceph repository? I tried commenting out various repos but this did not work. 
If I delete the epel repos, then ceph installation fails because tcmalloc and leveldb are not found. My repos are:

[root@octopus ~]# ls -l /etc/yum.repos.d/
total 40
-rw-r--r-- 1 root root   700 Feb 16 12:08 ceph.repo
-rw-r--r-- 1 root root   957 Nov 25 16:23 epel.repo
-rw-r--r-- 1 root root  1056 Nov 25 16:23 epel-testing.repo
-rw-r--r-- 1 root root 26533 Feb 16 11:55 redhat.repo

and the contents of ceph.repo:

[ceph]
name=Ceph packages for $basearch
baseurl=http://ceph.com/rpm-giant/el7/$basearch
enabled=1
priority=2
gpgcheck=1
type=rpm-md
gpgkey=https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc

[ceph-noarch]
name=Ceph noarch packages
baseurl=http://ceph.com/rpm-giant/el7/noarch
enabled=1
priority=2
gpgcheck=1
type=rpm-md
gpgkey=https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc

[ceph-source]
name=Ceph source packages
baseurl=http://ceph.com/rpm-giant/el7/SRPMS
enabled=0
priority=2
gpgcheck=1
type=rpm-md
gpgkey=https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc

[Ceph-el7]
name=Ceph-el7
baseurl=http://eu.ceph.com/rpms/rhel7/noarch/
enabled=1
priority=2
gpgcheck=0

[root@octopus ~]# cat /etc/yum.repos.d/epel.repo
[epel]
name=Extra Packages for Enterprise Linux 7 - $basearch
#baseurl=http://download.fedoraproject.org/pub/epel/7/$basearch
mirrorlist=https://mirrors.fedoraproject.org/metalink?repo=epel-7&arch=$basearch
failovermethod=priority
enabled=1
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-7

[epel-debuginfo]
name=Extra Packages for Enterprise Linux 7 - $basearch - Debug
#baseurl=http://download.fedoraproject.org/pub/epel/7/$basearch/debug
mirrorlist=https://mirrors.fedoraproject.org/metalink?repo=epel-debug-7&arch=$basearch
failovermethod=priority
enabled=0
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-7
gpgcheck=1

[epel-source]
name=Extra Packages for Enterprise Linux 7 - $basearch - Source
#baseurl=http://download.fedoraproject.org/pub/epel/7/SRPMS
mirrorlist=https://mirrors.fedoraproject.org/metalink?repo=epel-source-7&arch=$basearch
failovermethod=priority
enabled=0
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-7
gpgcheck=1

[root@octopus ~]# cat /etc/yum.repos.d/epel-testing.repo
[epel-testing]
name=Extra Packages for Enterprise Linux 7 - Testing - $basearch
#baseurl=http://download.fedoraproject.org/pub/epel/testing/7/$basearch
mirrorlist=https://mirrors.fedoraproject.org/metalink?repo=testing-epel7&arch=$basearch
failovermethod=priority
enabled=0
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-7

[epel-testing-debuginfo]
name=Extra Packages for Enterprise Linux 7 - Testing - $basearch - Debug
#baseurl=http://download.fedoraproject.org/pub/epel/testing/7/$basearch/debug
mirrorlist=https://mirrors.fedoraproject.org/metalink?repo=testing-debug-epel7&arch=$basearch
failovermethod=priority
enabled=0
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-7
gpgcheck=1

[epel-testing-source]
name=Extra Packages for Enterprise Linux 7 - Testing - $basearch - Source
#baseurl=http://download.fedoraproject.org/pub/epel/testing/7/SRPMS
mirrorlist=https://mirrors.fedoraproject.org/metalink?repo=testing-source-epel7&arch=$basearch
failovermethod=priority
enabled=0
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-7
gpgcheck=1

Regards Paul Hewlett Senior Systems Engineer Velocix, Cambridge Alcatel-Lucent t: +44 1223 435893 m: +44 7985327353

From: Travis Rhoden [trho...@gmail.com] Sent
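Travis's check_obsoletes suggestion from earlier in this thread can be applied like this. The sketch operates on a temporary copy so it is safe to run; the real file is /etc/yum/pluginconf.d/priorities.conf, edited with sudo.

```shell
# Append check_obsoletes=1 to the yum priorities plugin config if missing.
# CONF is a temp stand-in; use /etc/yum/pluginconf.d/priorities.conf for real.
CONF=$(mktemp)
printf '[main]\nenabled = 1\n' > "$CONF"
grep -q '^check_obsoletes' "$CONF" || echo 'check_obsoletes = 1' >> "$CONF"
grep '^check_obsoletes' "$CONF"   # -> check_obsoletes = 1
```

With check_obsoletes enabled, the priorities plugin re-checks obsoletes against repo priority, so the older-but-obsoleting EPEL packages no longer override the ceph.com ones.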
Re: [ceph-users] error in sys.exitfunc
Hi Karl, Sorry that I missed this go by. If you are still hitting this issue, I'd like to help you figure this one out, especially since you are not the only person to have hit it. Can you pass along your system details (OS, version, etc.)? I'd also like to know how you installed ceph-deploy (via RPM, or pip?). - Travis

On Tue, Jan 20, 2015 at 10:46 AM, Blake, Karl D karl.d.bl...@intel.com wrote: The error is the same as in this posted link: http://www.spinics.net/lists/ceph-devel/msg21388.html

From: Blake, Karl D Sent: Tuesday, January 20, 2015 4:29 AM To: ceph-us...@ceph.com Subject: RE: error in sys.exitfunc Please advise. Thanks, -Karl

From: Blake, Karl D Sent: Monday, January 19, 2015 7:23 PM To: 'ceph-us...@ceph.com' Subject: error in sys.exitfunc Anytime I run ceph-deploy I get the above error. Can you help resolve it? Thanks, -Karl

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] RHEL 7 Installs
Hi John, For the last part, there being two different versions of packages in Giant, I don't think that's the actual problem. What's really happening there is that python-ceph has been obsoleted by other packages that are getting picked up by Yum. See the line that says Package python-ceph is obsoleted by python-rados... It's the same deal as http://tracker.ceph.com/issues/10476 You could try the same fix there. On Fri, Jan 9, 2015 at 4:50 PM, John Wilkins john.wilk...@inktank.com wrote: Ken, I had a number of issues installing Ceph on RHEL 7, which I think are mostly due to dependencies. I followed the quick start guide, which gets the latest major release--e.g., Firefly, Giant. ceph.conf is here: http://goo.gl/LNjFp3 ceph.log common errors included: http://goo.gl/yL8UsM To resolve these, I had to download and install libunwind and python-jinja2. It also seems that the Giant repo had 0.86 and 0.87 packages for python-ceph, and ceph-deploy didn't like that. ceph.log error: http://goo.gl/oeKGUv To resolve this, I had to download and install python-ceph v0.87. Then, run the ceph-deploy install command again. -- John Wilkins Red Hat jowil...@redhat.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] ceph-deploy dependency errors on fc20 with firefly
Re: [ceph-users] ceph-deploy dependency errors on fc20 with firefly
Hi Noah, The root cause has been found. Please see http://tracker.ceph.com/issues/10476 for details. In short, it's an issue between RPM obsoletes and the yum priorities plugin. A final solution is pending, but details of a workaround are in the issue comments. - Travis
On Wed, Jan 7, 2015 at 4:05 PM, Travis Rhoden trho...@gmail.com wrote: Hi Noah, I'll try to recreate this on a fresh FC20 install as well. It looks to me like there might be a repo priority issue. It's mixing packages from the Fedora downstream repos and the ceph.com upstream repos. That's not supposed to happen. - Travis
On Wed, Jan 7, 2015 at 2:15 PM, Noah Watkins noah.watk...@inktank.com wrote: I'm trying to install Firefly on an up-to-date FC20 box. I'm getting the following errors:
[nwatkins@kyoto cluster]$ ../ceph-deploy/ceph-deploy install --release firefly kyoto
[ceph_deploy.conf][DEBUG ] found configuration file at: /home/nwatkins/.cephdeploy.conf
[ceph_deploy.cli][INFO ] Invoked (1.5.21): ../ceph-deploy/ceph-deploy install --release firefly kyoto
[ceph_deploy.install][DEBUG ] Installing stable version firefly on cluster ceph hosts kyoto
[ceph_deploy.install][DEBUG ] Detecting platform for host kyoto ...
[kyoto][DEBUG ] connection detected need for sudo
[kyoto][DEBUG ] connected to host: kyoto
[kyoto][DEBUG ] detect platform information from remote host
[kyoto][DEBUG ] detect machine type
[ceph_deploy.install][INFO ] Distro info: Fedora 20 Heisenbug
[kyoto][INFO ] installing ceph on kyoto
[kyoto][INFO ] Running command: sudo yum -y install yum-plugin-priorities
[kyoto][DEBUG ] Loaded plugins: langpacks, priorities, refresh-packagekit
[kyoto][DEBUG ] Package yum-plugin-priorities-1.1.31-27.fc20.noarch already installed and latest version
[kyoto][DEBUG ] Nothing to do
[kyoto][INFO ] Running command: sudo rpm --import https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
[kyoto][INFO ] Running command: sudo rpm -Uvh --replacepkgs --force --quiet http://ceph.com/rpm-firefly/fc20/noarch/ceph-release-1-0.fc20.noarch.rpm
[kyoto][DEBUG ] Updating / installing...
[kyoto][WARNIN] ensuring that /etc/yum.repos.d/ceph.repo contains a high priority
[kyoto][WARNIN] altered ceph.repo priorities to contain: priority=1
[kyoto][INFO ] Running command: sudo yum -y -q install ceph
[kyoto][WARNIN] Error: Package: 1:python-cephfs-0.80.7-1.fc20.x86_64 (updates)
[kyoto][WARNIN]   Requires: libcephfs1 = 1:0.80.7-1.fc20
[kyoto][WARNIN]   Available: libcephfs1-0.80.1-0.fc20.x86_64 (Ceph)
[kyoto][DEBUG ] You could try using --skip-broken to work around the problem
[kyoto][WARNIN]     libcephfs1 = 0.80.1-0.fc20
[kyoto][WARNIN]   Available: libcephfs1-0.80.3-0.fc20.x86_64 (Ceph)
[kyoto][WARNIN]     libcephfs1 = 0.80.3-0.fc20
[kyoto][WARNIN]   Available: libcephfs1-0.80.4-0.fc20.x86_64 (Ceph)
[kyoto][WARNIN]     libcephfs1 = 0.80.4-0.fc20
[kyoto][WARNIN]   Available: libcephfs1-0.80.5-0.fc20.x86_64 (Ceph)
[kyoto][WARNIN]     libcephfs1 = 0.80.5-0.fc20
[kyoto][WARNIN]   Available: libcephfs1-0.80.6-0.fc20.x86_64 (Ceph)
[kyoto][WARNIN]     libcephfs1 = 0.80.6-0.fc20
[kyoto][WARNIN]   Installing: libcephfs1-0.80.7-0.fc20.x86_64 (Ceph)
[kyoto][WARNIN]     libcephfs1 = 0.80.7-0.fc20
[kyoto][WARNIN] Error: Package: 1:python-rbd-0.80.7-1.fc20.x86_64 (updates)
[kyoto][WARNIN]   Requires: librbd1 = 1:0.80.7-1.fc20
[kyoto][WARNIN]   Available: librbd1-0.80.1-0.fc20.x86_64 (Ceph)
[kyoto][WARNIN]     librbd1 = 0.80.1-0.fc20
[kyoto][WARNIN]   Available: librbd1-0.80.3-0.fc20.x86_64 (Ceph)
[kyoto][WARNIN]     librbd1 = 0.80.3-0.fc20
[kyoto][WARNIN]   Available: librbd1-0.80.4-0.fc20.x86_64 (Ceph)
[kyoto][WARNIN]     librbd1 = 0.80.4-0.fc20
[kyoto][WARNIN]   Available: librbd1-0.80.5-0.fc20.x86_64 (Ceph)
[kyoto][WARNIN]     librbd1 = 0.80.5-0.fc20
[kyoto][WARNIN]   Available: librbd1-0.80.6-0.fc20.x86_64 (Ceph)
[kyoto][WARNIN]     librbd1 = 0.80.6-0.fc20
[kyoto][WARNIN]   Installing: librbd1-0.80.7-0.fc20.x86_64 (Ceph)
[kyoto][WARNIN]     librbd1 = 0.80.7-0.fc20
[kyoto][WARNIN] Error: Package: 1:python-rados-0.80.7-1.fc20.x86_64 (updates)
[kyoto][WARNIN]   Requires: librados2 = 1:0.80.7-1.fc20
[kyoto][WARNIN]   Available: librados2-0.80.1-0.fc20.x86_64 (Ceph)
[kyoto][WARNIN]     librados2 = 0.80.1-0.fc20
[kyoto][WARNIN]   Available: librados2-0.80.3-0.fc20.x86_64 (Ceph)
[kyoto][WARNIN]     librados2 = 0.80.3-0.fc20
[kyoto][WARNIN]   Available: librados2-0.80.4-0.fc20.x86_64 (Ceph)
[kyoto][WARNIN]     librados2 = 0.80.4-0.fc20
[kyoto][WARNIN]   Available: librados2-0.80.5-0.fc20.x86_64 (Ceph)
[kyoto][WARNIN]     librados2 = 0.80.5-0.fc20
[kyoto][WARNIN]   Available: librados2-0.80.6-0.fc20.x86_64 (Ceph)
[kyoto][WARNIN]     librados2 = 0.80.6-0.fc20
[kyoto][WARNIN]   Installing: librados2-0.80.7-0.fc20.x86_64 (Ceph)
[kyoto][WARNIN]
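For reference, the workaround in the tracker issue boils down to making sure the ceph.com repo wins the priority comparison against Fedora's own repos. A hypothetical /etc/yum.repos.d/ceph.repo illustrating the idea (the exact stanza ceph-deploy writes may differ):

```ini
# /etc/yum.repos.d/ceph.repo -- illustrative sketch only.
# priority=1 makes this repo beat the Fedora "updates" repo (default
# priority 99) when yum-plugin-priorities is enabled, so yum stops
# mixing downstream 1:0.80.7-1.fc20 packages with upstream 0.80.7-0.
[Ceph]
name=Ceph packages for $basearch
baseurl=http://ceph.com/rpm-firefly/fc20/$basearch
enabled=1
gpgcheck=1
gpgkey=https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
priority=1
```

Note the epoch mismatch in the errors above: the Fedora packages carry epoch 1 (1:0.80.7-1.fc20) while the ceph.com packages have no epoch, so priorities alone may not be enough without the RPM-obsoletes fix tracked in the issue.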
Re: [ceph-users] Ceph on Centos 7
Hello, Can you give me a link to the exact instructions you followed? For CentOS 7 (EL7), ceph-extras should not be necessary. The instructions at [1] do not have you enable the ceph-extras repo. You will find that there are EL7 packages at [2]. I recently found a README that was incorrectly referencing ceph-extras when it came to ceph-deploy. I'm wondering if there may be other incorrect instructions floating around. I'm guessing the confusion may be coming from [3]. I think a note should be added there that ceph-extras is not needed for EL7. Right now it just says it is needed for some Ceph deployments, but as you have found, if you enable it on EL7, it won't work. Can you try removing the ceph-extras repo definition and see if that fixes things? - Travis [1] http://ceph.com/docs/master/start/quick-start-preflight/#red-hat-package-manager-rpm [2] http://ceph.com/rpm-giant/ [3] http://ceph.com/docs/master/install/get-packages/#add-ceph-extras On Tue, Jan 6, 2015 at 2:40 AM, Nur Aqilah aqi...@impact-multimedia.com wrote: Hi all, I was wondering if anyone can give me some guidelines for installing Ceph on CentOS 7. I followed the guidelines on ceph.com for the Quick Installation, but there was always this one particular error. When I typed in these commands:
sudo yum update
sudo yum install ceph-deploy
a long error popped up. I later checked and found out that el7/CentOS 7 is not listed here: http://ceph.com/packages/ceph-extras/rpm/ Attached is a screenshot of the error I was talking about.
I would really appreciate it if someone would kindly help me out. Thank you and regards, Nur Aqilah Abdul Rahman, Systems Engineer, impact business solutions Sdn Bhd, E303, Level 3 East Wing Metropolitan Square, Jalan PJU 8/1, Damansara Perdana, 47820 Petaling Jaya, Selangor Darul Ehsan P: 03 7728 6826 F: 03 7728 5826 ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
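The suggested fix can be sketched as a short command sequence (shown as a transcript, since it needs root and a real EL7 host; the repo file name is an assumption -- check what is actually in /etc/yum.repos.d/):

```
$ ls /etc/yum.repos.d/ | grep -i ceph          # find the repo definitions
$ sudo rm -f /etc/yum.repos.d/ceph-extras.repo # drop the unneeded extras repo
$ sudo yum clean all                           # flush cached metadata
$ sudo yum install ceph-deploy                 # retry the install
```

The key point is only the ceph-extras definition is removed; the main Ceph repo (which does have EL7 packages) stays in place.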
Re: [ceph-users] ceph-deploy Errors - Fedora 21
Hello, I believe this is a problem specific to Fedora packaging. The Fedora package for ceph-deploy is a bit different from the ones hosted at ceph.com. Can you please tell me the output of rpm -q python-remoto? I believe the problem is that the python-remoto package is too old, and there is not a correct versioned dependency on it. The minimum version should be 0.0.22, but the latest in Fedora is 0.0.21 (and the latest upstream is 0.0.23). I'll push to get this updated correctly. The Fedora package maintainers will need to put out a new release of python-remoto, and hopefully update the spec file for ceph-deploy to require >= 0.0.22. - Travis On Mon, Dec 29, 2014 at 10:24 PM, deeepdish deeepd...@gmail.com wrote: Hello. I'm having an issue with ceph-deploy on Fedora 21. - Installed ceph-deploy via 'yum install ceph-deploy' - created non-root user - assigned sudo privs as per documentation - http://ceph.com/docs/master/rados/deployment/preflight-checklist/
$ ceph-deploy install smg01.erbus.kupsta.net
[ceph_deploy.conf][DEBUG ] found configuration file at: /cephfs/.cephdeploy.conf
[ceph_deploy.cli][INFO ] Invoked (1.5.20): /bin/ceph-deploy install [hostname]
[ceph_deploy.install][DEBUG ] Installing stable version firefly on cluster ceph hosts [hostname]
[ceph_deploy.install][DEBUG ] Detecting platform for host [hostname] ...
[ceph_deploy][ERROR ] RuntimeError: connecting to host: [hostname] resulted in errors: TypeError __init__() got an unexpected keyword argument 'detect_sudo'
Thank you. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
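A quick way to check whether the installed python-remoto meets the 0.0.22 minimum is to compare version strings with sort -V. A small sketch (the rpm query is left in a comment because it assumes an RPM-based host; the hard-coded value below is the version Fedora 21 shipped at the time):

```shell
# On a real host you would query the package manager, e.g.:
#   installed=$(rpm -q --qf '%{VERSION}' python-remoto)
installed="0.0.21"   # value reported on Fedora 21 at the time
required="0.0.22"

# sort -V orders version strings numerically; if the smallest of the
# two is the required version, the installed one is new enough.
if [ "$(printf '%s\n' "$required" "$installed" | sort -V | head -n1)" = "$required" ]; then
  echo "python-remoto $installed satisfies >= $required"
else
  echo "python-remoto $installed is too old; need >= $required"
fi
# -> prints: python-remoto 0.0.21 is too old; need >= 0.0.22
```

The same one-liner works for checking any minimum-version requirement without relying on distro-specific comparison tools.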
Re: [ceph-users] Ceph-deploy install and pinning on Ubuntu 14.04
Hi Giuseppe, ceph-deploy does try to do some pinning for the Ceph packages. Those settings should be found at /etc/apt/preferences.d/ceph.pref If you find something is incorrect there, please let us know what it is and we can look into it! - Travis On Sat, Dec 20, 2014 at 11:32 AM, Giuseppe Civitella giuseppe.civite...@gmail.com wrote: Hi all, I'm using ceph-deploy on Ubuntu 14.04. When I do a ceph-deploy install I see packages getting installed from the Ubuntu repositories instead of Ceph's, am I missing something? Do I need to do some pinning on the repositories? Thanks ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
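For context, an apt pin file like the one ceph-deploy writes looks roughly like this. This is a hypothetical sketch of /etc/apt/preferences.d/ceph.pref; the exact stanza ceph-deploy generates may differ:

```
Package: *
Pin: origin ceph.com
Pin-Priority: 1001
```

A pin priority above 1000 lets the pinned origin win even when the Ubuntu archive carries a newer version; `apt-cache policy ceph` shows which candidate version and pin are actually in effect.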
Re: [ceph-users] Ceph Block device and Trim/Discard
One question re: discard support for kRBD -- does it matter which format the RBD is? Are Format 1 and Format 2 both okay, or is it just Format 2? - Travis On Mon, Dec 15, 2014 at 8:58 AM, Max Power mailli...@ferienwohnung-altenbeken.de wrote: On 12 December 2014 at 18:00, Ilya Dryomov ilya.dryo...@inktank.com wrote: Just a note, discard support went into 3.18, which was released a few days ago. I recently compiled 3.18 on Debian 7 and, what can I say... it works perfectly well. The used space goes up and down again. So I think this will be my choice. Thank you! ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
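As a concrete illustration of using discard with kRBD (assuming a 3.18+ kernel; the device name and mountpoint below are hypothetical), there are two common approaches: mount with the discard option for online discard, or trim in batches:

```
# /etc/fstab -- online discard on a mapped RBD (hypothetical device/mountpoint,
# assumes /dev/rbd0 is already formatted ext4):
/dev/rbd0  /mnt/rbd  ext4  defaults,discard  0 0

# Alternatively, leave 'discard' off and trim periodically (e.g. from cron):
#   fstrim /mnt/rbd
```

Batched fstrim is often preferred over the discard mount option, since per-delete discards can add latency on busy filesystems.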
[ceph-users] [ANN] ceph-deploy 1.5.21 released
Hi All, This is a new release of ceph-deploy that defaults to installing the Giant release of Ceph. Additionally, there are a couple of bug fixes that make sure calls to 'gatherkeys' return non-zero upon failure, and that the EPEL repo is properly enabled as a prerequisite to installation on CentOS and Scientific Linux distros. The full changelog can be seen here: http://ceph.com/ceph-deploy/docs/changelog.html#id1 Please update! - Travis ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Dependency issues in fresh ceph/CentOS 7 install
Hi Massimiliano, On Tue, Nov 25, 2014 at 6:02 AM, Massimiliano Cuttini m...@phoenixweb.it wrote: Hi Travis, can I have a developer or tester account in order to submit issues myself? Registration for the Ceph tracker is open -- anyone can sign up for an account to report issues. If you visit http://tracker.ceph.com, in the top right-hand corner is a link to Register. Hope that helps! - Travis Thanks, Massimiliano Cuttini On 18/11/2014 at 23:03, Travis Rhoden wrote: I've captured this at http://tracker.ceph.com/issues/10133 On Tue, Nov 18, 2014 at 4:48 PM, Travis Rhoden trho...@gmail.com wrote: Hi Massimiliano, I just recreated this bug myself. ceph-deploy is supposed to install EPEL automatically on the platforms that need it. I just confirmed that it is not doing so, and will be opening a bug in the Ceph tracker. I'll paste it here when I do so you can follow it. Thanks for the report! - Travis On Tue, Nov 18, 2014 at 4:41 PM, Massimiliano Cuttini m...@phoenixweb.it wrote: I solved it by installing the EPEL repo with yum. I think that somebody should write down in the documentation that EPEL is mandatory. On 18/11/2014 at 14:29, Massimiliano Cuttini wrote: Dear all, I try to install Ceph but I get errors:
#ceph-deploy install node1
[...]
[ceph_deploy.install][DEBUG ] Installing stable version firefly on cluster ceph hosts node1
[ceph_deploy.install][DEBUG ] Detecting platform for host node1 ...
[...]
[node1][DEBUG ] ---> Package libXxf86vm.x86_64 0:1.1.3-2.1.el7 set to be installed
[node1][DEBUG ] ---> Package mesa-libgbm.x86_64 0:9.2.5-6.20131218.el7_0 set to be installed
[node1][DEBUG ] ---> Package mesa-libglapi.x86_64 0:9.2.5-6.20131218.el7_0 set to be installed
[node1][DEBUG ] --> Finished dependency resolution
[node1][WARNIN] Error: Package: ceph-common-0.80.7-0.el7.centos.x86_64 (Ceph)
[node1][WARNIN]   Requires: libtcmalloc.so.4()(64bit)
[node1][WARNIN] Error: Package: ceph-0.80.7-0.el7.centos.x86_64 (Ceph)
[node1][DEBUG ] You could try using --skip-broken to work around the problem
[node1][WARNIN]   Requires: libleveldb.so.1()(64bit)
[node1][WARNIN] Error: Package: ceph-0.80.7-0.el7.centos.x86_64 (Ceph)
[node1][WARNIN]   Requires: libtcmalloc.so.4()(64bit)
[node1][DEBUG ] You could try running: rpm -Va --nofiles --nodigest
[node1][ERROR ] RuntimeError: command returned non-zero exit status: 1
[ceph_deploy][ERROR ] RuntimeError: Failed to execute command: yum -y install ceph
I installed the GIANT version, not FIREFLY, on the admin node. Is it a typo in the config file, or is it truly trying to install FIREFLY instead of GIANT? About the error, I see that it's related to missing libraries. It seems that Ceph requires libraries not available in the current distro:
[node1][WARNIN] Requires: libtcmalloc.so.4()(64bit)
[node1][WARNIN] Requires: libleveldb.so.1()(64bit)
[node1][WARNIN] Requires: libtcmalloc.so.4()(64bit)
This seems strange. Can you fix this? Thanks, Massimiliano Cuttini ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Dependency issues in fresh ceph/CentOS 7 install
Hi Massimiliano, On Tue, Nov 18, 2014 at 5:23 PM, Massimiliano Cuttini m...@phoenixweb.it wrote: Then ...very good! :) OK, the next bad thing is that I have installed GIANT on the admin node. However, ceph-deploy ignored the admin node installation and installed FIREFLY. Now I have ceph-deploy of Giant on my admin node and my first OSD node with FIREFLY. How did you do the install on the admin node? Was it using ceph-deploy, or installing manually? ceph-deploy does indeed still default to Firefly, but that will change in the next version to Giant. The ceph-deploy admin command should only push keys and config files, so it doesn't do an actual install of packages. A call to ceph-deploy install would have installed Firefly, unless given the --release giant option. It seems odd to me. Is it fine, or should I prepare myself to format again? That will depend on your goals. A mixed-version cluster is viable, but if you want Giant everywhere, you'll need to upgrade the packages on the node running your OSDs and restart the OSDs themselves. An actual disk re-format is not necessary. - Travis On 18/11/2014 at 23:03, Travis Rhoden wrote: I've captured this at http://tracker.ceph.com/issues/10133 ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
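The mixed-version upgrade path described above can be sketched as a transcript (hedged: the restart command varies by distro and init system, and monitors should always be upgraded before OSDs):

```
# On the EL7 OSD node still running Firefly:
$ sudo yum update ceph                 # pulls the Giant packages from the Ceph repo
$ sudo /etc/init.d/ceph restart osd    # sysvinit; 'service ceph restart osd' also works,
                                       # Ubuntu uses 'restart ceph-osd-all' under upstart
$ ceph -w                              # watch the cluster return to HEALTH_OK
$ ceph tell osd.* version              # confirm every OSD reports the new version
```

No data migration or disk re-format happens here; the OSDs replay their journals and rejoin the cluster after the restart.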
Re: [ceph-users] v0.80.4 Firefly released
Hi Andrija, I'm running a cluster with both CentOS and Ubuntu machines in it. I just did some upgrades to 0.80.4, and I can confirm that doing yum update ceph on the CentOS machines did result in having all OSDs on those machines restarted automatically. I actually did not know that would happen, as the CentOS machines were new additions (this was the first update since deploying them with 0.80.1), and I'm used to the Ubuntu behavior where I can update the package first, then restart things at will. So yeah, that still happens with RPM. :/ On Wed, Jul 16, 2014 at 3:55 AM, Andrija Panic andrija.pa...@gmail.com wrote: Hi Sage, can anyone confirm whether there is still a bug in the RPMs that does an automatic Ceph service restart after updating packages? We are instructed to first update/restart MONs, and after that OSDs - but that is impossible if we have MON+OSDs on the same host, since Ceph is automatically restarted by YUM/RPM, but NOT automatically restarted on Ubuntu/Debian (as reported by some other list member...) Thanks On 16 July 2014 01:45, Sage Weil s...@inktank.com wrote: This Firefly point release fixes a potential data corruption problem when ceph-osd daemons run on top of XFS and service Firefly librbd clients. A recently added allocation hint that RBD utilizes triggers an XFS bug on some kernels (Linux 3.2, and likely others) that leads to data corruption and deep-scrub errors (and inconsistent PGs). This release avoids the situation by disabling the allocation hint until we can validate which kernels are affected and/or are known to be safe to use the hint on. We recommend that all v0.80.x Firefly users urgently upgrade, especially if they are using RBD.
Notable Changes:
* osd: disable XFS extsize hint by default (#8830, Samuel Just)
* rgw: fix extra data pool default name (Yehuda Sadeh)
For more detailed information, see: http://ceph.com/docs/master/_downloads/v0.80.4.txt
Getting Ceph:
* Git at git://github.com/ceph/ceph.git
* Tarball at http://ceph.com/download/ceph-0.80.4.tar.gz
* For packages, see http://ceph.com/docs/master/install/get-packages
* For ceph-deploy, see http://ceph.com/docs/master/install/install-ceph-deploy
-- Andrija Panić -- http://admintweets.com -- ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] scrub error on firefly
I can also say that after a recent upgrade to Firefly, I have experienced a massive uptick in scrub errors. The cluster was on Cuttlefish for about a year, and had maybe one or two scrub errors. After upgrading to Firefly, we've probably seen 3 to 4 dozen in the last month or so (we were getting 2-3 a day for a few weeks until the whole cluster was rescrubbed, it seemed). What I cannot determine, however, is how to know which object is busted. For example, just today I ran into a scrub error. The object has two copies; it is an 8MB piece of an RBD, and has identical timestamps and identical xattr names and values across both copies. But it definitely has a different MD5 sum. How do I know which one is correct? I've just been kicking off pg repair each time, which seems to just use the primary copy to overwrite the others. I haven't run into any issues with that so far, but it does make me nervous. - Travis On Tue, Jul 8, 2014 at 1:06 AM, Gregory Farnum g...@inktank.com wrote: It's not very intuitive or easy to look at right now (there are plans from the recent developer summit to improve things), but the central log should have output about exactly what objects are busted. You'll then want to compare the copies manually to determine which ones are good or bad, get the good copy on the primary (make sure you preserve xattrs), and run repair. -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com On Mon, Jul 7, 2014 at 6:48 PM, Randy Smith rbsm...@adams.edu wrote: Greetings, I upgraded to Firefly last week and I suddenly received this error: health HEALTH_ERR 1 pgs inconsistent; 1 scrub errors. ceph health detail shows the following:
HEALTH_ERR 1 pgs inconsistent; 1 scrub errors
pg 3.c6 is active+clean+inconsistent, acting [2,5]
1 scrub errors
The docs say that I can run `ceph pg repair 3.c6` to fix this. What I want to know is: what are the risks of data loss if I run that command in this state, and how can I mitigate them? -- Randall Smith Computing Services Adams State University http://www.adams.edu/ 719-587-7741 ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] scrub error on firefly
And actually, just to follow up: it does seem like there are some additional smarts beyond just using the primary to overwrite the secondaries. Since I captured MD5 sums before and after the repair, I can say that in this particular instance the secondary copy was used to overwrite the primary. So I'm just trusting Ceph to do the right thing, and so far it seems to, but the comments here about needing to determine the correct object and place it on the primary PG make me wonder if I've been missing something. - Travis On Thu, Jul 10, 2014 at 10:19 AM, Travis Rhoden trho...@gmail.com wrote: I can also say that after a recent upgrade to Firefly, I have experienced a massive uptick in scrub errors. The cluster was on Cuttlefish for about a year, and had maybe one or two scrub errors. After upgrading to Firefly, we've probably seen 3 to 4 dozen in the last month or so (we were getting 2-3 a day for a few weeks until the whole cluster was rescrubbed, it seemed). What I cannot determine, however, is how to know which object is busted. For example, just today I ran into a scrub error. The object has two copies; it is an 8MB piece of an RBD, and has identical timestamps and identical xattr names and values across both copies. But it definitely has a different MD5 sum. How do I know which one is correct? I've just been kicking off pg repair each time, which seems to just use the primary copy to overwrite the others. I haven't run into any issues with that so far, but it does make me nervous. - Travis On Tue, Jul 8, 2014 at 1:06 AM, Gregory Farnum g...@inktank.com wrote: It's not very intuitive or easy to look at right now (there are plans from the recent developer summit to improve things), but the central log should have output about exactly what objects are busted. You'll then want to compare the copies manually to determine which ones are good or bad, get the good copy on the primary (make sure you preserve xattrs), and run repair.
-Greg Software Engineer #42 @ http://inktank.com | http://ceph.com On Mon, Jul 7, 2014 at 6:48 PM, Randy Smith rbsm...@adams.edu wrote: Greetings, I upgraded to Firefly last week and I suddenly received this error: health HEALTH_ERR 1 pgs inconsistent; 1 scrub errors. ceph health detail shows the following:
HEALTH_ERR 1 pgs inconsistent; 1 scrub errors
pg 3.c6 is active+clean+inconsistent, acting [2,5]
1 scrub errors
The docs say that I can run `ceph pg repair 3.c6` to fix this. What I want to know is: what are the risks of data loss if I run that command in this state, and how can I mitigate them? -- Randall Smith Computing Services Adams State University http://www.adams.edu/ 719-587-7741 ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
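The manual comparison Greg describes can be sketched as a transcript. Paths, the PG id, and the object name below are hypothetical, and the on-disk layout assumes FileStore-era OSDs where each PG has a directory under the OSD's data path:

```
$ ceph health detail                   # e.g. "pg 3.c6 is active+clean+inconsistent, acting [2,5]"
$ grep 3.c6 /var/log/ceph/ceph.log     # the central log names the inconsistent object

# On each OSD host in the acting set, locate and checksum the object's file:
$ find /var/lib/ceph/osd/ceph-2/current/3.c6_head -name '*<object-name>*' \
      -exec md5sum {} \;
$ getfattr -d -m '.*' <path-to-object> # compare xattrs too, not just data

# Once satisfied that the primary (first OSD in the acting set) holds
# the good copy -- copying the good file into place if it does not:
$ ceph pg repair 3.c6
```

The follow-up observation in this thread (the secondary overwriting the primary) suggests repair is not purely "primary wins", so checking both copies before repairing remains the safe habit.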
Re: [ceph-users] 'osd pool set-quota' behaviour with CephFS
Hi George, I actually asked Sage about a similar scenario at the OpenStack summit in Atlanta this year -- namely, whether I could use the new pool quota functionality to enforce quotas on CephFS. The answer was no: the pool quota functionality is mostly intended for radosgw, and the existing CephFS clients have no support for it. He said the quota should work, actually, but that you were likely to see some very strange behavior in CephFS. That sounds like what you've seen. It won't be a graceful failure at all. Quotas in CephFS are a different task, and one that I'm following as well. See here: https://github.com/ceph/ceph/pull/1122 The pull request is old, but Sage did mention he was in contact with the team working on the code and was hopeful to see it finished. - Travis On Tue, Jun 24, 2014 at 7:06 AM, george.ry...@stfc.ac.uk wrote: Last week I decided to take a look at the ‘osd pool set-quota’ option. I have a directory in CephFS that uses a pool called pool-2 (configured by following this: http://www.sebastien-han.fr/blog/2013/02/11/mount-a-specific-pool-with-cephfs/). I have a directory in it filled with cat pictures. I ran ‘rados df’. I then copied a couple more cat pictures into my directory using ‘cp file destination && sync’. I then ran ‘rados df’ again; this showed an increase in the object count for the pool equal to the number of additional cat pictures, and an increase in the pool size equal to the size of the cat pictures, as expected. I then used the command ‘ceph osd pool set-quota {pool-name} [max_objects {obj-count}] [max_bytes {bytes}]’, as per http://ceph.com/docs/master/rados/operations/pools/, and set an object limit a couple of objects bigger than the pool's current object count. I then ran a loop copying more cat pictures one at a time (again with ‘&& sync’) each time. Whilst doing this I ran ‘rados df’; the number of objects in the pool increased up to the limit and stopped.
However, on the machine copying the cat pictures, the copying appeared to work fine, and running ls showed more pictures than the ‘rados df’ command would suggest should be there. If I accessed the same directory from a different machine, then I saw only the pictures that were copied up to the limit. If I then removed the limit, the images would appear in the directory and ‘rados df’ would report a larger number of objects. Similar behaviour was observed when setting a size limit. What’s going on? Is this expected behaviour? George Ryall Scientific Computing | STFC Rutherford Appleton Laboratory | Harwell Oxford | Didcot | OX11 0QX (01235 44) 5021
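For reference, George's quota commands look like the following; the pool name and values are just the ones from his experiment, and a quota is cleared by setting it back to 0:

```shell
# Cap a pool by object count and/or total bytes.
ceph osd pool set-quota pool-2 max_objects 100
ceph osd pool set-quota pool-2 max_bytes 10737418240

# Observe the pool's current object count and size.
rados df

# Remove a quota again by setting it to 0.
ceph osd pool set-quota pool-2 max_objects 0
ceph osd pool set-quota pool-2 max_bytes 0
```

The behaviour George saw is consistent with the quota being enforced at the RADOS layer while the CephFS client keeps dirty data buffered: writes queue in the client cache and only become visible elsewhere once the quota is lifted and the flush succeeds.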
[ceph-users] Multiple L2 LAN segments with Ceph
Hi folks, Does anybody know if there are any issues running Ceph with multiple L2 LAN segments? I'm picturing a large multi-rack/multi-row deployment where you may give each rack (or row) its own L2 segment, then connect them all with L3/ECMP in a leaf-spine architecture. I'm wondering how cluster_network (or public_network) in ceph.conf works in this case. Does that directive just tell a daemon starting on a particular node which network to bind to? Or is it a CIDR that has to be accurate for every OSD and MON in the entire cluster? Thanks, - Travis
Re: [ceph-users] Multiple L2 LAN segments with Ceph
Thanks to you all! You confirmed everything I thought I knew, but it is nice to be sure! On Wed, May 28, 2014 at 1:37 PM, Mike Dawson mike.daw...@cloudapt.com wrote: Travis, We run a routed ECMP spine-leaf network architecture with Ceph and have no issues on the network side whatsoever. Each leaf switch has an L2 cidr block inside a common L3 supernet. We do not currently split cluster_network and public_network. If we did, we'd likely build a separate spine-leaf network with its own L3 supernet. A simple IPv4 example: - ceph-cluster: 10.1.0.0/16 - cluster-leaf1: 10.1.1.0/24 - node1: 10.1.1.1/24 - node2: 10.1.1.2/24 - cluster-leaf2: 10.1.2.0/24 - ceph-public: 10.2.0.0/16 - public-leaf1: 10.2.1.0/24 - node1: 10.2.1.1/24 - node2: 10.2.1.2/24 - public-leaf2: 10.2.2.0/24 ceph.conf would be: cluster_network: 10.1.0.0/255.255.0.0 public_network: 10.2.0.0/255.255.0.0 - Mike Dawson On 5/28/2014 1:01 PM, Travis Rhoden wrote: Hi folks, Does anybody know if there are any issues running Ceph with multiple L2 LAN segments? I'm picturing a large multi-rack/multi-row deployment where you may give each rack (or row) its own L2 segment, then connect them all with L3/ECMP in a leaf-spine architecture. I'm wondering how cluster_network (or public_network) in ceph.conf works in this case. Does that directive just tell a daemon starting on a particular node which network to bind to? Or is it a CIDR that has to be accurate for every OSD and MON in the entire cluster? Thanks, - Travis
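As an actual ceph.conf fragment, Mike's example would be written in ini syntax (`key = value` rather than the `key: value` shorthand above); CIDR notation is accepted for the netmask:

```ini
; Hypothetical /etc/ceph/ceph.conf fragment using the supernets above.
; Each daemon binds to whichever local interface falls inside the range.
[global]
public network = 10.2.0.0/16
cluster network = 10.1.0.0/16
```

Because the directive only needs to cover the address the local daemon should bind to, a supernet spanning all per-leaf /24s works fine in a routed leaf-spine design.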
Re: [ceph-users] ceph cinder compute-nodes
You can define the UUID in the secret.xml file. That way you can generate one yourself, or let it autogenerate the first one for you and then use the same one on all the other compute nodes. In the Ceph docs, it actually generates one using uuidgen, then puts that UUID in the secret.xml file itself. See the very last part here: http://ceph.com/docs/master/rbd/rbd-openstack/#setup-ceph-client-authentication Hope that's clear. It's definitely possible (and pretty much required) to have all the nodes share the same secret UUID -- as you saw, you only put one UUID into cinder. - Travis On Sat, May 24, 2014 at 9:39 AM, 10 minus t10te...@gmail.com wrote: Hi, I went through the docs for setting up cinder with ceph. From the docs, I have to perform this on every compute node: virsh secret-define --file secret.xml The issue I see is that I have to perform this on 5 compute nodes, and cinder expects to have only one rbd_secret_uuid=<uuid>, as the former command will generate 5 uuids. How can I pass 5 uuids to cinder? Cheers
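The flow Travis describes from the Ceph docs looks roughly like this; the secret name `client.cinder secret` and key file path are examples, and the same secret.xml (with the same embedded UUID) is reused on all five compute nodes so virsh never invents a new UUID:

```shell
# Generate ONE uuid and reuse it everywhere.
uuidgen > secret.uuid

# Embed it in secret.xml so virsh secret-define adopts it.
cat > secret.xml <<EOF
<secret ephemeral='no' private='no'>
  <uuid>$(cat secret.uuid)</uuid>
  <usage type='ceph'>
    <name>client.cinder secret</name>
  </usage>
</secret>
EOF

# On each compute node:
virsh secret-define --file secret.xml
virsh secret-set-value --secret $(cat secret.uuid) \
    --base64 $(cat client.cinder.key)
```

cinder.conf then carries the single shared value as `rbd_secret_uuid = <that same UUID>`.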
Re: [ceph-users] Red Hat to acquire Inktank
Sage, Congrats to you and Inktank! - Travis On Wed, Apr 30, 2014 at 9:27 AM, Haomai Wang haomaiw...@gmail.com wrote: Congratulation! On Wed, Apr 30, 2014 at 8:18 PM, Sage Weil s...@inktank.com wrote: Today we are announcing some very big news: Red Hat is acquiring Inktank. We are very excited about what this means for Ceph, the community, the team, our partners, and our customers. Ceph has come a long way in the ten years since the first line of code has been written, particularly over the last two years that Inktank has been focused on its development. The fifty members of the Inktank team, our partners, and the hundreds of other contributors have done amazing work in bringing us to where we are today. We believe that, as part of Red Hat, the Inktank team will be able to build a better quality Ceph storage platform that will benefit the entire ecosystem. Red Hat brings a broad base of expertise in building and delivering hardened software stacks as well as a wealth of resources that will help Ceph become the transformative and ubiquitous storage platform that we always believed it could be. For existing Inktank customers, this is going to mean turning a reliable and robust storage system into something that delivers even more value. In particular, joining forces with the Red Hat team will improve our ability to address problems at all layers of the storage stack, including in the kernel. We naturally recognize that many customers and users have built platforms based on other Linux distributions. We will continue to support these installations while we determine how to provide the best customer experience moving forward and how the next iteration of the enterprise Ceph product will be structured. In the meantime, our team remains committed to keeping Ceph an open, multiplatform project that works in any environment where it makes sense, including other Linux distributions and non-Linux operating systems. 
Red Hat is one of only a handful of companies that I trust to steward the Ceph project. When we started Inktank two years ago, our goal was to build the business by making Ceph successful as a broad-based, collaborative open source project with a vibrant user, developer, and commercial community. Red Hat shares this vision. They are passionate about open source, and have demonstrated that they are strong and fair stewards with other critical projects (like KVM). Red Hat intends to administer the Ceph trademark in a manner that protects the ecosystem as a whole and creates a level playing field where everyone is held to the same standards of use. Similarly, policies like upstream first ensure that bug fixes and improvements that go into Ceph-derived products are always shared with the community to streamline development and benefit all members of the ecosystem. One important change that will take place involves Inktank's product strategy, in which some add-on software we have developed is proprietary. In contrast, Red Hat favors a pure open source model. That means that Calamari, the monitoring and diagnostics tool that Inktank has developed as part of the Inktank Ceph Enterprise product, will soon be open sourced. This is a big step forward for the Ceph community. Very little will change on day one as it will take some time to integrate the Inktank business and for any significant changes to happen with our engineering activities. However, we are very excited about what is coming next for Ceph and are looking forward to this new chapter. I'd like to thank everyone who has helped Ceph get to where we are today: the amazing research group at UCSC where it began, DreamHost for supporting us for so many years, the incredible Inktank team, and the many contributors and users that have helped shape the system. 
We continue to believe that robust, scalable, and completely open storage platforms like Ceph will transform a storage industry that is still dominated by proprietary systems. Let's make it happen! sage -- Best Regards, Wheat
Re: [ceph-users] packages for Trusty
Thanks guys. I don't know why I didn't try that. I guess just too much habit of setting up the additional repo. =) On Fri, Apr 25, 2014 at 4:09 PM, Cédric Lemarchand c.lemarch...@yipikai.org wrote: Yes, just apt-get install ceph ;-) Cheers -- Cédric Lemarchand On 25 Apr 2014, at 21:07, Drew Weaver drew.wea...@thenap.com wrote: You can actually just install it using the Ubuntu packages. I did it yesterday on Trusty. Thanks, -Drew *From:* ceph-users-boun...@lists.ceph.com [mailto:ceph-users-boun...@lists.ceph.com] *On Behalf Of *Travis Rhoden *Sent:* Friday, April 25, 2014 3:06 PM *To:* ceph-users *Subject:* [ceph-users] packages for Trusty Are there packages for Trusty being built yet? I don't see it listed at http://ceph.com/debian-emperor/dists/ Thanks, - Travis
Re: [ceph-users] cephx key for CephFS access only
Thanks for the response Greg. Unfortunately, I appear to be missing something. If I use my cephfs key with these perms: client.cephfs key: redacted caps: [mds] allow rwx caps: [mon] allow r caps: [osd] allow rwx pool=data This is what happens when I mount: # ceph-fuse -k /etc/ceph/ceph.client.cephfs.keyring -m ceph0-10g /data ceph-fuse[13533]: starting ceph client ceph-fuse[13533]: ceph mount failed with (1) Operation not permitted ceph-fuse[13531]: mount failed: (1) Operation not permitted But using the admin key works just fine: # ceph-fuse -k /etc/ceph/ceph.client.admin.keyring -m ceph0-10g /data ceph-fuse[13548]: starting ceph client ceph-fuse[13548]: starting fuse The admin key has the following perms: client.admin key: redacted caps: [mds] allow caps: [mon] allow * caps: [osd] allow * Since the mds permissions are functionally equivalent, either I need extra rights on the monitor, or the OSDs. Does a client need to access the metadata pool in order to do a CephFS mount? I'll experiment a bit and report back. On Mon, Mar 31, 2014 at 1:36 PM, Gregory Farnum g...@inktank.com wrote: At present, the only security permission on the MDS is 'allowed to do stuff', so rwx and * are synonymous. In general * means 'is an admin', though, so you'll be happier in the future if you use rwx. You may also want a more restrictive set of monitor capabilities as somebody else recently pointed out, but [3] will give you the filesystem access you're looking for. -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com On Fri, Mar 28, 2014 at 9:40 AM, Travis Rhoden trho...@gmail.com wrote: Hi Folks, What would be the right set of capabilities to set for a new client key that has access to CephFS only? I've seen a few different examples: [1] mds 'allow *' mon 'allow r' osd 'allow rwx pool=data' [2] mon 'allow r' osd 'allow rwx pool=data' [3] mds 'allow rwx' mon 'allow r' osd 'allow rwx pool=data' I'm inclined to go with [3]. [1] seems weird for using *; I like seeing rwx.
Are these synonymous? [2] seems wrong because it doesn't include anything for MDS. - Travis ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
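Option [3] from the thread can be created in one step with `ceph auth get-or-create`; the client name `client.cephfs` and keyring path are just examples:

```shell
# CephFS-only key: full MDS access, read-only monitor access,
# and rwx limited to the 'data' pool (option [3] above).
ceph auth get-or-create client.cephfs \
    mds 'allow rwx' \
    mon 'allow r' \
    osd 'allow rwx pool=data' \
    -o /etc/ceph/ceph.client.cephfs.keyring
```

The resulting keyring is then passed to ceph-fuse with -k, together with the matching --id (as worked out later in this thread).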
Re: [ceph-users] cephx key for CephFS access only
Ah, I figured it out. My original key worked, but I needed to use the --id option with ceph-fuse to tell it to use the cephfs user rather than the admin user. Tailing the log on my monitor pointed out that it was logging in with client.admin, but providing the key for client.cephfs. So, the final working command is: ceph-fuse -k /etc/ceph/ceph.client.cephfs.keyring --id cephfs -m ceph0-10g /data I will note that neither the -k nor the --id option is present in man ceph-fuse, ceph-fuse --help, or in the Ceph docs, really. An example using -k is found here: http://ceph.com/docs/master/start/quick-cephfs/#filesystem-in-user-space-fuse, but there is never any mention of needing to change users if you are not using client.admin. In fact, using the search functionality on ceph-fuse returns zero results. If I'm ambitious I'll submit changes for the docs... Thanks for the help! - Travis On Wed, Apr 2, 2014 at 12:00 PM, Travis Rhoden trho...@gmail.com wrote: Thanks for the response Greg. Unfortunately, I appear to be missing something. If I use my cephfs key with these perms: client.cephfs key: redacted caps: [mds] allow rwx caps: [mon] allow r caps: [osd] allow rwx pool=data This is what happens when I mount: # ceph-fuse -k /etc/ceph/ceph.client.cephfs.keyring -m ceph0-10g /data ceph-fuse[13533]: starting ceph client ceph-fuse[13533]: ceph mount failed with (1) Operation not permitted ceph-fuse[13531]: mount failed: (1) Operation not permitted But using the admin key works just fine: # ceph-fuse -k /etc/ceph/ceph.client.admin.keyring -m ceph0-10g /data ceph-fuse[13548]: starting ceph client ceph-fuse[13548]: starting fuse The admin key has the following perms: client.admin key: redacted caps: [mds] allow caps: [mon] allow * caps: [osd] allow * Since the mds permissions are functionally equivalent, either I need extra rights on the monitor, or the OSDs. Does a client need to access the metadata pool in order to do a CephFS mount? I'll experiment a bit and report back.
On Mon, Mar 31, 2014 at 1:36 PM, Gregory Farnum g...@inktank.com wrote: At present, the only security permission on the MDS is 'allowed to do stuff', so rwx and * are synonymous. In general * means 'is an admin', though, so you'll be happier in the future if you use rwx. You may also want a more restrictive set of monitor capabilities as somebody else recently pointed out, but [3] will give you the filesystem access you're looking for. -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com On Fri, Mar 28, 2014 at 9:40 AM, Travis Rhoden trho...@gmail.com wrote: Hi Folks, What would be the right set of capabilities to set for a new client key that has access to CephFS only? I've seen a few different examples: [1] mds 'allow *' mon 'allow r' osd 'allow rwx pool=data' [2] mon 'allow r' osd 'allow rwx pool=data' [3] mds 'allow rwx' mon 'allow r' osd 'allow rwx pool=data' I'm inclined to go with [3]. [1] seems weird for using *; I like seeing rwx. Are these synonymous? [2] seems wrong because it doesn't include anything for MDS. - Travis
[ceph-users] Monitors stuck in electing
Hello, I just deployed a new Emperor cluster using ceph-deploy 1.4. All went very smoothly, until I rebooted all the nodes. After reboot, the monitors no longer form a quorum. I followed the troubleshooting steps here: http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-mon/ Specifically, I'm in the state described in this section: http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-mon/#most-common-monitor-issues The state for all the monitors is electing. The docs say this is most likely clock skew, but I do have all nodes synch'd with NTP. I've confirmed this multiple times. I've also confirmed the monitors can reach each other (by telnetting to IP:PORT, and I can see established connections via netstat). I'm baffled. Here is a sample mon_status output: root@ceph0:~# ceph daemon mon.ceph0 quorum_status { election_epoch: 31, quorum: [], quorum_names: [], quorum_leader_name: , monmap: { epoch: 2, fsid: XXX, (redacted) modified: 2014-03-24 14:35:22.332646, created: 0.00, mons: [ { rank: 0, name: ceph0, addr: 10.10.30.0:6789\/0}, { rank: 1, name: ceph1, addr: 10.10.30.1:6789\/0}, { rank: 2, name: ceph2, addr: 10.10.30.2:6789\/0}]}} They all look identical to that. Any ideas what I can look at besides NTP? The docs really stress that it should be clock skew, so I'll keep looking at that... - Travis
Re: [ceph-users] Monitors stuck in electing
Just to emphasize that I don't think it's clock skew, here is the NTP state of all three monitors: # ansible ceph_mons -m command -a ntpq -p -kK SSH password: sudo password [defaults to SSH password]: ceph0 | success | rc=0 remote refid st t when poll reach delay offset jitter == *controller-10g 198.60.73.8 2 u 43 64 3770.2360.057 0.097 ceph1 | success | rc=0 remote refid st t when poll reach delay offset jitter == *controller-10g 198.60.73.8 2 u 39 64 3770.2730.035 0.064 ceph2 | success | rc=0 remote refid st t when poll reach delay offset jitter == *controller-10g 198.60.73.8 2 u 30 64 3770.201 -0.063 0.063 I think they are pretty well in synch. - Travis On Tue, Mar 25, 2014 at 11:09 AM, Travis Rhoden trho...@gmail.com wrote: Hello, I just deployed a new Emperor cluster using ceph-deploy 1.4. All went very smooth, until I rebooted all the nodes. After reboot, the monitors no longer form a quorum. I followed the troubleshooting steps here: http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-mon/ Specifically, Im in the stat described in this section: http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-mon/#most-common-monitor-issues The state for all the monitors is electing. The docs say this is most likely clock skew, but I do have all nodes synch'd with NTP. I've confirmed this multiple times. I've also confirmed the monitors can reach each other (by telneting to IP:PORT, and I can see established connections via netstat). I'm baffled. here is a sample mon_status output: root@ceph0:~# ceph daemon mon.ceph0 quorum_status { election_epoch: 31, quorum: [], quorum_names: [], quorum_leader_name: , monmap: { epoch: 2, fsid: XXX, (redacted) modified: 2014-03-24 14:35:22.332646, created: 0.00, mons: [ { rank: 0, name: ceph0, addr: 10.10.30.0:6789\/0}, { rank: 1, name: ceph1, addr: 10.10.30.1:6789\/0}, { rank: 2, name: ceph2, addr: 10.10.30.2:6789\/0}]}} They all look identical to that. Any ideas what I can look at besides NTP? 
The docs really stress that it should be clock skew, so I'll keep looking at that... - Travis
Re: [ceph-users] Monitors stuck in electing
Well since I spammed the list earlier, I should fess up to my mistakes. I forgot to change MTU sizes on the 10G switch after I switched to jumbo frames. So yes, I had a very unhappy networking stack. On the upside, playing with Cumulus Linux on switches is fun. On Tue, Mar 25, 2014 at 1:12 PM, Travis Rhoden trho...@gmail.com wrote: Thanks for the feedback -- I'll post back with more detailed logs if anything looks fishy! On Tue, Mar 25, 2014 at 1:10 PM, Gregory Farnum g...@inktank.com wrote: Well, you could try running with messenger debugging cranked all the way up and see if there's something odd happening there (eg, not handling incoming messages), but based on not having any other reports of this, I think your networking stack is unhappy in some way. *shrug* (Higher log levels showing what the individual pipes are doing will narrow it down on the Ceph side.) -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com On Tue, Mar 25, 2014 at 10:05 AM, Travis Rhoden trho...@gmail.com wrote: On Tue, Mar 25, 2014 at 12:53 PM, Gregory Farnum g...@inktank.com wrote: On Tue, Mar 25, 2014 at 9:24 AM, Travis Rhoden trho...@gmail.com wrote: Okay, last one until I get some guidance. Sorry for the spam, but wanted to paint a full picture. Here are debug logs from all three mons, capturing what looks like an election sequence to me: ceph0: 2014-03-25 16:17:24.324846 7fa5c53fc700 5 mon.ceph0@0(electing).elector(35) start -- can i be leader? 
2014-03-25 16:17:24.324900 7fa5c53fc700 1 mon.ceph0@0(electing).elector(35) init, last seen epoch 35 2014-03-25 16:17:24.324913 7fa5c53fc700 1 -- 10.10.30.0:6789/0 -- mon.1 10.10.30.1:6789/0 -- election(b3f38955-4321-4850-9ddb-3b09940dc951 propose 35) v4 -- ?+0 0x263d480 2014-03-25 16:17:24.324948 7fa5c53fc700 1 -- 10.10.30.0:6789/0 -- mon.2 10.10.30.2:6789/0 -- election(b3f38955-4321-4850-9ddb-3b09940dc951 propose 35) v4 -- ?+0 0x263d6c0 2014-03-25 16:17:25.353975 7fa5c4bfb700 1 -- 10.10.30.0:6789/0 == mon.2 10.10.30.2:6789/0 493 election(b3f38955-4321-4850-9ddb-3b09940dc951 propose 35) v4 537+0+0 (4036841703 0 0) 0x265fd80 con 0x1df0c60 2014-03-25 16:17:25.354042 7fa5c4bfb700 5 mon.ceph0@0(electing).elector(35) handle_propose from mon.2 2014-03-25 16:17:29.325107 7fa5c53fc700 5 mon.ceph0@0(electing).elector(35) election timer expired ceph1: 2014-03-25 16:17:24.325529 7ffe48cc1700 5 mon.ceph1@1(electing).elector(35) handle_propose from mon.0 2014-03-25 16:17:24.325535 7ffe48cc1700 5 mon.ceph1@1(electing).elector(35) defer to 0 2014-03-25 16:17:24.325546 7ffe48cc1700 1 -- 10.10.30.1:6789/0 -- mon.0 10.10.30.0:6789/0 -- election(b3f38955-4321-4850-9ddb-3b09940dc951 ack 35) v4 -- ?+0 0x1bbfb40 2014-03-25 16:17:25.354038 7ffe48cc1700 1 -- 10.10.30.1:6789/0 == mon.2 10.10.30.2:6789/0 489 election(b3f38955-4321-4850-9ddb-3b09940dc951 propose 35) v4 537+0+0 (4036841703 0 0) 0x1bbf6c0 con 0x14d9b00 2014-03-25 16:17:25.354102 7ffe48cc1700 5 mon.ceph1@1(electing).elector(35) handle_propose from mon.2 2014-03-25 16:17:25.354113 7ffe48cc1700 5 mon.ceph1@1(electing).elector(35) no, we already acked 0 ceph2: 2014-03-25 16:17:20.353135 7f80d0013700 5 mon.ceph2@2(electing).elector(35) election timer expired 2014-03-25 16:17:20.353154 7f80d0013700 5 mon.ceph2@2(electing).elector(35) start -- can i be leader? 
2014-03-25 16:17:20.353225 7f80d0013700 1 mon.ceph2@2(electing).elector(35) init, last seen epoch 35 2014-03-25 16:17:20.353238 7f80d0013700 1 -- 10.10.30.2:6789/0 -- mon.0 10.10.30.0:6789/0 -- election(b3f38955-4321-4850-9ddb-3b09940dc951 propose 35) v4 -- ?+0 0x18e7900 2014-03-25 16:17:20.353272 7f80d0013700 1 -- 10.10.30.2:6789/0 -- mon.1 10.10.30.1:6789/0 -- election(b3f38955-4321-4850-9ddb-3b09940dc951 propose 35) v4 -- ?+0 0x18e7d80 2014-03-25 16:17:25.353559 7f80d0013700 5 mon.ceph2@2(electing).elector(35) election timer expired 2014-03-25 16:17:25.353578 7f80d0013700 5 mon.ceph2@2(electing).elector(35) start -- can i be leader? 2014-03-25 16:17:25.353647 7f80d0013700 1 mon.ceph2@2(electing).elector(35) init, last seen epoch 35 2014-03-25 16:17:25.353660 7f80d0013700 1 -- 10.10.30.2:6789/0 -- mon.0 10.10.30.0:6789/0 -- election(b3f38955-4321-4850-9ddb-3b09940dc951 propose 35) v4 -- ?+0 0x19b7240 2014-03-25 16:17:25.353695 7f80d0013700 1 -- 10.10.30.2:6789/0 -- mon.1 10.10.30.1:6789/0 -- election(b3f38955-4321-4850-9ddb-3b09940dc951 propose 35) v4 -- ?+0 0x19b76c0 2014-03-25 16:17:30.354040 7f80d0013700 5 mon.ceph2@2(electing).elector(35) election timer expired Oddly, it looks to me like mon.2 (ceph2) never handles/receives the proposal from mon.0 (ceph0). But I admit I have no clue how
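Greg's suggestion to crank messenger debugging can be applied through a monitor's admin socket even when there is no quorum (injectargs via `ceph tell` needs a working quorum, the local socket does not). Monitor name and socket path below match the ceph0 example; adjust for your cluster:

```shell
# Raise messenger debug output on one monitor via its admin socket.
ceph daemon mon.ceph0 config set debug_ms 20

# Equivalent form using the socket path directly:
ceph --admin-daemon /var/run/ceph/ceph-mon.ceph0.asok config set debug_ms 20

# Then watch what the individual pipes are doing:
tail -f /var/log/ceph/ceph-mon.ceph0.log
```

In this thread the logs ultimately pointed at the network: a mismatched MTU after enabling jumbo frames, so large election messages were silently dropped between monitors.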
Re: [ceph-users] ceph-deploy, single mon not in quorum
Hi Mordur, I'm definitely straining my memory on this one, but happy to help if I can. I'm pretty sure I did not figure it out -- you can see I didn't get any feedback from the list. What I did do, however, was uninstall everything and try the same setup with mkcephfs, which worked fine at the time. This was 8 months ago, though, and I have since used ceph-deploy many times with great success. I am not sure if I have ever tried a similar set up, though, with just one node and one monitor. Fortuitously, I may be trying that very setup today or tomorrow. If I still have issues, I will be sure to post them here. Are you using both the latest ceph-deploy and the latest Ceph packages (Emperor or newer dev packages)? There have been lots of changes in the monitor area, including in the upstart scripts, that made many things more robust in this area. I did have a cluster a few months ago that had a flaky monitor that refused to join quorum after install, and I had to just blow it away and re-install/deploy it and then it was fine, which I thought was odd. Sorry that's probably not much help. - Travis On Thu, Jan 9, 2014 at 12:40 AM, Mordur Ingolfsson r...@1984.is wrote: Hi Travis, Did you figure this out? I'm dealing with exactly the same thing over here. Best, Moe
Re: [ceph-users] ceph-deploy, single mon not in quorum
On Thu, Jan 9, 2014 at 9:48 AM, Alfredo Deza alfredo.d...@inktank.com wrote: On Thu, Jan 9, 2014 at 9:45 AM, Travis Rhoden trho...@gmail.com wrote: Hi Mordur, I'm definitely straining my memory on this one, but happy to help if I can. I'm pretty sure I did not figure it out -- you can see I didn't get any feedback from the list. What I did do, however, was uninstall everything and try the same setup with mkcephfs, which worked fine at the time. This was 8 months ago, though, and I have since used ceph-deploy many times with great success. I am not sure if I have ever tried a similar set up, though, with just one node and one monitor. Fortuitously, I may be trying that very setup today or tomorrow. If I still have issues, I will be sure to post them here. Are you using both the latest ceph-deploy and the latest Ceph packages (Emperor or newer dev packages)? There have been lots of changes in the monitor area, including in the upstart scripts, that made many things more robust in this area. I did have a cluster a few months ago that had a flaky monitor that refused to join quorum after install, and I had to just blow it away and re-install/deploy it and then it was fine, which I thought was odd. Sorry that's probably not much help. - Travis On Thu, Jan 9, 2014 at 12:40 AM, Mordur Ingolfsson r...@1984.is wrote: Hi Travis, Did you figure this out? I'm dealing with exactly the same thing over here. Can you share what exactly you are having problems with? ceph-deploy's log output has been much improved and it is super useful to have that when dealing with possible issues. I do not; it was long, long ago... And in case it was ambiguous, let me explicitly say I was not recommending the use of mkcephfs at all (is that even still possible?). ceph-deploy is certainly the tool to use.
Best, Moe
Re: [ceph-users] Multiple kernel RBD clients failures
Eric, Yeah, your OSD weights are a little crazy... For example, looking at one host from your output of ceph osd tree... -3 31.5 host tca23 1 3.63 osd.1 up 1 7 0.26 osd.7 up 1 13 2.72 osd.13 up 1 19 2.72 osd.19 up 1 25 0.26 osd.25 up 1 31 3.63 osd.31 up 1 37 2.72 osd.37 up 1 43 0.26 osd.43 up 1 49 3.63 osd.49 up 1 55 0.26 osd.55 up 1 61 3.63 osd.61 up 1 67 0.26 osd.67 up 1 73 3.63 osd.73 up 1 79 0.26 osd.79 up 1 85 3.63 osd.85 up 1 osd.7 is set to 0.26, with others set to 3.63 or 2.72. Under normal circumstances, the rule of thumb would be to set weights equal to the disk size in TB. So, a 2TB disk would have a weight of 2, a 1.5TB disk == 1.5, etc. These weights control what proportion of data is directed to each OSD. I'm guessing you do have very different size disks, though, as it looks like the disks that are reporting near full all have relatively small weights (OSD 43 is at 91%, weight = 0.26). Is this really a 260GB disk? A mix of HDDs and SSDs? Or maybe just a small partition? Either way, you probably have something wrong with the weights. I'd look into that. Having a single pool made of disks of such varied size may not be a good option, but I'm not sure if that's your setup or not. To the best of my knowledge, Ceph halts IO operations when any disk reaches the near full scenario (85% by default). I'm not 100% certain on that one, but I believe that is true. Hope that helps, - Travis On Tue, Oct 1, 2013 at 2:51 AM, Yan, Zheng uker...@gmail.com wrote: On Mon, Sep 30, 2013 at 11:50 PM, Eric Eastman eri...@aol.com wrote: Thank you for the reply -28 == -ENOSPC (No space left on device). I think it is due to the fact that some osds are near full.
Yan, Zheng I thought that may be the case, but I would expect that ceph health would tell me I had full OSDs, but it is only saying they are near full: # ceph health detail HEALTH_WARN 9 near full osd(s) osd.9 is near full at 85% osd.29 is near full at 85% osd.43 is near full at 91% osd.45 is near full at 88% osd.47 is near full at 88% osd.55 is near full at 94% osd.59 is near full at 94% osd.67 is near full at 94% osd.83 is near full at 94% Are these OSDs' disks smaller than the other OSDs'? If so, you need to lower these OSDs' weights. Regards Yan, Zheng As I still have lots of space: # ceph df GLOBAL: SIZE AVAIL RAW USED %RAW USED 249T 118T 131T 52.60 POOLS: NAME ID USED %USED OBJECTS data 0 0 0 0 metadata 1 0 0 0 rbd 2 8 0 1 rbd-pool 3 67187G 26.30 17713336 And I set up lots of Placement Groups: # ceph osd dump | grep 'rep size' | grep rbd-pool pool 3 'rbd-pool' rep size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 4500 pgp_num 4500 last_change 360 owner 0 Why did the OSDs fill up long before I ran out of space? Thanks, Eric
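The rule of thumb from Travis's reply -- CRUSH weight roughly equal to the disk's capacity in TB -- can be sketched as a tiny helper. This is a convention, not a Ceph API; the function name is made up for illustration:

```python
def crush_weight_for_disk(size_bytes: int) -> float:
    """Suggested CRUSH weight: disk capacity in TB (1 TB = 10**12 bytes),
    per the common rule of thumb discussed on the list."""
    return round(size_bytes / 10**12, 2)

# A 2 TB spinner vs. a ~260 GB device (like the 0.26-weighted OSDs above):
print(crush_weight_for_disk(2_000_000_000_000))  # 2.0
print(crush_weight_for_disk(260_000_000_000))    # 0.26
```

An individual OSD's weight can then be corrected with `ceph osd crush reweight osd.7 <weight>`, so that small devices receive proportionally less data and stop filling up first.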
[ceph-users] RBD Snap removal priority
Hello everyone,

I'm running a Cuttlefish cluster that hosts a lot of RBDs. I recently removed a snapshot of a large one (rbd snap rm -- 12TB), and I noticed that all of the clients had markedly decreased performance. Looking at iostat on the OSD nodes showed most disks pegged at 100% util.

I know there are thread priorities that can be set for client vs. recovery operations, but I'm not sure which category deleting a snapshot falls under. I couldn't really find anything relevant. Is there anything I can tweak to lower the priority of such an operation? I don't need it to complete fast -- rbd snap rm returns immediately and the actual deletion is done asynchronously. I'd be fine with it taking longer at a lower priority, but as it stands now it brings my cluster to a crawl and is causing issues with several VMs.

I see an "osd snap trim thread timeout" option in the docs -- is the operation occurring here what you would call snap trimming? If so, any chance of adding an "osd snap trim priority" option, just like there is for client and recovery ops?

Hope what I am saying makes sense...

- Travis
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
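For readers finding this thread later: releases after Cuttlefish did grow knobs in exactly this area. A hedged ceph.conf sketch, assuming a version that actually supports these options (check `ceph daemon osd.N config show | grep snap_trim` before relying on either):

```ini
[osd]
; Sleep (in seconds) between snap-trim work items so client I/O can
; get through. Available in later releases than Cuttlefish.
osd snap trim sleep = 0.05
; Work-queue priority for snap trimming relative to client/recovery
; ops. Added later still (Jewel-era); verify before setting.
osd snap trim priority = 1
```

Both values are illustrative starting points, not tuned recommendations; the sleep in particular trades trim duration for client latency.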
Re: [ceph-users] RBD Snap removal priority
Hi Mike,

Thanks for the info. I had seen some of the previous reports of reduced performance during various recovery tasks (and certainly experienced them), but you summarized them all quite nicely.

Yes, I'm running XFS on the OSDs. I checked fragmentation on a few of my OSDs -- all came back ~38% (better than I thought!).

- Travis

On Fri, Sep 27, 2013 at 2:05 PM, Mike Dawson mike.daw...@cloudapt.com wrote:

[cc ceph-devel]

Travis,

RBD doesn't behave well when Ceph maintenance operations create spindle contention (i.e. 100% util from iostat). More about that below.

Do you run XFS under your OSDs? If so, can you check for extent fragmentation? Should be something like:

xfs_db -c frag -r /dev/sdb1

We recently saw a fragmentation factor of over 80%, with lots of inodes having hundreds of extents. After 24+ hours of defrag'ing, we got it under control, but we're seeing the fragmentation factor grow by ~1.5% daily. We experienced spindle contention issues even after the defrag.

Sage, Sam, etc.: I think the real issue is that Ceph has several states where it performs what I would call maintenance operations that saturate the underlying storage without properly yielding to client I/O (which should have a higher priority). I have experienced or seen reports of Ceph maintenance affecting RBD client I/O in several ways:

- QEMU/RBD client I/O stalls or halts due to spindle contention from Ceph maintenance [1]
- Recovery and/or backfill cause QEMU/RBD reads to hang [2]
- rbd snap rm (Travis' report below)

[1] http://tracker.ceph.com/issues/6278
[2] http://tracker.ceph.com/issues/6333

I think this family of issues speaks to the need for Ceph to have more visibility into the underlying storage's limitations (especially spindle contention) when performing known expensive maintenance operations.

Thanks,
Mike Dawson

On 9/27/2013 12:25 PM, Travis Rhoden wrote:
Hello everyone, I'm running a Cuttlefish cluster that hosts a lot of RBDs.
I recently removed a snapshot of a large one (rbd snap rm -- 12TB), and I noticed that all of the clients had markedly decreased performance. Looking at iostat on the OSD nodes had most disks pegged at 100% util. I know there are thread priorities that can be set for clients vs recovery, but I'm not sure what deleting a snapshot falls under. I couldn't really find anything relevant. Is there anything I can tweak to lower the priority of such an operation? I didn't need it to complete fast, as rbd snap rm returns immediately and the actual deletion is done asynchronously. I'd be fine with it taking longer at a lower priority, but as it stands now it brings my cluster to a crawl and is causing issues with several VMs. I see an osd snap trim thread timeout option in the docs -- Is the operation occuring here what you would call snap trimming? If so, any chance of adding an option for osd snap trim priority just like there is for osd client op and osd recovery op? Hope what I am saying makes sense... - Travis ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Scaling RBD module
This noshare option may have just helped me a ton -- I sure wish I would have asked similar questions sooner, because I have seen the same failure to scale. =)

One question -- when using the noshare option (or really, even without it), are there any practical limits on the number of RBDs that can be mounted? I have servers with ~100 RBDs on each, and am wondering whether, if I switch them all over to using noshare, anything is going to blow up, use a ton more memory, etc. Even without noshare, are there any known limits to how many RBDs can be mapped?

Thanks!

- Travis

On Thu, Sep 19, 2013 at 8:03 PM, Somnath Roy somnath@sandisk.com wrote:

Thanks Josh! I am able to successfully add this noshare option in the image mapping now. Looking at dmesg output, I found that it was indeed the secret key problem. Block performance is scaling now.

Regards,
Somnath

-Original Message-
From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Josh Durgin
Sent: Thursday, September 19, 2013 12:24 PM
To: Somnath Roy
Cc: Sage Weil; ceph-de...@vger.kernel.org; Anirban Ray; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Scaling RBD module

On 09/19/2013 12:04 PM, Somnath Roy wrote:
Hi Josh,
Thanks for the information. I am trying to add the following but hitting some permission issue.

root@emsclient:/etc# echo "mon-1:6789,mon-2:6789,mon-3:6789 name=admin,key=client.admin,noshare test_rbd ceph_block_test" > /sys/bus/rbd/add
-bash: echo: write error: Operation not permitted

If you check dmesg, it will probably show an error trying to authenticate to the cluster. Instead of key=client.admin, you can pass the base64 secret value as shown in 'ceph auth list' with the secret=X option.

BTW, there's a ticket for adding the noshare option to rbd map so using the sysfs interface like this is never necessary: http://tracker.ceph.com/issues/6264

Josh

Here is the contents of the rbd directory..
root@emsclient:/sys/bus/rbd# ll
total 0
drwxr-xr-x  4 root root    0 Sep 19 11:59 ./
drwxr-xr-x 30 root root    0 Sep 13 11:41 ../
--w-------  1 root root 4096 Sep 19 11:59 add
drwxr-xr-x  2 root root    0 Sep 19 12:03 devices/
drwxr-xr-x  2 root root    0 Sep 19 12:03 drivers/
-rw-r--r--  1 root root 4096 Sep 19 12:03 drivers_autoprobe
--w-------  1 root root 4096 Sep 19 12:03 drivers_probe
--w-------  1 root root 4096 Sep 19 12:03 remove
--w-------  1 root root 4096 Sep 19 11:59 uevent

I checked that even when I am logged in as root, I can't write anything on /sys. Here is the Ubuntu version I am using:

root@emsclient:/etc# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 13.04
Release:        13.04
Codename:       raring

Here is the mount information:

root@emsclient:/etc# mount
/dev/mapper/emsclient--vg-root on / type ext4 (rw,errors=remount-ro)
proc on /proc type proc (rw,noexec,nosuid,nodev)
sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
none on /sys/fs/cgroup type tmpfs (rw)
none on /sys/fs/fuse/connections type fusectl (rw)
none on /sys/kernel/debug type debugfs (rw)
none on /sys/kernel/security type securityfs (rw)
udev on /dev type devtmpfs (rw,mode=0755)
devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=0620)
tmpfs on /run type tmpfs (rw,noexec,nosuid,size=10%,mode=0755)
none on /run/lock type tmpfs (rw,noexec,nosuid,nodev,size=5242880)
none on /run/shm type tmpfs (rw,nosuid,nodev)
none on /run/user type tmpfs (rw,noexec,nosuid,nodev,size=104857600,mode=0755)
/dev/sda1 on /boot type ext2 (rw)
/dev/mapper/emsclient--vg-home on /home type ext4 (rw)

Any idea what went wrong here?
Thanks & Regards,
Somnath

-Original Message-
From: Josh Durgin [mailto:josh.dur...@inktank.com]
Sent: Wednesday, September 18, 2013 6:10 PM
To: Somnath Roy
Cc: Sage Weil; ceph-de...@vger.kernel.org; Anirban Ray; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Scaling RBD module

On 09/17/2013 03:30 PM, Somnath Roy wrote:
Hi,
I am running Ceph on a 3-node cluster and each of my server nodes is running 10 OSDs, one for each disk. I have one admin node and all the nodes are connected with 2 x 10G network. One network is for cluster and the other one is configured as public network. Here is the status of my cluster.

~/fio_test# ceph -s
  cluster b2e0b4db-6342-490e-9c28-0aadf0188023
   health HEALTH_WARN clock skew detected on mon.server-name-2, mon.server-name-3
   monmap e1: 3 mons at {server-name-1=xxx.xxx.xxx.xxx:6789/0, server-name-2=xxx.xxx.xxx.xxx:6789/0, server-name-3=xxx.xxx.xxx.xxx:6789/0}, election epoch 64, quorum 0,1,2 server-name-1,server-name-2,server-name-3
   osdmap e391: 30 osds: 30 up, 30 in
    pgmap v5202: 30912 pgs: 30912 active+clean; 8494 MB data, 27912 MB used, 11145 GB / 11172 GB avail
   mdsmap e1: 0/0/1 up

I
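A sketch of the corrected sysfs write Josh describes, with the string built up separately so the pieces are visible. The monitor addresses, pool (test_rbd), image (ceph_block_test), and the secret are all placeholders; the real base64 secret comes from `ceph auth get-key client.admin`:

```shell
# Build the /sys/bus/rbd/add line: "mon_addrs options pool image".
# Note the secret is the base64 key itself, NOT the entity name
# "client.admin" -- that was the bug in the original attempt.
secret="AQBzbWRTMEJ1FhAAxxxx"   # placeholder; use `ceph auth get-key client.admin`
spec="mon-1:6789,mon-2:6789,mon-3:6789 name=admin,secret=$secret,noshare test_rbd ceph_block_test"
echo "$spec"
# As root on a real client you would then run:
#   echo "$spec" > /sys/bus/rbd/add
```

The write to /sys/bus/rbd/add is left as a comment so the sketch is safe to run anywhere; on a kernel with the rbd module loaded it creates /dev/rbd0.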
Re: [ceph-users] Scaling RBD module
On Tue, Sep 24, 2013 at 5:16 PM, Sage Weil s...@inktank.com wrote:
On Tue, 24 Sep 2013, Travis Rhoden wrote:
This noshare option may have just helped me a ton -- I sure wish I would have asked similar questions sooner, because I have seen the same failure to scale. =)
One question -- when using the noshare option (or really, even without it) are there any practical limits on the number of RBDs that can be mounted? I have servers with ~100 RBDs on them each, and am wondering if I switch them all over to using noshare if anything is going to blow up, use a ton more memory, etc. Even without noshare, are there any known limits to how many RBDs can be mapped?

With noshare each mapped image will appear as a separate client instance, which means it will have its own session with the monitors and its own TCP connections to the OSDs. It may be a viable workaround for now, but in general I would not recommend it.

Good to know. We are still playing with CephFS as our ultimate solution, but in the meantime this may indeed be a good workaround for me.

I'm very curious what the scaling issue is with the shared client. Do you have a working perf that can capture callgraph information on this machine?

Not currently, but I could certainly work on it. The issue that we see is basically what the OP showed -- there seems to be a finite amount of bandwidth that I can read/write from a machine, regardless of how many RBDs are involved. I.e., if I can get 1GB/sec writes on one RBD when everything else is idle, running the same test on two RBDs in parallel *from the same machine* ends up with the sum of the two at ~1GB/sec, split fairly evenly. However, if I run the same test on two RBDs, each hosted on a separate machine, I definitely see increased bandwidth. Monitoring network traffic and the Ceph OSD nodes seems to imply that they are not overloaded -- there is more bandwidth to be had; the clients just aren't able to push the data fast enough.
That's why I'm hoping creating a new client for each RBD will improve things. I'm not going to enable this everywhere just yet, we will test things on a few RBDs and test, and perhaps enable on some RBDs that are particularly heavily loaded. I'll work on the perf capture! Thanks for the feedback, as always. - Travis sage Thanks! - Travis On Thu, Sep 19, 2013 at 8:03 PM, Somnath Roy somnath@sandisk.com wrote: Thanks Josh ! I am able to successfully add this noshare option in the image mapping now. Looking at dmesg output, I found that was indeed the secret key problem. Block performance is scaling now. Regards Somnath -Original Message- From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Josh Durgin Sent: Thursday, September 19, 2013 12:24 PM To: Somnath Roy Cc: Sage Weil; ceph-de...@vger.kernel.org; Anirban Ray; ceph-users@lists.ceph.com Subject: Re: [ceph-users] Scaling RBD module On 09/19/2013 12:04 PM, Somnath Roy wrote: Hi Josh, Thanks for the information. I am trying to add the following but hitting some permission issue. root@emsclient:/etc# echo mon-1:6789,mon-2:6789,mon-3:6789 name=admin,key=client.admin,noshare test_rbd ceph_block_test' /sys/bus/rbd/add -bash: echo: write error: Operation not permitted If you check dmesg, it will probably show an error trying to authenticate to the cluster. Instead of key=client.admin, you can pass the base64 secret value as shown in 'ceph auth list' with the secret=X option. BTW, there's a ticket for adding the noshare option to rbd map so using the sysfs interface like this is never necessary: http://tracker.ceph.com/issues/6264 Josh Here is the contents of rbd directory.. 
root@emsclient:/sys/bus/rbd# ll total 0 drwxr-xr-x 4 root root0 Sep 19 11:59 ./ drwxr-xr-x 30 root root0 Sep 13 11:41 ../ --w--- 1 root root 4096 Sep 19 11:59 add drwxr-xr-x 2 root root0 Sep 19 12:03 devices/ drwxr-xr-x 2 root root0 Sep 19 12:03 drivers/ -rw-r--r-- 1 root root 4096 Sep 19 12:03 drivers_autoprobe --w--- 1 root root 4096 Sep 19 12:03 drivers_probe --w--- 1 root root 4096 Sep 19 12:03 remove --w--- 1 root root 4096 Sep 19 11:59 uevent I checked even if I am logged in as root , I can't write anything on /sys. Here is the Ubuntu version I am using.. root@emsclient:/etc# lsb_release -a No LSB modules are available. Distributor ID: Ubuntu Description:Ubuntu 13.04 Release:13.04 Codename: raring
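The single-machine scaling test Travis describes can be sketched like this. The targets here are scratch files so the sketch is safe to run anywhere; on a real client you would point them at the mapped devices (/dev/rbd0, /dev/rbd1), add oflag=direct, and time the runs (or watch iostat) to compare aggregate bandwidth against a single-device run:

```shell
# Write to two targets in parallel and report the size written to each.
# Replace the temp files with /dev/rbdN devices for the real test.
targets="/tmp/rbd-bw-0 /tmp/rbd-bw-1"
for t in $targets; do
    dd if=/dev/zero of="$t" bs=4M count=8 2>/dev/null &
done
wait
stat -c '%s' $targets    # 33554432 bytes each (8 x 4 MiB)
rm -f $targets
```

If the shared-client bottleneck is in play, the per-device rate roughly halves as targets are added; with noshare (one client instance per mapping) the aggregate should climb instead.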
Re: [ceph-users] 1 particular ceph-mon never joins on 0.67.2
Hi James,

Yes, all configured using the interfaces file. Only two interfaces, eth0 and eth1:

auto eth0
iface eth0 inet dhcp

auto eth1
iface eth1 inet dhcp

I took a single node and rebooted it several times, and it really was about 50/50 whether the OSDs showed up under 'localhost' or n0.

I tried a few different things last night with no luck. I modified when ceph-all starts by writing different "start on" values to /etc/init/ceph-all.override. I was grasping at straws a bit, as I just kept adding (and'ing) events, hoping to find something that works. I tried:

start on (local-filesystems and net-device-up IFACE=eth0)
start on (local-filesystems and net-device-up IFACE=eth0 and net-device-up IFACE=eth1)
start on (local-filesystems and net-device-up IFACE=eth0 and net-device-up IFACE=eth1 and started network-services)

Oddly, the last one seemed to work at first. When I added "started network-services" to the list, the OSDs came up correctly each time! But the monitor never started. If I started it directly with "start ceph-mon id=n0", it came up fine, but not during boot. I spent a couple hours trying to debug *that* before I gave up and switched to static hostnames. =/ I had even thrown --verbose onto the kernel command line so I could see all the upstart events happening, but didn't see anything obvious.

So now I'm back to the stock upstart scripts, using static hostnames, and I don't have any issues with OSDs moving in the crushmap, or any new problems with the monitors.

Sage, I do think I still saw a weird issue with my third mon not starting (same as the original email -- even now with static hostnames), but it was late, and I lost access to the cluster right about then and haven't regained it. I'll double-check that when I get access again and hopefully will find that problem has gone away too.

- Travis
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
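For anyone landing here from a search, the override mechanism being discussed is a drop-in file that replaces the job's start condition without editing the packaged job. A sketch of the variant that got Travis's OSDs placing correctly (whether it fits your boxes depends on which interfaces carry the cluster network):

```
# /etc/init/ceph-all.override -- overrides only the "start on" stanza of
# the packaged /etc/init/ceph-all.conf; upstart allows the condition to
# be split across lines.
start on (local-filesystems
          and net-device-up IFACE=eth0
          and net-device-up IFACE=eth1
          and started network-services)
```

Note that, per the thread, this fixed the OSD/hostname race but not the monitor startup; the reliable fix was static hostnames.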
Re: [ceph-users] 1 particular ceph-mon never joins on 0.67.2
Cool. So far I have tried:

start on (local-filesystems and net-device-up IFACE=eth0)
start on (local-filesystems and net-device-up IFACE=eth0 and net-device-up IFACE=eth1)

About to try:

start on (local-filesystems and net-device-up IFACE=eth0 and net-device-up IFACE=eth1 and started network-services)

The local-filesystems + network-device combination is billed as an alternative to runlevel if you need to do something *after* networking... No luck so far. I'll keep trying things out.

On Mon, Aug 26, 2013 at 2:31 PM, Sage Weil s...@inktank.com wrote:
On Mon, 26 Aug 2013, Travis Rhoden wrote:
Hi Sage,

Thanks for the response. I noticed that as well, and suspected hostname/DHCP/DNS shenanigans. What's weird is that all nodes are identically configured. I also have monitors running on n0 and n12, and they come up fine, every time. Here's the mon_host line from ceph.conf:

mon_initial_members = n0, n12, n24
mon_host = 10.0.1.0,10.0.1.12,10.0.1.24

Just to test /etc/hosts and name resolution...

root@n24:~# getent hosts n24
10.0.1.24 n24
root@n24:~# hostname -s
n24

The only loopback entry in /etc/hosts is "127.0.0.1 localhost", so that should be fine. Upon rebooting this node, I've had the monitor come up okay once, maybe out of 12 tries. So it appears to be some kind of race... No clue what is going on. If I stop and start the monitor (or restart), it doesn't appear to change anything.

However, on the topic of races, I'm having one other more pressing issue. Each OSD host is having its hostname assigned via DHCP. Until that assignment is made (during init), the hostname is "localhost", and then it switches over to nX, for some node number X. The issue I am seeing is that there is a race between this hostname assignment and the Ceph upstart scripts, such that sometimes ceph-osd starts while the hostname is still 'localhost'. This then causes the OSD location to change in the crushmap, which is going to be a very bad thing.
=) When rebooting all my nodes at once (there are several dozen), about 50% move from being under nX to localhost. Restarting all the ceph-osd jobs moves them back (because the hostname is defined by then). I'm wondering what kind of delay, or additional start-on logic, I can add to the upstart script to work around this.

Hmm, this is beyond my upstart-fu, unfortunately. This has come up before, actually. Previously we would wait for any interface to come up and then start, but that broke with multi-NIC machines, and I ended up just making things start in runlevel [2345]. James, do you know what should be done to make the job wait for *all* network interfaces to be up? Is that even the right solution here?

sage

On Fri, Aug 23, 2013 at 4:47 PM, Sage Weil s...@inktank.com wrote:
Hi Travis,
On Fri, 23 Aug 2013, Travis Rhoden wrote:
Hey folks,

I've just done a brand new install of 0.67.2 on a cluster of Calxeda nodes. I have one particular monitor that never joins the quorum when I restart the node. Looks to me like it has something to do with the create-keys task, which never seems to finish:

root 1240  1  4 13:03 ?  00:00:02 /usr/bin/ceph-mon --cluster=ceph -i n24 -f
root 1244  1  0 13:03 ?  00:00:00 /usr/bin/python /usr/sbin/ceph-create-keys --cluster=ceph -i n24

I don't see that task on my other monitors.
Additionally, that task is periodically querying the monitor status:

root 1240     1  2 13:03 ?  00:00:02 /usr/bin/ceph-mon --cluster=ceph -i n24 -f
root 1244     1  0 13:03 ?  00:00:00 /usr/bin/python /usr/sbin/ceph-create-keys --cluster=ceph -i n24
root 1982  1244 15 13:04 ?  00:00:00 /usr/bin/python /usr/bin/ceph --cluster=ceph --admin-daemon=/var/run/ceph/ceph-mon.n24.asok mon_status

Checking that status myself, I see:

# ceph --cluster=ceph --admin-daemon=/var/run/ceph/ceph-mon.n24.asok mon_status
{ "name": "n24",
  "rank": 2,
  "state": "probing",
  "election_epoch": 0,
  "quorum": [],
  "outside_quorum": [
        "n24"],
  "extra_probe_peers": [],
  "sync_provider": [],
  "monmap": { "epoch": 2,
      "fsid": "f0b0d4ec-1ac3-4b24-9eab-c19760ce4682",
      "modified": "2013-08-23 12:55:34.374650",
      "created": "0.00",
      "mons": [
            { "rank": 0,
              "name": "n0",
              "addr": "10.0.1.0:6789\/0"},
            { "rank": 1,
              "name": "n12",
              "addr": "10.0.1.12:6789\/0"
Re: [ceph-users] Backporting the kernel client
I built the 3.10-rc rbd module for a 3.8 kernel yesterday, and only have one thing to add (I know I'm reviving an old thread). There is one folder missing from the original list of files to use:

include/linux/crush/*

That would bring everything to:

include/keys/ceph-type.h
include/linux/ceph/*
include/linux/crush/*
fs/ceph/*
net/ceph/*
drivers/block/rbd.c
drivers/block/rbd_types.h

RBD built without a hitch. Getting CephFS to build was going to be a bit more work, but I didn't need it so I just skipped it.

- Travis

On Mon, Apr 29, 2013 at 8:41 PM, James Harper james.har...@bendigoit.com.au wrote:

I'm probably not the only one who would like to run a distribution-provided kernel (which for Debian Wheezy/Ubuntu Precise is 3.2) and still have a recent-enough Ceph kernel client. So I'm wondering whether it's feasible to backport the kernel client to an earlier kernel.

You can grab the 3.8 kernel from Debian experimental: http://packages.debian.org/search?keywords=linux-image-3.8
I'm using it on a bunch of machines and I know of a few others using it too.

The plan is as follows:
1) Grab the Ceph files from https://github.com/ceph/ceph-client (and put them over the older kernel sources). If I got it right, the files are:
include/keys/ceph-type.h
include/linux/ceph/*
fs/ceph/*
net/ceph/*
drivers/block/rbd.c
drivers/block/rbd_types.h
2) Make (trivial) adjustments to the source code to account for changed kernel interfaces.
3) Compile as modules and install the new Ceph modules under /lib/modules.
4) Reboot to a standard distribution kernel with an up-to-date Ceph client.

I would think you should be able to build a dkms package pretty easily; it would be a lot faster to build than building an entire kernel, and much easier to maintain. Of course that depends on the degree of integration with the kernel...
James ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
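The dkms route James suggests would look roughly like the fragment below: a dkms.conf alongside the copied sources under /usr/src/. Everything here is hypothetical (package name, version, module list) and only illustrates the shape; you would still need the source tree and Makefiles arranged so each module actually builds out of tree:

```
# /usr/src/ceph-backport-3.10rc/dkms.conf -- hypothetical sketch
PACKAGE_NAME="ceph-backport"
PACKAGE_VERSION="3.10rc"
AUTOINSTALL="yes"

# libceph first, so rbd (and optionally ceph.ko) can link against it.
BUILT_MODULE_NAME[0]="libceph"
BUILT_MODULE_LOCATION[0]="net/ceph"
DEST_MODULE_LOCATION[0]="/kernel/net/ceph"

BUILT_MODULE_NAME[1]="rbd"
BUILT_MODULE_LOCATION[1]="drivers/block"
DEST_MODULE_LOCATION[1]="/kernel/drivers/block"
```

With that in place, `dkms add -m ceph-backport -v 3.10rc`, then `dkms build` and `dkms install` with the same flags, would rebuild the modules automatically for each installed kernel.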
Re: [ceph-users] Upgrade from bobtail
I'm actually planning this same upgrade on Saturday. Is the memory leak from Bobtail during deep-scrub known to be squashed? I've been seeing that a lot lately.

I know Bobtail to Cuttlefish is one-way only, due to the mon re-architecting. But in general, whenever we do upgrades we usually have a fall-back/reversion plan in case things go wrong. Is that ever going to be possible with Ceph?

- Travis

On Mon, Jun 17, 2013 at 12:27 PM, Sage Weil s...@inktank.com wrote:
On Mon, 17 Jun 2013, Wolfgang Hennerbichler wrote:
Hi,
I'm planning to upgrade my bobtail (latest) cluster to cuttlefish. Are there any outstanding issues that I should be aware of? Anything that could break my production setup?

There will be another point release out in the next day or two that resolves a rare sequence of errors during the upgrade that can be problematic (see the 0.61.3 release notes). There are also several fixes for udev/ceph-disk/ceph-deploy on rpm-based distros that will be included. If you can wait a couple days I would suggest that.

sage
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] OSD crash during script, 0.56.4
I'm afraid I don't. I don't think I looked when it happened, and searching for one just now came up empty. :/ If it happens again, I'll be sure to keep my eye out for one.

FWIW, this particular server (1 out of 5) has 8GB *less* RAM than the others (one bad stick, it seems), and this has happened twice. But it still has 40GB for 12 OSDs, so I think it should be plenty.

Thanks for responding.

- Travis

On Mon, May 13, 2013 at 4:49 PM, Gregory Farnum g...@inktank.com wrote:
On Tue, May 7, 2013 at 9:44 AM, Travis Rhoden trho...@gmail.com wrote:
Hey folks,
Saw this crash the other day:

ceph version 0.56.4 (63b0f854d1cef490624de5d6cf9039735c7de5ca)
1: /usr/bin/ceph-osd() [0x788fba]
2: (()+0xfcb0) [0x7f19d1889cb0]
3: (gsignal()+0x35) [0x7f19d0248425]
4: (abort()+0x17b) [0x7f19d024bb8b]
5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f19d0b9a69d]
6: (()+0xb5846) [0x7f19d0b98846]
7: (()+0xb5873) [0x7f19d0b98873]
8: (()+0xb596e) [0x7f19d0b9896e]
9: (operator new[](unsigned long)+0x47e) [0x7f19d102db1e]
10: (ceph::buffer::create(unsigned int)+0x67) [0x834727]
11: (ceph::buffer::ptr::ptr(unsigned int)+0x15) [0x834a95]
12: (FileStore::read(coll_t, hobject_t const&, unsigned long, unsigned long, ceph::buffer::list&)+0x1ae) [0x6fbdde]
13: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool)+0x347) [0x69ac57]
14: (PG::chunky_scrub()+0x375) [0x69faf5]
15: (PG::scrub()+0x145) [0x6a0e95]
16: (OSD::ScrubWQ::_process(PG*)+0xc) [0x6384ec]
17: (ThreadPool::worker(ThreadPool::WorkThread*)+0x4e6) [0x8297e6]
18: (ThreadPool::WorkThread::entry()+0x10) [0x82b610]
19: (()+0x7e9a) [0x7f19d1881e9a]
20: (clone()+0x6d) [0x7f19d0305cbd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Appears to have gone down during a scrub? I don't see anything interesting in /var/log/syslog or anywhere else from the same time. It's actually the second time I've seen this exact stack trace. First time was reported here...
(was going to insert GMane link, but search.gmane.org appears to be down for me). Well, for those inclined, the thread was titled question about mon memory usage, and was also started by me. Any thoughts? I do plan to upgrade to 0.56.6 when I can. I'm a little leery of doing it on a production system without a maintenance window, though. When I went from 0.56.3 -- 0.56.4 on a live system, a system using the RBD kernel module kpanic'd. =) Do you have a core from when this happened? It was indeed during a scrub, but it didn't fail an assert or anything — looks like maybe it tried to allocate too much memory or something... :/ -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] distinguish administratively down OSDs
Hey folks,

This is either a feature request, or a request for guidance on handling something that must be common... =)

I have a cluster with dozens of OSDs, and one started having read errors (media errors) from the hard disk. Ceph complained, and I took it out of service by marking it down and out. "ceph osd tree" showed it as down, with a weight of 0 (out). Perfect. In the meantime, I RMA'd the disk. The replacement is on-hand, but we haven't done the swap-out yet. Woohoo, rot in place. =)

Fast forward a few days, and we had a server failure. This took a bunch of OSDs with it. We were able to bring the server back online, but not before normal recovery operations had started. The failed server came back up, and things started to migrate *back*. All this is normal. However, the loads were pretty intense, and I actually saw a few OSDs on *other* servers fail. Seemingly randomly; only 3 or 4. Thankfully I was watching for that, and restarted them before hitting the default 5-minute timeout and kicking off *more* recovery.

On to my question... During this time where I was watching for newly down OSDs, I had no way of knowing which OSDs were newly down (and potentially out), and which was the one I had set down on purpose. At least not from the CLI. I figured it out from some notes I had taken when I RMA'd the drive, but (sheepishly) not before I tried restarting the OSD with the bad hard drive behind it.

So, from the CLI, how could one distinguish OSDs that are down *on purpose* and should be left that way? My first thought would be to allow a "note" field to be attached to an OSD, and have that displayed in the output of "ceph osd tree". If anyone is familiar with HPC, and specifically PBS (the pbsnodes command), this would be similar to "pbsnodes -ln", which shows notes an administrator has attached to compute nodes that are down.
Examples I see from this on one of our current compute clusters are "bad RAM", "bad scratch disk", "does not POST", etc. Anyone else want to be able to track such a thing? Is there an existing method I could achieve such a goal with? As things scale to hundreds of OSDs or more, it seems like a useful thing to note which OSDs have failed, and why.

Thanks,

- Travis
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] OSD crash during script, 0.56.4
Hey folks,

Saw this crash the other day:

ceph version 0.56.4 (63b0f854d1cef490624de5d6cf9039735c7de5ca)
1: /usr/bin/ceph-osd() [0x788fba]
2: (()+0xfcb0) [0x7f19d1889cb0]
3: (gsignal()+0x35) [0x7f19d0248425]
4: (abort()+0x17b) [0x7f19d024bb8b]
5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f19d0b9a69d]
6: (()+0xb5846) [0x7f19d0b98846]
7: (()+0xb5873) [0x7f19d0b98873]
8: (()+0xb596e) [0x7f19d0b9896e]
9: (operator new[](unsigned long)+0x47e) [0x7f19d102db1e]
10: (ceph::buffer::create(unsigned int)+0x67) [0x834727]
11: (ceph::buffer::ptr::ptr(unsigned int)+0x15) [0x834a95]
12: (FileStore::read(coll_t, hobject_t const&, unsigned long, unsigned long, ceph::buffer::list&)+0x1ae) [0x6fbdde]
13: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool)+0x347) [0x69ac57]
14: (PG::chunky_scrub()+0x375) [0x69faf5]
15: (PG::scrub()+0x145) [0x6a0e95]
16: (OSD::ScrubWQ::_process(PG*)+0xc) [0x6384ec]
17: (ThreadPool::worker(ThreadPool::WorkThread*)+0x4e6) [0x8297e6]
18: (ThreadPool::WorkThread::entry()+0x10) [0x82b610]
19: (()+0x7e9a) [0x7f19d1881e9a]
20: (clone()+0x6d) [0x7f19d0305cbd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Appears to have gone down during a scrub? I don't see anything interesting in /var/log/syslog or anywhere else from the same time. It's actually the second time I've seen this exact stack trace. First time was reported here... (was going to insert GMane link, but search.gmane.org appears to be down for me). Well, for those inclined, the thread was titled "question about mon memory usage", and was also started by me.

Any thoughts? I do plan to upgrade to 0.56.6 when I can. I'm a little leery of doing it on a production system without a maintenance window, though. When I went from 0.56.3 to 0.56.4 on a live system, a system using the RBD kernel module kpanic'd. =)

- Travis
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] ceph osd tell bench
I have a question about the "tell bench" command. When I run this, is it behaving more or less like a dd on the drive? It appears to be, but I wanted to confirm whether or not it bypasses the normal Ceph stack that would be writing metadata, calculating checksums, etc.

One bit of behavior I noticed a while back that I was not expecting is that this command does write to the journal. It made sense once I thought about it, but when I have an SSD journal in front of an OSD, I can't get the "tell bench" command to show me accurate numbers for the raw speed of the OSD -- instead I get the write speed of the SSD. Just a small caveat there. The upside is that when you do something like "tell \* bench", you are able to see if that SSD becomes a bottleneck from hosting multiple journals, so I'm not really complaining. But it does make it a bit tough to see whether one OSD is performing much differently than others.

But really, I'm mainly curious if it skips any normal metadata/checksum overhead that may be there otherwise.

Thanks,

- Travis
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
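For reference, the command in question is `ceph osd tell N bench` (newer releases spell it `ceph tell osd.N bench`), and it reports bytes written and elapsed time. A small sketch of turning that into MB/s; the sample line is hard-coded so the parse is demonstrable without a cluster, and the exact output wording varies by release:

```shell
# Parse a `ceph osd tell N bench` result line into throughput. The line
# below is a made-up sample in the old plain-text style; newer releases
# emit JSON with per-run byte and time counts instead.
line='bench: wrote 1024 MB in blocks of 4194304 bytes in 9.21 sec'
echo "$line" | awk '{ printf "%.1f MB/s\n", $3 / $(NF-1) }'
# -> 111.2 MB/s
```

As the thread notes, the number measured this way includes the journal write, so with an SSD journal it reflects the journal device more than the data disk.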
Re: [ceph-users] Failed assert when starting new OSDs in 0.60
Hi Guys,

Any additional thoughts on this? There is a bit of information shared off-list I wanted to bring back:

Sam mentioned that the metadata looked odd, and suspected some form of 32-bit shenanigans in the key name construction. However, that might not have been the case, because he later came in with:

"Hmm. Based on the omap and logs, the omap directory is simply a bunch of updates behind. Was the node rebooted as part of the osd restart? FS is xfs? What are your fs mount options?"

There was no node restart. We are using XFS. From ceph.conf:

osd mount options xfs = rw,noatime,inode64,logbufs=8,logbsize=256k

And of course as soon as I paste that, I look at inode64 on these 32-bit ARM systems and think, hmm. I know 64-bit inodes are recommended for filesystems larger than 1TB (these are 4TB drives), but I have never thought about whether this is supported on a 32-bit system. Quick web searches appear to indicate it may be okay...

Sorry, some of this may be a duplicate. I wanted to bring it back on-list in case someone looks at that and says "no, you can't use those XFS options on 32-bit ARM." =)

On a side note, I've been using the cluster heavily the last couple days with no other problems. I just am not doing any cluster or OSD restarts for fear of an OSD not coming back.

- Travis

On Tue, Apr 30, 2013 at 12:17 PM, Travis Rhoden trho...@gmail.com wrote:

On the OSD node:

root@cepha0:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu Description: Ubuntu 12.10 Release: 12.10 Codename: quantal root@cepha0:~# dpkg -l *leveldb* Desired=Unknown/Install/Remove/Purge/Hold | Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend |/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad) ||/ Name Version Architecture Description ii libleveldb1:armhf 0+20120530.gitdd0d562-2 armhf fast key-value storage library root@cepha0:~# uname -a Linux cepha0 3.5.0-27-highbank #46-Ubuntu SMP Mon Mar 25 23:19:40 UTC 2013 armv7l armv7l armv7l GNU/Linux On the MON node: # lsb_release -a No LSB modules are available. Distributor ID: Ubuntu Description: Ubuntu 12.10 Release: 12.10 Codename: quantal # uname -a Linux 3.5.0-27-generic #46-Ubuntu SMP Mon Mar 25 19:58:17 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux # dpkg -l *leveldb* un leveldb-doc (none) (no description available) ii libleveldb-dev:amd64 0+20120530.gitdd0d562-2 amd64 fast key-value storage library (development files) ii libleveldb1:amd64 0+20120530.gitdd0d562-2 amd64 fast key-value storage library On Tue, Apr 30, 2013 at 12:11 PM, Samuel Just sam.j...@inktank.com wrote: What version of leveldb is installed? Ubuntu/version? -Sam On Tue, Apr 30, 2013 at 8:50 AM, Travis Rhoden trho...@gmail.com wrote: Interestingly, the down OSD does not get marked out after 5 minutes. Probably that is already fixed by http://tracker.ceph.com/issues/4822. On Tue, Apr 30, 2013 at 11:42 AM, Travis Rhoden trho...@gmail.com wrote: Hi Sam, I was prepared to write in and say that the problem had gone away. I tried restarting several OSDs last night in the hopes of capturing the problem on an OSD that hadn't failed yet, but didn't have any luck.
So I did indeed re-create the cluster from scratch (using mkcephfs), and what do you know -- everything worked. I got everything in a nice stable state, then decided to do a full cluster restart, just to be sure. Sure enough, one OSD failed to come up, and has the same stack trace. So I believe I have the log you want -- just from the OSD that failed, right? Question -- any feeling for what parts of the log you need? It's 688MB uncompressed (two hours!), so I'd like to be able to trim some off for you before making it available. Do you only need/want the part from after the OSD was restarted? Or perhaps the corruption happens on OSD shutdown and you need some before that? If you are fine with that large of a file, I can just make that available too. Let me know. - Travis On Mon, Apr 29, 2013 at 6:26 PM, Travis Rhoden trho...@gmail.com wrote
Re: [ceph-users] Failed assert when starting new OSDs in 0.60
On the OSD node: root@cepha0:~# lsb_release -a No LSB modules are available. Distributor ID: Ubuntu Description: Ubuntu 12.10 Release: 12.10 Codename: quantal root@cepha0:~# dpkg -l *leveldb* Desired=Unknown/Install/Remove/Purge/Hold | Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend |/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad) ||/ Name Version Architecture Description ii libleveldb1:armhf 0+20120530.gitdd0d562-2 armhf fast key-value storage library root@cepha0:~# uname -a Linux cepha0 3.5.0-27-highbank #46-Ubuntu SMP Mon Mar 25 23:19:40 UTC 2013 armv7l armv7l armv7l GNU/Linux On the MON node: # lsb_release -a No LSB modules are available. Distributor ID: Ubuntu Description: Ubuntu 12.10 Release: 12.10 Codename: quantal # uname -a Linux 3.5.0-27-generic #46-Ubuntu SMP Mon Mar 25 19:58:17 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux # dpkg -l *leveldb* un leveldb-doc (none) (no description available) ii libleveldb-dev:amd64 0+20120530.gitdd0d562-2 amd64 fast key-value storage library (development files) ii libleveldb1:amd64 0+20120530.gitdd0d562-2 amd64 fast key-value storage library On Tue, Apr 30, 2013 at 12:11 PM, Samuel Just sam.j...@inktank.com wrote: What version of leveldb is installed? Ubuntu/version? -Sam On Tue, Apr 30, 2013 at 8:50 AM, Travis Rhoden trho...@gmail.com wrote: Interestingly, the down OSD does not get marked out after 5 minutes. Probably that is already fixed by http://tracker.ceph.com/issues/4822. On Tue, Apr 30, 2013 at 11:42 AM, Travis Rhoden trho...@gmail.com wrote: Hi Sam, I was prepared to write in and say that the problem had gone away.
I tried restarting several OSDs last night in the hopes of capturing the problem on an OSD that hadn't failed yet, but didn't have any luck. So I did indeed re-create the cluster from scratch (using mkcephfs), and what do you know -- everything worked. I got everything into a nice stable state, then decided to do a full cluster restart, just to be sure. Sure enough, one OSD failed to come up, and has the same stack trace. So I believe I have the log you want -- just from the OSD that failed, right? Question -- any feeling for what parts of the log you need? It's 688MB uncompressed (two hours!), so I'd like to be able to trim some off for you before making it available. Do you only need/want the part from after the OSD was restarted? Or perhaps the corruption happens on OSD shutdown and you need some from before that? If you are fine with that large a file, I can just make it available too. Let me know. - Travis On Mon, Apr 29, 2013 at 6:26 PM, Travis Rhoden trho...@gmail.com wrote: Hi Sam, No problem, I'll leave that debugging turned up high, do a mkcephfs from scratch, and see what happens. Not sure if it will happen again or not. =) Thanks again. - Travis On Mon, Apr 29, 2013 at 5:51 PM, Samuel Just sam.j...@inktank.com wrote: Hmm, I need logging from when the corruption happened. If this is reproducible, can you enable that logging on a clean osd (or better, a clean cluster) until the assert occurs? -Sam On Mon, Apr 29, 2013 at 2:45 PM, Travis Rhoden trho...@gmail.com wrote: Also, I can note that it does not take a full cluster restart to trigger this. If I just restart an OSD that was up/in previously, the same error can happen (though not every time). So restarting OSDs for me is a bit like Russian roulette. =) Even though restarting an OSD may not always result in the error, it seems that once it happens, that OSD is gone for good. No amount of restarting has brought any of the dead ones back. I'd really like to get to the bottom of it.
Let me know if I can do anything to help. I may also have to try completely wiping/rebuilding to see if I can make this thing usable. On Mon, Apr 29, 2013 at 2:38 PM, Travis Rhoden trho...@gmail.com wrote: Hi Sam, Thanks for being willing to take a look. I applied the debug settings on one host that had 3 out of 3
Re: [ceph-users] Failed assert when starting new OSDs in 0.60
Thanks Greg. I quit playing with it because every time I restarted the cluster (service ceph -a restart), I lost more OSDs. The first time it was 1, the 2nd 10, the 3rd time 13... All 13 down OSDs show the same stack trace. - Travis On Mon, Apr 29, 2013 at 11:56 AM, Gregory Farnum g...@inktank.com wrote: This sounds vaguely familiar to me, and I see http://tracker.ceph.com/issues/4052, which is marked as Can't reproduce — I think maybe this is fixed in next and master, but I'm not sure. For more than that I'd have to defer to Sage or Sam. -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com On Sat, Apr 27, 2013 at 6:43 PM, Travis Rhoden trho...@gmail.com wrote: Hey folks, I'm helping put together a new test/experimental cluster, and hit this today when bringing the cluster up for the first time (using mkcephfs). After doing the normal service ceph -a start, I noticed one OSD was down, and a lot of PGs were stuck creating. I tried restarting the down OSD, but it wouldn't come up. It always had this error: -1 2013-04-27 18:11:56.179804 b6fcd000 2 osd.1 0 boot 0 2013-04-27 18:11:56.402161 b6fcd000 -1 osd/PG.cc: In function 'static epoch_t PG::peek_map_epoch(ObjectStore*, coll_t, hobject_t, ceph::bufferlist*)' thread b6fcd000 time 2013-04-27 18:11:56.399089 osd/PG.cc: 2556: FAILED assert(values.size() == 1) ceph version 0.60-401-g17a3859 (17a38593d60f5f29b9b66c13c0aaa759762c6d04) 1: (PG::peek_map_epoch(ObjectStore*, coll_t, hobject_t, ceph::buffer::list*)+0x1ad) [0x2c3c0a] 2: (OSD::load_pgs()+0x357) [0x28cba0] 3: (OSD::init()+0x741) [0x290a16] 4: (main()+0x1427) [0x2155c0] 5: (__libc_start_main()+0x99) [0xb69bcf42] NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this. I then did a full cluster restart, and now I have ten OSDs down -- each showing the same exception/failed assert. Anybody seen this? I know I'm running a weird version -- it's compiled from source, and was provided to me.
The OSDs are all on ARM, and the mon is x86_64. Just looking to see if anyone has seen this particular stack trace of load_pgs()/peek_map_epoch() before. - Travis
Re: [ceph-users] Failed assert when starting new OSDs in 0.60
Hi Sam, Thanks for being willing to take a look. I applied the debug settings on one host that had 3 out of 3 OSDs with this problem, then tried to start them up. Here are the resulting logs: https://dl.dropboxusercontent.com/u/23122069/cephlogs.tgz - Travis On Mon, Apr 29, 2013 at 1:04 PM, Samuel Just sam.j...@inktank.com wrote: You appear to be missing pg metadata for some reason. If you can reproduce it with debug osd = 20, debug filestore = 20, and debug ms = 1 on all of the OSDs, I should be able to track it down. I created a bug: #4855. Thanks! -Sam On Mon, Apr 29, 2013 at 9:52 AM, Travis Rhoden trho...@gmail.com wrote: Thanks Greg. I quit playing with it because every time I restarted the cluster (service ceph -a restart), I lost more OSDs. The first time it was 1, the 2nd 10, the 3rd time 13... All 13 down OSDs show the same stack trace. - Travis On Mon, Apr 29, 2013 at 11:56 AM, Gregory Farnum g...@inktank.com wrote: This sounds vaguely familiar to me, and I see http://tracker.ceph.com/issues/4052, which is marked as Can't reproduce — I think maybe this is fixed in next and master, but I'm not sure. For more than that I'd have to defer to Sage or Sam. -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com On Sat, Apr 27, 2013 at 6:43 PM, Travis Rhoden trho...@gmail.com wrote: Hey folks, I'm helping put together a new test/experimental cluster, and hit this today when bringing the cluster up for the first time (using mkcephfs). After doing the normal service ceph -a start, I noticed one OSD was down, and a lot of PGs were stuck creating. I tried restarting the down OSD, but it wouldn't come up.
It always had this error: -1 2013-04-27 18:11:56.179804 b6fcd000 2 osd.1 0 boot 0 2013-04-27 18:11:56.402161 b6fcd000 -1 osd/PG.cc: In function 'static epoch_t PG::peek_map_epoch(ObjectStore*, coll_t, hobject_t, ceph::bufferlist*)' thread b6fcd000 time 2013-04-27 18:11:56.399089 osd/PG.cc: 2556: FAILED assert(values.size() == 1) ceph version 0.60-401-g17a3859 (17a38593d60f5f29b9b66c13c0aaa759762c6d04) 1: (PG::peek_map_epoch(ObjectStore*, coll_t, hobject_t, ceph::buffer::list*)+0x1ad) [0x2c3c0a] 2: (OSD::load_pgs()+0x357) [0x28cba0] 3: (OSD::init()+0x741) [0x290a16] 4: (main()+0x1427) [0x2155c0] 5: (__libc_start_main()+0x99) [0xb69bcf42] NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this. I then did a full cluster restart, and now I have ten OSDs down -- each showing the same exception/failed assert. Anybody seen this? I know I'm running a weird version -- it's compiled from source, and was provided to me. The OSDs are all on ARM, and the mon is x86_64. Just looking to see if anyone has seen this particular stack trace of load_pgs()/peek_map_epoch() before. - Travis
[ceph-users] how to get latest (non-point release) debs
Hey folks, There are some changes in Bobtail queued up for 0.56.4 that I am really anxious to get, but that build hasn't been released yet. Is there an apt repo I can point at that will get me the latest build off of the bobtail branch? Based on the docs [1], I tried this: deb http://gitbuilder.ceph.com/ceph-deb-main-x86_64/ref/bobtail precise main But that was not found. - Travis [1] http://ceph.com/docs/master/install/debian/#development-testing-packages
Re: [ceph-users] Live migration of VM using librbd and OpenStack
Just for posterity, my ultimate solution was to patch Nova on each compute host so that _check_shared_storage_test_file (in nova/virt/libvirt/driver.py) always returns True. This did make migration work with nova live-migration, with one caveat. Since Nova assumes that /var/lib/nova/instances is on shared storage (and since I hard-coded the check to say yes, it really is), it thinks the per-domain folder under /var/lib/nova/instances will exist at both source and destination, and makes no attempt to create it on the destination. So before I run live-migration, I pop over to the source host and rsync that folder to the destination. A little dirty, but it allows me to move running VMs around just fine in cases of maintenance on a host, which is exactly what I need. Thanks for everyone's feedback. - Travis On Tue, Mar 12, 2013 at 6:33 PM, Travis Rhoden trho...@gmail.com wrote: On Tue, Mar 12, 2013 at 5:06 PM, Josh Durgin josh.dur...@inktank.com wrote: On 03/12/2013 01:48 PM, Travis Rhoden wrote: Hi Josh, Thanks for the info. So if I want to do live migration with VMs that were launched with boot-from-volume, I'll need to use virsh to do the migration, rather than Nova. Okay, that should be doable. As an aside, I will probably want to look at the OpenStack DB and figure out how to tell it that the VM has moved to a different host. I'd rather there not be a disconnect between Nova and libvirt about where the VM lives. =) It's probably not too hard to edit nova to skip the checks when the instance is volume-backed, but if you don't want to do that, libvirt should be fine, and a bit more flexible. After messing with it for a few hours, I'm thinking about doing just that. The nova edits should be easy. It looks like it tests for shared storage by writing a file on the migration destination and trying to read it at the source. I should be able to just comment out the check entirely, or make the check always pass. The virsh migrate strategy has been surprisingly difficult.
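The patch described at the top of this message can be sketched as a monkeypatch. LibvirtDriver below is a hypothetical stand-in for the real class in nova/virt/libvirt/driver.py (the actual fix was an edit to the file itself), and the upstream check body is paraphrased from the description above, not copied from Nova:

```python
import os

class LibvirtDriver:
    """Hypothetical minimal stand-in for nova.virt.libvirt.driver.LibvirtDriver."""

    def _check_shared_storage_test_file(self, filename):
        # Paraphrased upstream logic: the destination host writes a temp
        # file under /var/lib/nova/instances; if the source host can see
        # it, the directory is considered shared storage.
        return os.path.exists(filename)

def _always_shared(self, filename):
    # The workaround from the message: claim shared storage
    # unconditionally, since the disks live in Ceph (RBD) and are
    # reachable from every compute host via librbd.
    return True

# Replace the check in place of editing driver.py directly.
LibvirtDriver._check_shared_storage_test_file = _always_shared

driver = LibvirtDriver()
print(driver._check_shared_storage_test_file("/var/lib/nova/instances/tmp123"))  # -> True
```

The trade-off, as noted above, is that Nova then assumes the per-domain instance folder exists on the destination, so it must be copied over by hand before migrating.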
Since I'm migrating a Nova VM, I had to do the following prerequisites (so far): define the /var/lib/nova/instances/<domain> dir on the destination, and define/migrate the Nova libvirt nwfilter for the specific VM. Then, when I try to do the actual migration, I always get (at the source): error: internal error Process exited while reading console log output: chardev: opening backend file failed: Permission denied So QEMU is bailing, saying it can't read the console.log file. When I go look at that file, it is created, but with owner root:root and perms 0600. However, libvirtd makes it libvirt-qemu:kvm, 0600 before KVM tries to actually start the VM. I've always found this dynamic file ownership bit in KVM/libvirt/qemu very confusing. Anyways, I tried a few different things, debug logging, etc. I even tried disabling AppArmor. Still get permission denied each time. The commands I'm running manually should be identical to what OpenStack is doing, so I can't figure out why their migrate works and mine doesn't. Oh well, I'll edit Nova and give that a shot. - Travis
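The pre-migration rsync step described earlier in this message can be sketched as a small helper. The directory layout and the choice of rsync archive mode are assumptions, and "instance-0001" is a made-up domain name for illustration:

```python
def build_rsync_cmd(instance_dir, dest_host, base="/var/lib/nova/instances"):
    """Build the command to pre-copy an instance directory to the
    destination host before running `nova live-migration`.

    Assumes rsync over ssh and the stock Nova instances path.
    """
    src = f"{base}/{instance_dir}/"
    dst = f"{dest_host}:{base}/{instance_dir}/"
    # -a preserves ownership and permissions, which matters given the
    # console.log ownership trouble described above.
    return ["rsync", "-a", src, dst]

print(build_rsync_cmd("instance-0001", "vmhost3"))
```

Running the returned command from the source host recreates the per-domain folder on the destination, which Nova otherwise assumes already exists on shared storage.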
Re: [ceph-users] Live migration of VM using librbd and OpenStack
Hi Josh, Thanks for the info. So if I want to do live migration with VMs that were launched with boot-from-volume, I'll need to use virsh to do the migration, rather than Nova. Okay, that should be doable. As an aside, I will probably want to look at the OpenStack DB and figure out how to tell it that the VM has moved to a different host. I'd rather there not be a disconnect between Nova and libvirt about where the VM lives. =) Additionally, thanks for confirming that the migration is safe with the RBD cache enabled. I was going to ask that as well. On Tue, Mar 12, 2013 at 4:38 PM, Josh Durgin josh.dur...@inktank.com wrote: On 03/12/2013 01:28 PM, Travis Rhoden wrote: Thanks for the response, Trevor. "The root disk (/var/lib/nova/instances) must be on shared storage to run the live migrate." I would argue that it is on shared storage. It is an RBD stored in Ceph, and that's available at each host via librbd. Agreed. "You should be able to run block migration (which is a different form of the live-migration) that does not require shared storage." I think block migration would not be correct in this instance. There is no file to copy (there is no disk file in /var/lib/nova/instances/<domain>). Where is it going to copy it from/to? It's already an RBD. I know this is supposed to work [1]. I'm just wondering if it requires disabling the true live migration in libvirt. I think Josh will know. Yes, it works with true live migration just fine (even with caching). You can use virsh migrate or even do it through the virt-manager GUI. Nova is just doing a check that doesn't make sense for volume-backed instances with live migration there. Unfortunately I haven't had the time to look at that problem in nova since that message, but I suspect the same issue is still there.
Josh [1] https://lists.launchpad.net/openstack/msg15074.html On Tue, Mar 12, 2013 at 4:13 PM, tra26 tr...@cs.drexel.edu wrote: Travis, The root disk (/var/lib/nova/instances) must be on shared storage to run the live migrate. You should be able to run block migration (which is a different form of the live-migration) that does not require shared storage. Take a look at: http://www.sebastien-han.fr/blog/2012/07/12/openstack-block-migration/ for information regarding the block-level migration. -Trevor On 2013-03-12 15:57, Travis Rhoden wrote: Hey folks, I'm wondering if the following is possible. I have OpenStack (Folsom) configured to boot VMs from volume using Ceph as a backend for Cinder and Glance. My setup pretty much follows the Ceph guides for this verbatim. I've been using this setup for a while now, and it's all been really smooth. However, if I try to do a live-migration, I get this: RemoteError: Remote error: RemoteError Remote error: InvalidSharedStorage_Remote vmhost3 is not on shared storage: Live migration can not be used without shared storage. One thing I am doing that may not be normal is that I am trying to do the true live migration in KVM/libvirt, having set this in my nova.conf: live_migration_flag=VIR_MIGRATE_UNDEFINE_SOURCE,VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_LIVE Anyone know if this setup should work? Or if there is something I should tweak to make it work? I was thinking that having the RBD available via librbd at both the source and destination host makes that storage shared storage. Perhaps not, if I am trying to do live migration? If I do OpenStack's normal live migration, it will pause the VM and move it, which is less than ideal, but workable.
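For reference, here is the flag setting quoted above as it would sit in nova.conf (a sketch; the [DEFAULT] section placement for a Folsom-era setup and the comment are mine):

```ini
[DEFAULT]
# Ask libvirt for true live migration (peer-to-peer, undefine the
# source domain) instead of Nova's pause-and-move behavior.
live_migration_flag=VIR_MIGRATE_UNDEFINE_SOURCE,VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_LIVE
```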
Thanks, - Travis