[ceph-users] Query On Features

2016-06-15 Thread Srikar Somineni
Hi,
   I am new to Ceph and was going through the Ceph architecture and
feature documents. I believe that we don't have FDE (full-disk encryption) in
Ceph. Can we implement FDE on Ceph? Also, we are using compaction on the Ceph
monitor. Is it possible, or do we even need, compaction on the OSDs? Also, we
are supporting strong consistency on Ceph. Is application consistency or
crash consistency covered by the strong consistency?
   Can anyone please answer my above queries? Thanks in advance.
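
(For what it's worth: Ceph can do at-rest encryption per OSD with dm-crypt
through ceph-disk; a minimal sketch, assuming a spare device /dev/sdb and a
recent ceph-disk - keys should land under /etc/ceph/dmcrypt-keys by default:

ceph-disk prepare --dmcrypt /dev/sdb

Note this encrypts each OSD's data at rest; it is not end-to-end
encryption.)
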
Regards,
S.Srikar.


Re: [ceph-users] Ubuntu Trusty: kernel 3.13 vs kernel 4.2

2016-06-15 Thread Wido den Hollander

> On 14 June 2016 at 9:45, "magicb...@hotmail.com" wrote:
> 
> 
> Hi list,
> 
> is there any opinion/recommendation regarding the kernels available for
> Ubuntu Trusty and Ceph (hammer, XFS)?
> Is kernel 4.2 worth installing from a Ceph (hammer, XFS) perspective?
> 

I have seen some XFS issues with the 3.19 kernel, and most systems I manage
are now running the 4.2 kernel.

But there isn't a major benefit afaik, so if 3.13 works, keep it that way.

Wido

> Thanks :)


Re: [ceph-users] Disk failures

2016-06-15 Thread Christian Balzer

Hello,

On Wed, 15 Jun 2016 08:48:57 +0200 Gandalf Corvotempesta wrote:

> On 15 Jun 2016 03:27, "Christian Balzer" wrote:
> > And that makes deep-scrubbing something of quite limited value.
> 
> This is not true.

Did you read what Jan and I wrote?

> If you checksum *before* writing to disk (so while the data is still in RAM)
> then when reading back from disk you could do the checksum verification,
> and if it doesn't match you can heal from the other nodes
>
Very true, and as far as client writes are concerned, Ceph does write from
memory.
However, Ceph doesn't do any checksum verification on reads, so
potentially corrupted data can and will be served to the clients.

The only time the "healing" can happen is during deep-scrubs (if the data
corruption is persistent and not random), and that is of course possibly
long (up to a week with default values) after the corrupt data has been
served to a client.
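
For reference, the knobs behind that "up to a week" are (a sketch of the
defaults as I remember them, values in seconds):

osd scrub min interval = 86400        # light scrubs, daily
osd deep scrub interval = 604800      # deep scrubs, weekly

and once a deep-scrub flags a PG inconsistent, a repair can be forced with
"ceph pg repair <pgid>".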

This is why people are using BTRFS and ZFS for filestore (despite the
problems they in turn create) and why the roadmap for bluestore has
checksums for reads on it as well (or so we've been told).
 
> Obviously you have to replicate directly from RAM, where bitrot can't
> happen.
> If you write to disk and then replicate the written data, you could
> replicate a rotted value.

Which is exactly what could happen if you have any kind of data movement,
be it re-weighting OSDs, adding new ones, or even the snapshot scenario Jan
mentioned.
Because in these cases the data is read from the primary PG, i.e. from the
disk.

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com           Global OnLine Japan/Rakuten Communications
http://www.gol.com/


Re: [ceph-users] Disk failures

2016-06-15 Thread Gandalf Corvotempesta
On 15 Jun 2016 09:42, "Christian Balzer" wrote:
>
> This is why people are using BTRFS and ZFS for filestore (despite the
> problems they in turn create) and why the roadmap for bluestore has
> checksums for reads on it as well (or so we've been told).

Bitrot happens only on files?
What about RBD?


Re: [ceph-users] Disk failures

2016-06-15 Thread Christian Balzer
On Wed, 15 Jun 2016 09:50:43 +0200 Gandalf Corvotempesta wrote:

> On 15 Jun 2016 09:42, "Christian Balzer" wrote:
> >
> > This is why people are using BTRFS and ZFS for filestore (despite the
> > problems they in turn create) and why the roadmap for bluestore has
> > checksums for reads on it as well (or so we've been told).
> 
> Bitrot happens only on files?
It happens on storage devices.

> What about RBD?
You _do_ know how and where Ceph/RBD store their data?

Right now that's on disks/SSDs, formatted with a file system.
And XFS or EXT4 will not protect against bitrot, while BTRFS and ZFS will.

See Bill Sharer's mail in this thread just a few hours ago.

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com           Global OnLine Japan/Rakuten Communications
http://www.gol.com/


Re: [ceph-users] Ceph and Openstack

2016-06-15 Thread Iban Cabrillo
Hi Jon,
   Then this is not the issue; RBD has been supported on KVM for a long time.

Cheers, I

2016-06-14 21:40 GMT+02:00 Jonathan D. Proulx :

> On Tue, Jun 14, 2016 at 05:48:11PM +0200, Iban Cabrillo wrote:
> :Hi Jon,
> :   Which hypervisor is used for your OpenStack deployment? We had lots
> :of trouble with Xen until a recent libvirt (in the libvirt < 1.3.2
> :package, the RBD driver was not supported)
>
> we're using kvm (Ubuntu 14.04, libvirt 1.2.12 )
>
> -Jon
>
> :
> :Regards, I
> :
> :2016-06-14 17:38 GMT+02:00 Jonathan D. Proulx :
> :
> :> On Tue, Jun 14, 2016 at 02:15:45PM +0200, Fran Barrera wrote:
> :> :Hi all,
> :> :
> :> :I have a problem integrating Glance with Ceph.
> :> :
> :> :Openstack Mitaka
> :> :Ceph Jewel
> :> :
> :> :I've followed the Ceph doc (
> :> :http://docs.ceph.com/docs/jewel/rbd/rbd-openstack/) but when I try to
> :> :list or create images, I get the error "Unable to establish connection
> :> :to http://IP:9292/v2/images", and in debug mode I can see this:
> :>
> :> This suggests that the Glance API service isn't running properly
> :> and probably isn't related to the rbd backend.
> :>
> :> You should be able to connect to the glance API endpoint even if the
> :> ceph config is wrong (though you'd probably get 'internal server
> :> errors' if the storage backend isn't set up correctly).
> :>
> :> In either case you'll probably get a better response on the openstack
> :> lists, but my suggestion would be to try the regular file backend to
> :> verify your glance setup is working, then switch to the rbd backend.
> :>
> :> -Jon
> :>
> :> :
> :> :2016-06-14 14:02:54.634 2256 DEBUG glance_store.capabilities [-] Store
> :> :glance_store._drivers.rbd.Store doesn't support updating dynamic
> storage
> :> :capabilities. Please overwrite 'update_capabilities' method of the
> store
> :> to
> :> :implement updating logics if needed. update_capabilities
> :> :/usr/lib/python2.7/dist-packages/glance_store/capabilities.py:98
> :> :
> :> :I've also tried to remove the database and populate it again, but I
> :> :get the same error.
> :> :Cinder with Ceph works correctly.
> :> :
> :> :Any suggestions?
> :> :
> :> :Thanks,
> :> :Fran.
> :>
> :>
> :
> :
> :
> :--
>
> :
> :Iban Cabrillo Bartolome
> :Instituto de Fisica de Cantabria (IFCA)
> :Santander, Spain
> :Tel: +34942200969
> :PGP PUBLIC KEY:
> :http://pgp.mit.edu/pks/lookup?op=get&search=0xD9DF0B3D6C8C08AC
>
> :
> :Bertrand Russell:
> :*"The problem with the world is that the stupid are sure of everything
> :and the intelligent are full of doubts"*
>
> --
>



-- 

Iban Cabrillo Bartolome
Instituto de Fisica de Cantabria (IFCA)
Santander, Spain
Tel: +34942200969
PGP PUBLIC KEY:
http://pgp.mit.edu/pks/lookup?op=get&search=0xD9DF0B3D6C8C08AC

Bertrand Russell:
*"The problem with the world is that the stupid are sure of everything and
the intelligent are full of doubts"*


[ceph-users] Failing upgrade from Hammer to Jewel on Centos 7

2016-06-15 Thread stephane.davy
Hello ceph users,

I've tried to upgrade from Hammer to Jewel on CentOS 7. I've changed the
ownership of /var/lib/ceph to ceph:ceph as described in the upgrade notes, but
the OSDs don't start after upgrading. I noticed that after a reboot the OSDs
are no longer mounted at boot time, as they used to be thanks to udev.

-  If I run udevadm trigger to fire the udev rules again, nothing happens

-  If I run a command like: echo add > /sys/class/block/sdf1/uevent
then it mounts the partition and the OSD comes up and works correctly
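
A quick way to re-trigger that for every data partition (a sketch only;
adjust the glob to your actual OSD data partitions):

for uev in /sys/class/block/sd*1/uevent; do echo add > "$uev"; done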

The distribution is CentOS 7.2.1511

It looks like the same issue as the one described here:
https://www.mail-archive.com/ceph-users@lists.ceph.com/msg28661.html

But I see no real solution there for CentOS.

Any idea?

Thanks for your help,

Stéphane



Re: [ceph-users] Ceph and Openstack

2016-06-15 Thread Fran Barrera
Hi,

Thanks for all the replies. I'm using KVM, so that should not be the
problem. If I use Glance without Ceph it works well, so the problem is with
the Ceph integration. The service is running, but glance-api does not
appear to work properly and gives the error "Unable to connect.."

Regards,
Fran.



Re: [ceph-users] Ceph and Openstack

2016-06-15 Thread Fran Barrera
Hello,

The problem was in the Ceph documentation: "default_store = rbd" must be in
the "glance_store" section and not in the default section for OpenStack
Mitaka and Ceph Jewel.
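
For anyone else hitting this, the working section looks roughly like the
following (a sketch; the pool and user names are the ones from the Ceph
docs, adjust to your setup):

[glance_store]
stores = rbd
default_store = rbd
rbd_store_pool = images
rbd_store_user = glance
rbd_store_ceph_conf = /etc/ceph/ceph.conf
rbd_store_chunk_size = 8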

Thanks,
Fran.



Re: [ceph-users] Disk failures

2016-06-15 Thread Gandalf Corvotempesta
On 15 Jun 2016 09:58, "Christian Balzer" wrote:
> You _do_ know how and where Ceph/RBD store their data?
>
> Right now that's on disks/SSDs, formatted with a file system.
> And XFS or EXT4 will not protect against bitrot, while BTRFS and ZFS will.
>

Wait, I'm new to Ceph and some things are not clear to me.
Even using RBD for block devices, does Ceph still write everything as files
on a filesystem?

I had always thought that only CephFS used files, and that RBD stored data
directly on block devices with no fs under it.

Pretty similar to GlusterFS in this respect.

My mistake.


Re: [ceph-users] Ceph and Storage Management with openATTIC (was : June Ceph Tech Talks)

2016-06-15 Thread Lenz Grimmer
Hi there,

On 06/06/2016 09:56 PM, Patrick McGarry wrote:

> So we have gone from not having a Ceph Tech Talk this month…to having
> two! As a part of our regularly scheduled Ceph Tech Talk series, Lenz
> Grimmer from OpenATTIC will be talking about the architecture of their
> management/GUI solution, which will also include a live demo.

Thanks a lot for giving me a chance to talk about our project, Patrick,
much appreciated!

For those of you who want to learn more about openATTIC
(http://openattic.org) in advance or can't make it to the tech talk, I'd
like to give you a quick introduction and some pointers to our project.

openATTIC was started as a "traditional" storage management system
(CIFS/NFS, iSCSI/FC, DRBD, Btrfs, ZFS) around 5 years ago. It supports
managing multiple nodes and has monitoring of the storage resources
built-in (using Nagios/Icinga and PNP4Nagios for storing performance
data in RRDs). The openATTIC Backend is based on Python/Django and we
added a RESTful API and WebUI based on AngularJS and Bootstrap with
version 2.0, which is currently under development.

We started adding Ceph support in early 2015, as an answer to users that
were facing data growth at a faster pace than what a traditional storage
system could keep up with. At first, we added the capability to map and
share RBDs as block volumes as well as providing a simple CRUSH map
editor. We started collaborating with SUSE on the Ceph features at the
beginning of the year and have made good progress on extending the
functionality since then.

At this stage, we use the librados and librbd Python bindings to
communicate with the Ceph cluster, but we're also keeping an eye on the
development of ceph-mgr, which is currently being worked on.

For additional remote node management and monitoring features, we intend
to use Salt and collectd. Currently, our focus is on building a
dashboard to monitor the cluster's performance and health (making use of
the D3 JavaScript library for the graphs) as well as creating the WebUI
views that display the cluster's various objects like Pools, OSDs, etc.

The openATTIC development takes place in the open: the code is hosted in
a Mercurial repo on BitBucket [1], all issues (bugs and feature specs)
are tracked in a public Jira instance [2]. New code is submitted via
pull requests and we require code reviews before it is merged.
We also have an extensive test suite that performs tests both on the
REST API level as well as over the WebUI.

The Ceph functionality is still under development [3], and right now the
WebUI does not fully utilize everything the API provides [4], but we'd
like to invite you to take a look at what we have so far and let us know
if we're heading in the right direction with this.

Our intention is to provide a Ceph Management and Monitoring tool that
administrators *want* to use and that makes sense. So any feedback or
comments are welcome and appreciated [5].

Thanks!

Lenz

[1] https://bitbucket.org/openattic/openattic/
[2] https://tracker.openattic.org/
[3]
https://wiki.openattic.org/display/OP/openATTIC+Ceph+Management+Roadmap+and+Implementation+Plan
[4] https://wiki.openattic.org/display/OP/openATTIC+Ceph+REST+API+overview
[5] http://openattic.org/get-involved.html







Re: [ceph-users] Ceph file change monitor

2016-06-15 Thread siva kumar
Yes, we need something similar to inotify/fanotify.

I came across this link:
http://docs.ceph.com/docs/master/dev/osd_internals/watch_notify/?highlight=notify#watch-notify

I just want to know if I can use this.

If yes, how should we use it?
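
For reference, the low-level watch/notify mechanism can be experimented
with from the rados CLI (a sketch; pool and object names are made up, and
the availability of the watch/notify subcommands depends on your rados
version - check rados --help):

rados -p rbd create watched-obj
rados -p rbd watch watched-obj          # terminal 1: blocks and prints notifies
rados -p rbd notify watched-obj hello   # terminal 2: send a notify

Note this works per RADOS object, so it is not a substitute for inotify on
a CephFS tree.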

Thanks,
Siva

On Thu, Jun 9, 2016 at 6:06 PM, Anand Bhat  wrote:

> I think you are looking for inotify/fanotify events for Ceph. Usually
> these are implemented for local file systems. Ceph being a networked file
> system, it will not be easy to implement and will involve network traffic
> to generate events.
>
> Not sure it is in the plan though.
>
> Regards,
> Anand
>
> On Wed, Jun 8, 2016 at 2:46 PM, John Spray  wrote:
>
>> On Wed, Jun 8, 2016 at 8:40 AM, siva kumar <85s...@gmail.com> wrote:
>> > Dear Team,
>> >
>> > We are using ceph storage & cephFS for mounting .
>> >
>> > Our configuration :
>> >
>> > 3 osd
>> > 3 monitor
>> > 4 clients .
>> > ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)
>> >
>> > We would like to get file change notifications: what the event is
>> > (ADDED, MODIFIED, DELETED) and for which file the event occurred.
>> > These notifications should be sent to our server.
>> > How can we get these notifications?
>>
>> This isn't a feature that CephFS has right now.  Still, I would be
>> interested to know what protocol/format your server would consume
>> these kinds of notifications in?
>>
>> John
>>
>> > Ultimately we would like to add our custom file watch notification
>> hooks to
>> > ceph so that we can handle these notifications ourselves.
>> >
>> > Additional Info :
>> >
>> > [test@ceph-zclient1 ~]$ ceph -s
>> >
>> >> cluster a8c92ae6-6842-4fa2-bfc9-8cdefd28df5c
>> >
>> >  health HEALTH_WARN
>> > mds0: ceph-client1 failing to respond to cache pressure
>> > mds0: ceph-client2 failing to respond to cache pressure
>> > mds0: ceph-client3 failing to respond to cache pressure
>> > mds0: ceph-client4 failing to respond to cache pressure
>> >  monmap e1: 3 mons at
>> >
>> {ceph-zadmin=xxx.xxx.xxx.xxx:6789/0,ceph-zmonitor=xxx.xxx.xxx.xxx:6789/0,ceph-zmonitor1=xxx.xxx.xxx.xxx:6789/0}
>> > election epoch 16, quorum 0,1,2
>> > ceph-zadmin,ceph-zmonitor1,ceph-zmonitor
>> >  mdsmap e52184: 1/1/1 up {0=ceph-zstorage1=up:active}
>> >  osdmap e3278: 3 osds: 3 up, 3 in
>> >   pgmap v5068139: 384 pgs, 3 pools, 518 GB data, 7386 kobjects
>> > 1149 GB used, 5353 GB / 6503 GB avail
>> >  384 active+clean
>> >
>> >   client io 1259 B/s rd, 179 kB/s wr, 11 op/s
>> >
>> >
>> >
>> > Thanks,
>> > S.Sivakumar
>> >
>> >
>> >
>> >
>
>
>
> --
>
> 
> Never say never.
>


[ceph-users] Is Dynamic Cache tiering supported in Jewel

2016-06-15 Thread Venkata Manojawa Paritala
Hi,

We are trying out cache tiering in Ceph and would like to know if it can be
set up dynamically - basically, adding a cache pool to another pool that is
already serving IO.

Thank you for your response in advance.

- Manoj


[ceph-users] Ceph RBD object-map and discard in VM

2016-06-15 Thread list

Hello guys,

We are currently testing Ceph Jewel with the object-map feature enabled:

rbd image 'disk-22920':
size 102400 MB in 25600 objects
order 22 (4096 kB objects)
block_name_prefix: rbd_data.7cfa2238e1f29
format: 2
features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
flags:

We use this RBD as the disk for a KVM virtual machine with virtio-scsi and
discard=unmap. We noticed the following parameters in /sys/block:


# cat /sys/block/sda/queue/discard_*
4096
1073741824
0 <- discard_zeroes_data

While doing a mkfs.ext4 on the disk in the VM, we noticed low performance
when using discard.


mkfs.ext4 -E nodiscard /dev/sda1 - takes 5 seconds to complete
mkfs.ext4 -E discard /dev/sda1 - takes around 3 minutes

When disabling the object-map, the mkfs with discard takes just 5
seconds.
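
If someone wants to reproduce this, the feature can be toggled on the
existing image (a sketch; fast-diff depends on object-map, so the two are
disabled together, and the map can be rebuilt after re-enabling):

rbd feature disable disk-22920 fast-diff object-map
rbd feature enable disk-22920 object-map fast-diff
rbd object-map rebuild disk-22920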


Do you have any idea what might cause this issue?

Kernel: 4.2.0-35-generic #40~14.04.1-Ubuntu
Ceph: 10.2.0
Libvirt: 1.3.1
QEMU: 2.5.0

Thanks!

Best regards,
Jonas


Re: [ceph-users] Failing upgrade from Hammer to Jewel on Centos 7

2016-06-15 Thread Martin Palma
Hi Stéphane,

We had the same issue:
https://www.mail-archive.com/ceph-users%40lists.ceph.com/msg27507.html

Since then we have applied the fix suggested by Dan by simply adding
"ceph-disk activate-all" to rc.local.

Best,
Martin



Re: [ceph-users] Disk failures

2016-06-15 Thread Christian Balzer
On Wed, 15 Jun 2016 12:46:49 +0200 Gandalf Corvotempesta wrote:

> On 15 Jun 2016 09:58, "Christian Balzer" wrote:
> > You _do_ know how and where Ceph/RBD store their data?
> >
> > Right now that's on disks/SSDs, formatted with a file system.
> > And XFS or EXT4 will not protect against bitrot, while BTRFS and ZFS
> > will.
> >
> 
> Wait, I'm new to Ceph and some things are not clear to me.
> Even using RBD for block devices, does Ceph still write everything as
> files on a filesystem?
> 
Ceph RADOS, on which all the other views (RBD/RADOSGW/CephFS) are based,
writes data to objects, which are 4MB in size by default and are files on
the OSDs, which in turn are currently filesystems.
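
You can see this for yourself on any filestore OSD; every 4MB object is
literally a file in a PG directory (paths and names illustrative):

ls /var/lib/ceph/osd/ceph-0/current/
# 0.0_head  0.1_head  ...  meta  omap
ls /var/lib/ceph/osd/ceph-0/current/0.1_head/
# rbd\udata.7cfa2238e1f29.0000000000000042__head_...__0  ...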
 
> I had always thought that only CephFS used files, and that RBD stored
> data directly on block devices with no fs under it.
>
That's the future with bluestore, which is K/V based storage.
 
> Pretty similar to GlusterFS in this respect.
> 
> My mistake.

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com           Global OnLine Japan/Rakuten Communications
http://www.gol.com/


[ceph-users] OSDs stuck in booting state after redeploying

2016-06-15 Thread Kostis Fardelas
Hello,
in the process of redeploying some OSDs in our cluster, after
destroying one of them (down, out, remove from crushmap) and trying to
redeploy it (crush add, start), we reach a state where the OSD gets
stuck in the booting state:
root@staging-rd0-02:~# ceph daemon osd.12 status
{ "cluster_fsid": "XXX",
  "osd_fsid": "XX",
  "whoami": 12,
  "state": "booting",
  "oldest_map": 150201,
  "newest_map": 150779,
  "num_pgs": 0}

No flags that could prevent the OSD from coming up are in place. The OSD
never gets marked as up in 'ceph osd tree' and never gets in. If I try
to manually mark it in, it goes out again after a while. The cluster OSD map
keeps moving forward, but the OSD cannot catch up, of course. I started
the OSD with debugging options:
debug osd = 20
debug filestore = 20
debug journal = 20
debug monc = 20
debug ms = 1

and what I see is a continuous stream of OSD logs of this kind:
2016-06-15 16:39:33.876339 7f0256b61700 10 osd.12 150798 do_waiters -- start
2016-06-15 16:39:33.876343 7f0256b61700 10 osd.12 150798 do_waiters -- finish
2016-06-15 16:39:34.390560 7f022e2ee700 20 osd.12 150798
update_osd_stat osd_stat(59384 kB used, 558 GB avail, 558 GB total,
peers []/[] op hist [])
2016-06-15 16:39:34.390622 7f022e2ee700  5 osd.12 150798 heartbeat:
osd_stat(59384 kB used, 558 GB avail, 558 GB total, peers []/[] op
hist [])
2016-06-15 16:39:34.876526 7f0256b61700  5 osd.12 150798 tick
2016-06-15 16:39:34.876561 7f0256b61700 10 osd.12 150798 do_waiters -- start
2016-06-15 16:39:34.876565 7f0256b61700 10 osd.12 150798 do_waiters -- finish
2016-06-15 16:39:35.876729 7f0256b61700  5 osd.12 150798 tick
2016-06-15 16:39:35.876762 7f0256b61700 10 osd.12 150798 do_waiters -- start
2016-06-15 16:39:35.876766 7f0256b61700 10 osd.12 150798 do_waiters -- finish
2016-06-15 16:39:36.646355 7f025535e700 20
filestore(/rados/staging-rd0-02-12) sync_entry woke after 30.000161
2016-06-15 16:39:36.646421 7f025535e700 20
filestore(/rados/staging-rd0-02-12) sync_entry waiting for
max_interval 30.00
2016-06-15 16:39:36.876917 7f0256b61700  5 osd.12 150798 tick
2016-06-15 16:39:36.876949 7f0256b61700 10 osd.12 150798 do_waiters -- start
2016-06-15 16:39:36.876953 7f0256b61700 10 osd.12 150798 do_waiters -- finish
2016-06-15 16:39:37.877112 7f0256b61700  5 osd.12 150798 tick
2016-06-15 16:39:37.877142 7f0256b61700 10 osd.12 150798 do_waiters -- start
2016-06-15 16:39:37.877147 7f0256b61700 10 osd.12 150798 do_waiters -- finish
2016-06-15 16:39:38.877298 7f0256b61700  5 osd.12 150798 tick
2016-06-15 16:39:38.877327 7f0256b61700 10 osd.12 150798 do_waiters -- start
2016-06-15 16:39:38.877331 7f0256b61700 10 osd.12 150798 do_waiters -- finish

Is there a solution for this problem? Known bug? We are on firefly
(0.80.11) and wanted to do some maintenance before going to hammer,
but now we are somewhat stuck.
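
For completeness, this is how I checked for flags and map epochs (a
sketch):

ceph osd dump | head -6      # the "flags" line would show noup/noin/etc.
ceph osd stat                # epoch plus up/in counts and flags
ceph daemon osd.12 status    # compare newest_map with the cluster epoch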

Best regards,
Kostis


Re: [ceph-users] Is Dynamic Cache tiering supported in Jewel

2016-06-15 Thread Christian Balzer
On Wed, 15 Jun 2016 18:39:14 +0530 Venkata Manojawa Paritala wrote:

> Hi,
> 
> We are trying out cache tiering in Ceph and would like to know if it
> can be set up dynamically - basically, adding a cache pool to another
> pool that is already serving IO.
> 
Yes.

Read the relevant Ceph documentation, my
"Cache tier operation clarifications" thread, and if you're using Jewel, the
current "strange cache tier behaviuor with cephfs" (sic) thread.

Christian

> Thank you for your response in advance.
> 
> - Manoj


-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com           Global OnLine Japan/Rakuten Communications
http://www.gol.com/


Re: [ceph-users] Failing upgrade from Hammer to Jewel on Centos 7

2016-06-15 Thread stephane.davy
Thanks Martin

Did you also enable ceph-osd@<i>, with i taking all the OSD numbers on your
server?

Stéphane



Re: [ceph-users] Failing upgrade from Hammer to Jewel on Centos 7

2016-06-15 Thread Martin Palma
No... we simply added "ceph-disk activate-all" to rc.local.

Best,
Martin



[ceph-users] Switches and latency

2016-06-15 Thread Gandalf Corvotempesta
Let's assume a fully redundant network.
We need 4 switches: 2 for the public network, 2 for the cluster network.

10GBase-T has higher latency than SFP+, but it is also cheaper, as many
new servers have 10GBase-T integrated onboard and there is no need for
twinax cables or transceivers.

I think low latency is needed on the public network, where servers have
to read data from, and not on the cluster network, which is used only
for replication purposes.

What do you think? Using SFP+ everywhere would increase the total cost.
And what about InfiniBand (IPoIB) for the cluster network and SFP+ for
the public network? Refurbished IB switches are cheaper than 10Gb SFP+ ones.
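
For reference, whatever media you pick, the public/cluster split itself is
just two ceph.conf settings (example subnets):

[global]
public network  = 192.168.1.0/24
cluster network = 192.168.2.0/24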


Re: [ceph-users] Failing upgrade from Hammer to Jewel on Centos 7

2016-06-15 Thread stephane.davy
Hmm, I wanted to run a final "yum update" on my server to have a clean
situation, and it seems that Jewel 10.2.2 is around the corner. Unfortunately,
the "ceph-disk activate-all" fix doesn't work anymore; fstab is needed again :-(

Stéphane




Re: [ceph-users] rgw bucket deletion woes

2016-06-15 Thread Pavan Rallabhandi
To update this thread, this is now fixed via 
https://github.com/ceph/ceph/pull/8679
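
A sketch of the resulting usage, assuming the --bypass-gc flag that PR
introduces:

radosgw-admin bucket rm --bucket=mybucket --purge-objects --bypass-gc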

Thanks!

From: Ben Hines <bhi...@gmail.com>
Date: Thursday, March 17, 2016 at 4:47 AM
To: Yehuda Sadeh-Weinraub <yeh...@redhat.com>
Cc: Pavan Rallabhandi <prallabha...@walmartlabs.com>,
"ceph-us...@ceph.com" <ceph-us...@ceph.com>
Subject: Re: [ceph-users] rgw bucket deletion woes
Subject: Re: [ceph-users] rgw bucket deletion woes

We would be a big user of this. We delete large buckets often and it takes 
forever.

Though didn't I read that 'object expiration' support is on the near-term RGW
roadmap? That may do what we want: we're creating thousands of objects a day,
and thousands of objects a day will be expiring, so RGW will need to handle it.


-Ben

On Wed, Mar 16, 2016 at 9:40 AM, Yehuda Sadeh-Weinraub
<yeh...@redhat.com> wrote:
On Tue, Mar 15, 2016 at 11:36 PM, Pavan Rallabhandi
<prallabha...@walmartlabs.com> wrote:
> Hi,
>
> I find this was discussed here before, but I couldn't find any solution,
> hence the mail. In RGW, for a bucket holding objects in the range of
> ~millions, one can find it taking forever to delete the bucket (via
> radosgw-admin). I understand the gc (and its parameters) would reclaim
> the space eventually, but am looking more at bucket deletion options
> that could possibly speed up the operation.
>
> I realize that currently rgw_remove_bucket() does it 1000 objects at a
> time, serially. Wanted to know if there is a reason (that I am possibly
> missing and was discussed) for this to be left that way; otherwise I was
> considering a patch to make it work better.
>

There is no real reason. You might want to have a version of that
command that doesn't schedule the removal to gc, but rather removes
all the object parts by itself. Otherwise, you're just going to flood
the gc. You'll need to iterate through all the objects, and for each
object you'll need to remove all of its rados objects (starting with
the tail, then the head). Removal of each rados object can be done
asynchronously, but you'll need to throttle the operations, not send
everything to the osds at once (which will be impossible, as the
objecter will throttle the requests anyway, which will lead to a high
memory consumption).

Thanks,
Yehuda


Re: [ceph-users] which CentOS 7 kernel is compatible with jewel?

2016-06-15 Thread Michael Kuriger
Still not working with the newer client.  But I get a different error now.

[root@test ~]# rbd ls
test1

[root@test ~]# rbd showmapped

[root@test ~]# rbd map test1
rbd: sysfs write failed
RBD image feature set mismatch. You can disable features unsupported by the 
kernel with "rbd feature disable".
In some cases useful info is found in syslog - try "dmesg | tail" or so.
rbd: map failed: (6) No such device or address

[root@test ~]# dmesg | tail
[52056.980880] rbd: loaded (major 251)
[52056.990399] libceph: mon0 10.1.77.165:6789 session established
[52056.992567] libceph: client4966 fsid f1466aaa-b08b-4103-ba7f-69165d675ba1
[52057.024913] rbd: image mk7193.np.wc1.yellowpages.com: image uses unsupported 
features: 0x3c
[52085.856605] libceph: mon0 10.1.77.165:6789 session established
[52085.858696] libceph: client4969 fsid f1466aaa-b08b-4103-ba7f-69165d675ba1
[52085.883350] rbd: image test1: image uses unsupported features: 0x3c
[52167.683868] libceph: mon1 10.1.78.75:6789 session established
[52167.685990] libceph: client4937 fsid f1466aaa-b08b-4103-ba7f-69165d675ba1
[52167.709796] rbd: image test1: image uses unsupported features: 0x3c

[root@test ~]# uname -a
Linux test.np.4.6.2-1.el7.elrepo.x86_64 #1 SMP Wed Jun 8 14:49:20 EDT 2016 
x86_64 x86_64 x86_64 GNU/Linux

[root@test ~]# ceph --version
ceph version 10.2.1 (3a66dd4f30852819c1bdaa8ec23c795d4ad77269)




 

 
Michael Kuriger
Sr. Unix Systems Engineer
mk7...@yp.com | 818-649-7235

On 6/14/16, 12:28 PM, "Ilya Dryomov"  wrote:

>On Mon, Jun 13, 2016 at 8:37 PM, Michael Kuriger  wrote:
>> I just realized that this issue is probably because I’m running jewel 10.2.1 
>> on the servers side, but accessing from a client running hammer 0.94.7 or 
>> infernalis 9.2.1
>>
>> Here is what happens if I run rbd ls from a client on infernalis.  I was 
>> testing this access since we weren’t planning on building rpms for Jewel on 
>> CentOS 6
>>
>> $ rbd ls
>> 2016-06-13 11:24:06.881591 7fe61e568700  0 -- :/3877046932 >> 
>> 10.1.77.165:6789/0 pipe(0x562ed3ea7550 sd=3 :0 s=1 pgs=0 cs=0 l=1 
>> c=0x562ed3ea0ac0).fault
>> 2016-06-13 11:24:09.882051 7fe61137f700  0 -- :/3877046932 >> 
>> 10.1.78.75:6789/0 pipe(0x7fe608000c00 sd=3 :0 s=1 pgs=0 cs=0 l=1 
>> c=0x7fe608004ef0).fault
>> 2016-06-13 11:24:12.882389 7fe61e568700  0 -- :/3877046932 >> 
>> 10.1.77.165:6789/0 pipe(0x7fe608008350 sd=4 :0 s=1 pgs=0 cs=0 l=1 
>> c=0x7fe60800c5f0).fault
>> 2016-06-13 11:24:18.883642 7fe61e568700  0 -- :/3877046932 >> 
>> 10.1.77.165:6789/0 pipe(0x7fe608008350 sd=3 :0 s=1 pgs=0 cs=0 l=1 
>> c=0x7fe6080078e0).fault
>> 2016-06-13 11:24:21.884259 7fe61137f700  0 -- :/3877046932 >> 
>> 10.1.78.75:6789/0 pipe(0x7fe608000c00 sd=4 :0 s=1 pgs=0 cs=0 l=1 
>> c=0x7fe608007110).fault
>
>Accessing jewel with older clients should work as long as you don't
>enable jewel tunables and such; the same goes for older kernels.  Can
>you do
>
>rbd --debug-ms=20 ls
>
>and attach the output?
>
>Thanks,
>
>Ilya


Re: [ceph-users] which CentOS 7 kernel is compatible with jewel?

2016-06-15 Thread Ilya Dryomov
On Wed, Jun 15, 2016 at 6:56 PM, Michael Kuriger  wrote:
> Still not working with newer client.  But I get a different error now.
>
> [...]

See http://www.spinics.net/lists/ceph-users/msg27787.html.

Thanks,

Ilya


Re: [ceph-users] which CentOS 7 kernel is compatible with jewel?

2016-06-15 Thread Michael Kuriger
Hmm, if I only enable the layering feature I can get it to work.  But I'm
puzzled why all the (default) features are not working with my system fully
up to date.

Any ideas?  Is this not yet supported?


[root@test ~]# rbd create `hostname` --size 102400 --image-feature layering

[root@test ~]# rbd map `hostname`
/dev/rbd0

[root@test ~]# rbd info `hostname`
rbd image 'test.np.wc1.example.com':
size 102400 MB in 25600 objects
order 22 (4096 kB objects)
block_name_prefix: rbd_data.13582ae8944a
format: 2
features: layering
flags:
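
As a sidenote, the extra features can also be stripped from an existing
image instead of recreating it - 0x3c is exactly exclusive-lock +
object-map + fast-diff + deep-flatten - and new images can default to
layering only (a sketch):

rbd feature disable test1 deep-flatten fast-diff object-map exclusive-lock

# and in ceph.conf on the client:
[client]
rbd default features = 1    # layering only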




 

 
Michael Kuriger
Sr. Unix Systems Engineer
mk7...@yp.com | 818-649-7235


Re: [ceph-users] which CentOS 7 kernel is compatible with jewel?

2016-06-15 Thread Ilya Dryomov
On Wed, Jun 15, 2016 at 7:05 PM, Michael Kuriger  wrote:
> Hmm, if I only enable layering features I can get it to work.  But I’m 
> puzzled why all the (default) features are not working with my system fully 
> up to date.
>
> Any ideas?  Is this not yet supported?

Yes, these features aren't yet supported by the kernel client.

Thanks,

Ilya


Re: [ceph-users] OSDs not coming up on one host

2016-06-15 Thread Kostis Fardelas
Hello Jacob, Gregory,

did you manage to start up those OSDs in the end? I came across a very
similar incident [1] (no flags preventing the OSDs from getting UP
in the cluster though, and no hardware problems reported) and I wonder if
you found out what the culprit was in your case.

[1] http://permalink.gmane.org/gmane.comp.file-systems.ceph.user/30432
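
(For reference, the flag check Greg suggests below boils down to something
like this - a sketch, assuming admin credentials on the node:

  ceph osd dump | grep flags    # look for noup/noin/noout
  ceph osd unset noup           # clear any flag that is set
  ceph osd unset noin

In my case no such flags were set.)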

Best regards,
Kostis

On 17 April 2015 at 02:04, Gregory Farnum  wrote:
> The monitor looks like it's not generating a new OSDMap including the
> booting OSDs. I could say with more certainty what's going on with the
> monitor log file, but I'm betting you've got one of the noin or noup
> family of flags set. I *think* these will be output in "ceph -w" or in
> "ceph osd dump", although I can't say for certain in Firefly.
> -Greg
>
> On Fri, Apr 10, 2015 at 1:57 AM, Jacob Reid  
> wrote:
>> On Fri, Apr 10, 2015 at 09:55:20AM +0100, Jacob Reid wrote:
>>> On Thu, Apr 09, 2015 at 05:21:47PM +0100, Jacob Reid wrote:
>>> > On Thu, Apr 09, 2015 at 08:46:07AM -0700, Gregory Farnum wrote:
>>> > > On Thu, Apr 9, 2015 at 8:14 AM, Jacob Reid 
>>> > >  wrote:
>>> > > > On Thu, Apr 09, 2015 at 06:43:45AM -0700, Gregory Farnum wrote:
>>> > > >> You can turn up debugging ("debug osd = 10" and "debug filestore = 
>>> > > >> 10"
>>> > > >> are probably enough, or maybe 20 each) and see what comes out to get
>>> > > >> more information about why the threads are stuck.
>>> > > >>
>>> > > >> But just from the log my answer is the same as before, and now I 
>>> > > >> don't
>>> > > >> trust that controller (or maybe its disks), regardless of what it's
>>> > > >> admitting to. ;)
>>> > > >> -Greg
>>> > > >>
>>> > > >
>>> > > > Ran with osd and filestore debug both at 20; still nothing jumping 
>>> > > > out at me. Logfile attached as it got huge fairly quickly, but mostly 
>>> > > > seems to be the same extra lines. I tried running some test I/O on 
>>> > > > the drives in question to try and provoke some kind of problem, but 
>>> > > > they seem fine now...
>>> > >
>>> > > Okay, this is strange. Something very wonky is happening with your
>>> > > scheduler — it looks like these threads are all idle, and they're
>>> > > scheduling wakeups that handle an appreciable amount of time after
>>> > > they're supposed to. For instance:
>>> > > 2015-04-09 15:56:55.953116 7f70a7963700 20
>>> > > filestore(/var/lib/ceph/osd/osd.15) sync_entry woke after 5.416704
>>> > > 2015-04-09 15:56:55.953153 7f70a7963700 20
>>> > > filestore(/var/lib/ceph/osd/osd.15) sync_entry waiting for
>>> > > max_interval 5.00
>>> > >
>>> > > This is the thread that syncs your backing store, and it always sets
>>> > > itself to get woken up at 5-second intervals — but here it took >5.4
>>> > > seconds, and later on in your log it takes more than 6 seconds.
>>> > > It looks like all the threads which are getting timed out are also
>>> > > idle, but are taking so much longer to wake up than they're set for
>>> > > that they get a timeout warning.
>>> > >
>>> > > There might be some bugs in here where we're expecting wakeups to be
>>> > > more precise than they can be, but these sorts of misses are
>>> > > definitely not normal. Is this server overloaded on the CPU? Have you
>>> > > done something to make the scheduler or wakeups wonky?
>>> > > -Greg
>>> >
>>> > CPU load is minimal - the host does nothing but run OSDs and has 8 cores 
>>> > that are all sitting idle with a load average of 0.1. I haven't done 
>>> > anything to scheduling. That was with the debug logging on, if that could 
>>> > be the cause of any delays. A scheduler issue seems possible - I haven't 
>>> > done anything to it, but `time sleep 5` run a few times returns anything 
>>> > spread randomly from 5.002 to 7.1(!) seconds but mostly in the 5.5-6.0 
>>> > region where it managed fairly consistently <5.2 on the other servers in 
>>> > the cluster and <5.02 on my desktop. I have disabled the CPU power saving 
>>> > mode as the only thing I could think of that might be having an effect on 
>>> > this, and running the same test again gives more sane results... we'll 
>>> > see if this reflects in the OSD logs or not, I guess. If this is the 
>>> > cause, it's probably something that the next version might want to make a 
>>> > specific warning case of detecting. I will keep you updated as to their 
>>> > behaviour now...
>>> > ___
>>> > ceph-users mailing list
>>> > ceph-users@lists.ceph.com
>>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>> Overnight, nothing changed - I am no longer seeing the timeout in the logs 
>>> but all the OSDs in questions are still happily sitting at booting and 
>>> showing as down in the tree. Debug 20 logfile attached again.
>> ...and here actually *is* the logfile, which I managed to forget... must be 
>> Friday, I guess.
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] Is Dynamic Cache tiering supported in Jewel

2016-06-15 Thread Oliver Dzombic
Hi,

yes, that's no problem.

In addition to what christian told you, these two links are helpful to
understand the stuff:

https://software.intel.com/en-us/blogs/2015/03/03/ceph-cache-tiering-introduction

http://docs.ceph.com/docs/jewel/rados/operations/cache-tiering/

http://docs.ceph.com/docs/jewel/rados/operations/crush-map/#placing-different-pools-on-different-osds
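
As a minimal sketch (pool names are examples; this can be done while the
base pool is serving IO):

  ceph osd tier add base-pool cache-pool
  ceph osd tier cache-mode cache-pool writeback
  ceph osd tier set-overlay base-pool cache-pool
  ceph osd pool set cache-pool hit_set_type bloom
  ceph osd pool set cache-pool target_max_bytes 107374182400  # 100 GB cap, example value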

Good luck !

-- 
Mit freundlichen Gruessen / Best regards

Oliver Dzombic
IP-Interactive

mailto:i...@ip-interactive.de

Anschrift:

IP Interactive UG ( haftungsbeschraenkt )
Zum Sonnenberg 1-3
63571 Gelnhausen

HRB 93402 beim Amtsgericht Hanau
Geschäftsführung: Oliver Dzombic

Steuer Nr.: 35 236 3622 1
UST ID: DE274086107


On 15.06.2016 at 15:09, Venkata Manojawa Paritala wrote:
> Hi,
> 
> We are working on trying cache tiering in Ceph and would like to know if
> this can be done dynamically - basically adding a cache pool to another
> pool that is already serving IO.
> 
> Thank you for your response in advance.
> 
> - Manoj
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph osd too full

2016-06-15 Thread Hauke Homburg

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Hello,

I have a Ceph Jewel cluster with 5 servers and 40 OSDs.
The cluster is very full, but at the moment I cannot use 10 percent of
the volume because ceph health says some hard disks are too
full. They are between 75 and 95 percent full.

ceph osd reweight-by-utilization doesn't help. How can I fill the OSDs
evenly to use the maximum space?

Regards

Hauke

- -- 
www.w3-creative.de

www.westchat.de
-BEGIN PGP SIGNATURE-
Version: GnuPG v2.0.19 (GNU/Linux)

iQIcBAEBAgAGBQJXYZrdAAoJEEIVizQb/Y0mrM0P/0YqB5Zb69I/HDblqfSmg+26
1Io5j/vTz9gs5orHEvvU6wNZiEVnh8jfeczzxMaNQ+zW4MGED/ahrpZoHnJ5xEbb
a4xqpvrZdFYFYrhgrFDEQEo3cqC3L5E4VjR4aBp77WjH/Q7G9v62IHrNM0uU7Yfg
RKw7/zxHmZQBWek5Co7AtRmzZdjS7RelaVyEHQ7Vu2nO1aZUNYvjgUvVCHdos/TG
F3yiwFcXEk7H6EHyHs6dUoTgm0OOVw/MjOD7kLtM/uModEZoxQT5uuvod6iHZ5nE
eNkV/ipcTbUaDdkBbpBKhfNjsoyYLetNblEWbmrWw8bmorjq0CmtKT229cBrNZW8
bdPbrbG6/TCkydVm0KHEgU97FsIPI6yqJxSCnsFEBNFjYVvBlysqK1awXHK+tTjV
v3arQFFEIRC8salEoIWaGx97M3S/HuqcTV3zlZ+OrfXblrB5h3YJTonnxyi4Z1c7
7imsMneNAYhlVcZtcWxNxKB8/wu0sX8yvjkwYMh1bIF3H/pt0JhoyJsWvEcKgEbH
s37nJ6I3hFZc9okefLK6uz9zIkZ1CLzYdTSnZS0pIDufHZVvuJe3nN1PSOAZ24JI
H4eV5INWS81f0EzOfUXRkfq86uDEtNIpLa3J+CHuYcnNYOc3TA/vBTB3QOBXaIcF
tT3jp+p3+DiDmvuynICc
=AU62
-END PGP SIGNATURE-


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph Day Switzerland slides and video

2016-06-15 Thread Dan van der Ster
Dear Ceph Community,

Yesterday we had the pleasure of hosting Ceph Day Switzerland, and we
wanted to let you know that the slides and videos of most talks have
been posted online:

  https://indico.cern.ch/event/542464/timetable/

Thanks again to all the speakers and attendees!

Hervé & Dan

CERN IT Dept
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] v10.2.2 Jewel released

2016-06-15 Thread Sage Weil
This point release fixes several important bugs in RBD mirroring, RGW 
multi-site, CephFS, and RADOS.

We recommend that all v10.2.x users upgrade.

For more detailed information, see the release notes at

http://docs.ceph.com/docs/master/release-notes/#v10-2-2-jewel

or the complete changelog at

http://docs.ceph.com/docs/master/_downloads/v10.2.2.txt

Getting Ceph


* Git at git://github.com/ceph/ceph.git
* Tarball at http://download.ceph.com/tarballs/ceph-10.2.2.tar.gz
* For packages, see http://ceph.com/docs/master/install/get-packages
* For ceph-deploy, see http://ceph.com/docs/master/install/install-ceph-deploy
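
For package-based installs the upgrade itself is typically just a package
update (illustrative; assumes the ceph.com repositories are already
configured, and that monitors are upgraded before OSDs):

  sudo yum update ceph        # RPM-based distros
  sudo apt-get update && sudo apt-get install --only-upgrade ceph   # deb-based
  sudo systemctl restart ceph.target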


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Switches and latency

2016-06-15 Thread Nick Fisk
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Gandalf Corvotempesta
> Sent: 15 June 2016 17:03
> To: ceph-us...@ceph.com
> Subject: [ceph-users] Switches and latency
> 
> Let's assume a fully redundant network.
> We need 4 switches, 2 for the public network, 2 for the cluster network.

I would reconsider whether you need separate switches for each network; VLANs
would normally be sufficient. If bandwidth is not an issue, you could even
tag both VLANs over the same uplinks (see the sketch below). Then there is the
discussion around whether separate networks are really essential at all.
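
On the host side, tagging both networks over one uplink might look like
this (interface names, VLAN IDs and subnets are just examples):

  ip link add link eth0 name eth0.100 type vlan id 100   # public
  ip link add link eth0 name eth0.200 type vlan id 200   # cluster
  ip addr add 10.0.100.10/24 dev eth0.100
  ip addr add 10.0.200.10/24 dev eth0.200
  ip link set eth0.100 up; ip link set eth0.200 up

with a matching ceph.conf fragment:

  public network  = 10.0.100.0/24
  cluster network = 10.0.200.0/24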

> 
> 10GBase-T has higher latency than SFP+ but is also cheaper, as many new
> servers have 10GBase-T integrated onboard and there is no need for twinax
> cables or transceivers.

Very true, and it is one of the reasons we switched to it; the other being
that I was fed up having to solve the "this cable doesn't work with this
switch/NIC" challenge. Why cables need an EEPROM to say which devices they
will work with is lost on me!!!

On the latency front I wouldn't be too concerned. 10GBase-T has about 2us more
latency per hop than SFP+. The lowest latencies commonly seen in Ceph are
around 500-1000us for reads and 2000us for writes. So unless you are trying to
get every last 0.01 of a percent, I don't think you will notice. It might be
wise to link the switches together with SFP+ or 40G though, so the higher
latency only affects the last hop to the host; that will put you in a better
place if/when you need to scale your network out.

> 
> I think that low latency is needed on the public network, where servers
> have to read data from, and not on the cluster network, which is used only
> for replication purposes.
> 
> What do you think? Using SFP+ everywhere would increase the total cost.
> And what about InfiniBand (IPoIB) for the cluster network and SFP+ for the
> public network? Refurbished IB switches are cheaper than 10Gb SFP+ ones.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Switches and latency

2016-06-15 Thread Gandalf Corvotempesta
2016-06-15 22:13 GMT+02:00 Nick Fisk :
> I would reconsider if you need separate switches for each network, vlans
> would normally be sufficient. If bandwidth is not an issue, you could even
> tag both vlans over the same uplinks. Then there is the discussion around
> whether separate networks are really essential

Are you suggesting using the same switch port for both the public and
private networks by using VLANs? This will slow down everything, as the
same port is used for both replication and public access.

What I can do is buy 2 switches with 24 ports and use, for the moment,
ports 1 to 12 for public (vlan100) and ports 13 to 24 for private (vlan200).

When I have to grow the cluster beyond 12 OSD servers or more than
12 "frontend" servers, I'll buy 2 more switches and move all the cabling
to the newer ones.

Even better, to keep costs low: two 12-port switches, 6 ports used for the
front-end, 6 used for the cluster network.
That would allow me to use 6 OSD servers (288TB raw, using 12 4TB disks
on each server) and 6 hypervisor servers to access the cluster.
When I have to grow, I'll replace the 2 switches with bigger ones.

(Side question: which switch should I change? The cluster one or the
public one? Changing the cluster one would trigger healing while I move
the cabling, as Ceph will lose 1 OSD server for a couple of seconds,
right? Would changing the frontend one trigger a VM migration as the
whole node loses storage access, or just a temporary I/O freeze?)

> Very true and is one for the reasons we switched to it, the other being I
> was fed up having to solve the "this cable doesn't work with this
> switch/NIC" challenge. Why cables need eeprom to say which devices they will
> work with is lost on me!!!

Twinax cables aren't standard and might not work with my switches?
If so, 10GBase-T for the rest of my life!

> On the latency front I wouldn't be too concerned. 10GB-T has about 2us more
> latency per hop than SFP. Lowest latency's commonly seen in Ceph are around
> 500-1000us for reads and 2000us for writes. So unless you are trying to get
> every last 0.01 of a percent, I don't think you will notice. It might be
> wise to link the switches together though with SFP or 40G, so the higher
> latency only effects the last hop to the host and will put you in a better
> place if/when you need to scale your network out.

My network is very flat. I'll have 2 hops maximum:

OSD server -> cluster switch (TOR) -> spine switch -> cluster switch
(TOR) -> OSD server
This applies only in the case of multiple racks. In a single rack I'll have
just 1 hop between the OSD server and the top-of-rack cluster switch.

I can aggregate links between TOR and spine by using 4x 10GBase-T
ports. I don't have any 10/40Gb switches, and they would be too expensive.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Fio randwrite does not work on Centos 7.2 VM

2016-06-15 Thread Mansour Shafaei Moghaddam
Hi All,

Has anyone faced a similar issue? I do not have a problem with random reads,
sequential reads, or sequential writes though. Every time I try running fio
for random writes, one OSD in the cluster crashes. Here is what I see
at the tail of the log:

 ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
 1: ceph-osd() [0x9d6334]
 2: (()+0xf100) [0x7fa2e88fb100]
 3: (gsignal()+0x37) [0x7fa2e73145f7]
 4: (abort()+0x148) [0x7fa2e7315ce8]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7fa2e7c189d5]
 6: (()+0x5e946) [0x7fa2e7c16946]
 7: (()+0x5e973) [0x7fa2e7c16973]
 8: (()+0x5eb93) [0x7fa2e7c16b93]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x24a) [0xacd8ea]
 10: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long,
int, ThreadPool::TPHandle*)+0xa24) [0x8b8114]
 11: (FileStore::_do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long,
ThreadPool::TPHandle*)+0x64) [0x8bcf34]
 12: (FileStore::_do_op(FileStore::OpSequencer*,
ThreadPool::TPHandle&)+0x17e) [0x8bd0ce]
 13: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa56) [0xabe326]
 14: (ThreadPool::WorkThread::entry()+0x10) [0xabf3d0]
 15: (()+0x7dc5) [0x7fa2e88f3dc5]
 16: (clone()+0x6d) [0x7fa2e73d528d]
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Switches and latency

2016-06-15 Thread Nick Fisk
> -Original Message-
> From: Gandalf Corvotempesta [mailto:gandalf.corvotempe...@gmail.com]
> Sent: 15 June 2016 21:33
> To: n...@fisk.me.uk
> Cc: ceph-us...@ceph.com
> Subject: Re: [ceph-users] Switches and latency
> 
> 2016-06-15 22:13 GMT+02:00 Nick Fisk :
> > I would reconsider if you need separate switches for each network,
> > vlans would normally be sufficient. If bandwidth is not an issue, you
> > could even tag both vlans over the same uplinks. Then there is the
> > discussion around whether separate networks are really essential
> 
> Are you suggesting to use the same switch port for both public and private
> network by using vlans? This will slow down everything, as the same port is
> used for both replication and public access.

Possibly, but by how much? 20Gb/s of bandwidth is a lot to feed 12x7.2k disks, 
particularly if they start doing any sort of non-sequential IO. 

> 
> What I can do is buying 2 switches with 24 ports and using, at the moment,
> port 1 to 12 for public (vlan100) and port 13 to 24 for private (vlan200)
> 
> When I'll have to grow the cluster with more than 12 OSDs servers or more
> than
> 12 "frontend" servers, i'll buy 2 switches more and move all cabling to the
> newer ones.
> 
> Even better, to keep cost low: 2 12 ports switches, 6 used as front, 6 used as
> cluster network.
> Will allow me to use 6 OSDs servers (288TB raw, by using 12 4TB disks on each
> server) and
> 6 hypervisors servers to access the cluster.
> When I have to grow, i'll change 2 switches with bigger ones.
> 
> (side question: which switch should I change? The cluster one or the public
> one  ? Changing the cluster one would trigger an healing during the cabling
> switch as ceph will loose 1 OSD server for a couple of seconds, right?
> Changing the  frontend one will trigger a VMs migration as the whole node
> loose the storage access or just a temporary I/O freeze?)

I think you want to try and keep it simple as possible and make the right 
decision 1st time round. Buy a TOR switch that will accommodate the number of 
servers you wish to put in your rack and you should never have a need to change 
it. 

I think there are issues when one of the networks is down and not the other, so 
stick to keeping each server terminating into the same switch for all its 
connections; otherwise you are just inviting trouble.

> 
> > Very true and is one for the reasons we switched to it, the other
> > being I was fed up having to solve the "this cable doesn't work with
> > this switch/NIC" challenge. Why cables need eeprom to say which
> > devices they will work with is lost on me!!!
> 
> Twinax cables aren't standard and could not work with my switches?
> if so, 10BaseT for the rest of my life!

Yeah, and it's worse if you want to connect two different manufacturers' kit, as 
you sometimes even need a bespoke cable that has the right vendor matched on 
either end. I think some vendors are better than others, but I just got fed up 
with it and liked the fact that with 10GBase-T it just works.

> > On the latency front I wouldn't be too concerned. 10GB-T has about 2us
> > more latency per hop than SFP. Lowest latency's commonly seen in Ceph
> > are around 500-1000us for reads and 2000us for writes. So unless you
> > are trying to get every last 0.01 of a percent, I don't think you will
> > notice. It might be wise to link the switches together though with SFP
> > or 40G, so the higher latency only effects the last hop to the host
> > and will put you in a better place if/when you need to scale your network
> out.
> 
> My network is very flat. I'll have 2 hop maximum:
> 
> OSD server -> cluster switch (TOR) -> spine switch -> cluster switch
> (TOR) -> OSD server
> This only in case of multiple racks. In a single rack i'll have just 1 hop 
> between
> OSD server and the cluster switch Top Of Rack.
> 
> I can aggregate links between TOR and Spine by using 4x 10GBaseT ports. I
> don't have any 10/40GB switch and would be too expensive.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Fio randwrite does not work on Centos 7.2 VM

2016-06-15 Thread Somnath Roy
There should be a line in the log specifying which assert is failing , post 
that along with say 10 lines from top of that..

Thanks & Regards
Somnath

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
Mansour Shafaei Moghaddam
Sent: Wednesday, June 15, 2016 1:57 PM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] Fio randwrite does not work on Centos 7.2 VM

Hi All,

Has anyone faced a similar issue? I do not have a problem with random read, 
sequential read, and sequential writes though. Everytime I try running fio for 
random writes, one osd in the cluster crashes. Here is the what I see at the 
tail of the log:

 ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
 1: ceph-osd() [0x9d6334]
 2: (()+0xf100) [0x7fa2e88fb100]
 3: (gsignal()+0x37) [0x7fa2e73145f7]
 4: (abort()+0x148) [0x7fa2e7315ce8]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7fa2e7c189d5]
 6: (()+0x5e946) [0x7fa2e7c16946]
 7: (()+0x5e973) [0x7fa2e7c16973]
 8: (()+0x5eb93) [0x7fa2e7c16b93]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
const*)+0x24a) [0xacd8ea]
 10: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, 
ThreadPool::TPHandle*)+0xa24) [0x8b8114]
 11: (FileStore::_do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long, 
ThreadPool::TPHandle*)+0x64) [0x8bcf34]
 12: (FileStore::_do_op(FileStore::OpSequencer*, ThreadPool::TPHandle&)+0x17e) 
[0x8bd0ce]
 13: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa56) [0xabe326]
 14: (ThreadPool::WorkThread::entry()+0x10) [0xabf3d0]
 15: (()+0x7dc5) [0x7fa2e88f3dc5]
 16: (clone()+0x6d) [0x7fa2e73d528d]


PLEASE NOTE: The information contained in this electronic mail message is 
intended only for the use of the designated recipient(s) named above. If the 
reader of this message is not the intended recipient, you are hereby notified 
that you have received this message in error and that any review, 
dissemination, distribution, or copying of this message is strictly prohibited. 
If you have received this communication in error, please notify the sender by 
telephone or e-mail (as shown above) immediately and destroy any and all copies 
of this message in your possession (whether hard copies or electronically 
stored copies).
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Fio randwrite does not work on Centos 7.2 VM

2016-06-15 Thread Mansour Shafaei Moghaddam
It fails at "FileStore.cc: 2761". Here is a more complete log:

-9> 2016-06-15 10:55:13.205014 7fa2dcd85700 -1 dump_open_fds unable to
open /proc/self/fd
-8> 2016-06-15 10:55:13.205085 7fa2cb402700  2
filestore(/var/lib/ceph/osd/ceph-0) waiting 51 > 50 ops || 328390 >
104857600
-7> 2016-06-15 10:55:13.205094 7fa2cd406700  2
filestore(/var/lib/ceph/osd/ceph-0) waiting 51 > 50 ops || 328389 >
104857600
-6> 2016-06-15 10:55:13.205111 7fa2cac01700  2
filestore(/var/lib/ceph/osd/ceph-0) waiting 51 > 50 ops || 328317 >
104857600
-5> 2016-06-15 10:55:13.205118 7fa2ca400700  2
filestore(/var/lib/ceph/osd/ceph-0) waiting 51 > 50 ops || 328390 >
104857600
-4> 2016-06-15 10:55:13.205121 7fa2cdc07700  2
filestore(/var/lib/ceph/osd/ceph-0) waiting 51 > 50 ops || 328390 >
104857600
-3> 2016-06-15 10:55:13.205153 7fa2de588700  5 -- op tracker -- seq:
1476, time: 2016-06-15 10:55:13.205153, event: journaled_completion_queued,
op: osd_op(client.4109.0:1457 rb.0.100a.6b8b4567.6b6c
[set-alloc-hint object_size 4194304 write_size 4194304,write 1884160~4096]
0.cbe1d8a4 ack+ondisk+write e9)
-2> 2016-06-15 10:55:13.205183 7fa2de588700  5 -- op tracker -- seq:
1483, time: 2016-06-15 10:55:13.205183, event:
write_thread_in_journal_buffer, op: osd_op(client.4109.0:1464
rb.0.100a.6b8b4567.524d [set-alloc-hint object_size 4194304
write_size 4194304,write 3051520~4096] 0.6778c255 ack+ondisk+write e9)
-1> 2016-06-15 10:55:13.205400 7fa2de588700  5 -- op tracker -- seq:
1483, time: 2016-06-15 10:55:13.205400, event: journaled_completion_queued,
op: osd_op(client.4109.0:1464 rb.0.100a.6b8b4567.524d
[set-alloc-hint object_size 4194304 write_size 4194304,write 3051520~4096]
0.6778c255 ack+ondisk+write e9)
 0> 2016-06-15 10:55:13.206559 7fa2dcd85700 -1 os/FileStore.cc: In
function 'unsigned int
FileStore::_do_transaction(ObjectStore::Transaction&, uint64_t, int,
ThreadPool::TPHandle*)' thread 7fa2dcd85700 time 2016-06-15 10:55:13.205018
os/FileStore.cc: 2761: FAILED assert(0 == "unexpected error")

 ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x78) [0xacd718]
 2: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long,
int, ThreadPool::TPHandle*)+0xa24) [0x8b8114]
 3: (FileStore::_do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long,
ThreadPool::TPHandle*)+0x64) [0x8bcf34]
 4: (FileStore::_do_op(FileStore::OpSequencer*,
ThreadPool::TPHandle&)+0x17e) [0x8bd0ce]
 5: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa56) [0xabe326]
 6: (ThreadPool::WorkThread::entry()+0x10) [0xabf3d0]
 7: (()+0x7dc5) [0x7fa2e88f3dc5]
 8: (clone()+0x6d) [0x7fa2e73d528d]


On Wed, Jun 15, 2016 at 2:05 PM, Somnath Roy 
wrote:

> There should be a line in the log specifying which assert is failing ,
> post that along with say 10 lines from top of that..
>
>
>
> Thanks & Regards
>
> Somnath
>
>
>
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
> Of Mansour Shafaei Moghaddam
> Sent: Wednesday, June 15, 2016 1:57 PM
> To: ceph-users@lists.ceph.com
> Subject: [ceph-users] Fio randwrite does not work on Centos 7.2 VM
>
>
>
> Hi All,
>
>
>
> Has anyone faced a similar issue? I do not have a problem with random
> read, sequential read, and sequential writes though. Everytime I try
> running fio for random writes, one osd in the cluster crashes. Here is the
> what I see at the tail of the log:
>
>
>
>  ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
>
>  1: ceph-osd() [0x9d6334]
>
>  2: (()+0xf100) [0x7fa2e88fb100]
>
>  3: (gsignal()+0x37) [0x7fa2e73145f7]
>
>  4: (abort()+0x148) [0x7fa2e7315ce8]
>
>  5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7fa2e7c189d5]
>
>  6: (()+0x5e946) [0x7fa2e7c16946]
>
>  7: (()+0x5e973) [0x7fa2e7c16973]
>
>  8: (()+0x5eb93) [0x7fa2e7c16b93]
>
>  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x24a) [0xacd8ea]
>
>  10: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long,
> int, ThreadPool::TPHandle*)+0xa24) [0x8b8114]
>
>  11: (FileStore::_do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long,
> ThreadPool::TPHandle*)+0x64) [0x8bcf34]
>
>  12: (FileStore::_do_op(FileStore::OpSequencer*,
> ThreadPool::TPHandle&)+0x17e) [0x8bd0ce]
>
>  13: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa56) [0xabe326]
>
>  14: (ThreadPool::WorkThread::entry()+0x10) [0xabf3d0]
>
>  15: (()+0x7dc5) [0x7fa2e88f3dc5]
>
>  16: (clone()+0x6d) [0x7fa2e73d528d]
>
>
>
>
> PLEASE NOTE: The information contained in this electronic mail message is
> intended only for the use of the designated recipient(s) named above. If
> the reader of this message is not the intended recipient, you are hereby
> notified that you have received this message in error and that any review,
> dissemination, distribution, or copying of this message is strictly
> prohibited. If you have received this communication in error, please notify the sender.

Re: [ceph-users] Fio randwrite does not work on Centos 7.2 VM

2016-06-15 Thread Samuel Just
I think you hit the OS process fd limit.  You need to adjust it.
-Sam

On Wed, Jun 15, 2016 at 2:07 PM, Mansour Shafaei Moghaddam
 wrote:
> It fails at "FileStore.cc: 2761". Here is a more complete log:
>
> -9> 2016-06-15 10:55:13.205014 7fa2dcd85700 -1 dump_open_fds unable to
> open /proc/self/fd
> -8> 2016-06-15 10:55:13.205085 7fa2cb402700  2
> filestore(/var/lib/ceph/osd/ceph-0) waiting 51 > 50 ops || 328390 >
> 104857600
> -7> 2016-06-15 10:55:13.205094 7fa2cd406700  2
> filestore(/var/lib/ceph/osd/ceph-0) waiting 51 > 50 ops || 328389 >
> 104857600
> -6> 2016-06-15 10:55:13.205111 7fa2cac01700  2
> filestore(/var/lib/ceph/osd/ceph-0) waiting 51 > 50 ops || 328317 >
> 104857600
> -5> 2016-06-15 10:55:13.205118 7fa2ca400700  2
> filestore(/var/lib/ceph/osd/ceph-0) waiting 51 > 50 ops || 328390 >
> 104857600
> -4> 2016-06-15 10:55:13.205121 7fa2cdc07700  2
> filestore(/var/lib/ceph/osd/ceph-0) waiting 51 > 50 ops || 328390 >
> 104857600
> -3> 2016-06-15 10:55:13.205153 7fa2de588700  5 -- op tracker -- seq:
> 1476, time: 2016-06-15 10:55:13.205153, event: journaled_completion_queued,
> op: osd_op(client.4109.0:1457 rb.0.100a.6b8b4567.6b6c
> [set-alloc-hint object_size 4194304 write_size 4194304,write 1884160~4096]
> 0.cbe1d8a4 ack+ondisk+write e9)
> -2> 2016-06-15 10:55:13.205183 7fa2de588700  5 -- op tracker -- seq:
> 1483, time: 2016-06-15 10:55:13.205183, event:
> write_thread_in_journal_buffer, op: osd_op(client.4109.0:1464
> rb.0.100a.6b8b4567.524d [set-alloc-hint object_size 4194304
> write_size 4194304,write 3051520~4096] 0.6778c255 ack+ondisk+write e9)
> -1> 2016-06-15 10:55:13.205400 7fa2de588700  5 -- op tracker -- seq:
> 1483, time: 2016-06-15 10:55:13.205400, event: journaled_completion_queued,
> op: osd_op(client.4109.0:1464 rb.0.100a.6b8b4567.524d
> [set-alloc-hint object_size 4194304 write_size 4194304,write 3051520~4096]
> 0.6778c255 ack+ondisk+write e9)
>  0> 2016-06-15 10:55:13.206559 7fa2dcd85700 -1 os/FileStore.cc: In
> function 'unsigned int FileStore::_do_transaction(ObjectStore::Transaction&,
> uint64_t, int, ThreadPool::TPHandle*)' thread 7fa2dcd85700 time 2016-06-15
> 10:55:13.205018
> os/FileStore.cc: 2761: FAILED assert(0 == "unexpected error")
>
>  ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x78) [0xacd718]
>  2: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long,
> int, ThreadPool::TPHandle*)+0xa24) [0x8b8114]
>  3: (FileStore::_do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long,
> ThreadPool::TPHandle*)+0x64) [0x8bcf34]
>  4: (FileStore::_do_op(FileStore::OpSequencer*,
> ThreadPool::TPHandle&)+0x17e) [0x8bd0ce]
>  5: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa56) [0xabe326]
>  6: (ThreadPool::WorkThread::entry()+0x10) [0xabf3d0]
>  7: (()+0x7dc5) [0x7fa2e88f3dc5]
>  8: (clone()+0x6d) [0x7fa2e73d528d]
>
>
> On Wed, Jun 15, 2016 at 2:05 PM, Somnath Roy 
> wrote:
>>
>> There should be a line in the log specifying which assert is failing ,
>> post that along with say 10 lines from top of that..
>>
>>
>>
>> Thanks & Regards
>>
>> Somnath
>>
>>
>>
>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
>> Mansour Shafaei Moghaddam
>> Sent: Wednesday, June 15, 2016 1:57 PM
>> To: ceph-users@lists.ceph.com
>> Subject: [ceph-users] Fio randwrite does not work on Centos 7.2 VM
>>
>>
>>
>> Hi All,
>>
>>
>>
>> Has anyone faced a similar issue? I do not have a problem with random
>> read, sequential read, and sequential writes though. Everytime I try running
>> fio for random writes, one osd in the cluster crashes. Here is the what I
>> see at the tail of the log:
>>
>>
>>
>>  ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
>>
>>  1: ceph-osd() [0x9d6334]
>>
>>  2: (()+0xf100) [0x7fa2e88fb100]
>>
>>  3: (gsignal()+0x37) [0x7fa2e73145f7]
>>
>>  4: (abort()+0x148) [0x7fa2e7315ce8]
>>
>>  5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7fa2e7c189d5]
>>
>>  6: (()+0x5e946) [0x7fa2e7c16946]
>>
>>  7: (()+0x5e973) [0x7fa2e7c16973]
>>
>>  8: (()+0x5eb93) [0x7fa2e7c16b93]
>>
>>  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>> const*)+0x24a) [0xacd8ea]
>>
>>  10: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long,
>> int, ThreadPool::TPHandle*)+0xa24) [0x8b8114]
>>
>>  11: (FileStore::_do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long,
>> ThreadPool::TPHandle*)+0x64) [0x8bcf34]
>>
>>  12: (FileStore::_do_op(FileStore::OpSequencer*,
>> ThreadPool::TPHandle&)+0x17e) [0x8bd0ce]
>>
>>  13: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa56) [0xabe326]
>>
>>  14: (ThreadPool::WorkThread::entry()+0x10) [0xabf3d0]
>>
>>  15: (()+0x7dc5) [0x7fa2e88f3dc5]
>>
>>  16: (clone()+0x6d) [0x7fa2e73d528d]
>>
>>
>>
>>
>>
>> PLEASE NOTE: The information contained in this electronic mail message is
>> intended only for the use of the designated recipient(s) named above.

Re: [ceph-users] Switches and latency

2016-06-15 Thread Gandalf Corvotempesta
2016-06-15 22:59 GMT+02:00 Nick Fisk :
> Possibly, but by how much? 20GB of bandwidth is a lot to feed 12x7.2k disks, 
> particularly if they start doing any sort of non-sequential IO.

Assuming 100MB/s for each SATA disk, 12 disks are 1200MB/s = 9600Mbit/s.
Why are you talking about 20Gb/s? By using VLANs on the same port for
both public and cluster traffic, I'll have 10Gb/s to share, but the disks
alone can saturate the whole NIC (9600Mbit/s on a 10Gbit/s network).

I can't aggregate 2 ports, or I would have to buy stackable switches with
support for LAG across both switches, which is much more expensive.
And obviously I can't use only one switch; the network must be fault tolerant.

> I think you want to try and keep it simple as possible and make the right 
> decision 1st time round. Buy a TOR switch that will accommodate the number of 
> servers you wish to put in your rack and you should never have a need to 
> change it.
>
> I think there are issues when one of networks is down and not the other, so 
> stick to keeping each server terminating into the same switch for all its 
> connections, otherwise you are just inviting trouble to happen.

This is not good. A network could fail. In an HA cluster, network
failure must be taken into consideration.
What I would like to do is unplug the cable from switch 1 and plug it into
switch 2 - a couple of seconds max. (Obviously switch 2 will be
temporarily connected to switch 1.)
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Fio randwrite does not work on Centos 7.2 VM

2016-06-15 Thread Somnath Roy
You ran out of the fd limit. Increase it with ulimit.
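
Concretely, something like this (a sketch; 131072 is just an example value):

  # check what the running OSD is allowed
  cat /proc/$(pidof -s ceph-osd)/limits | grep 'open files'

  # raise it for the shell that starts the daemon:
  ulimit -n 131072

  # or persistently, in the [global] section of ceph.conf:
  #   max open files = 131072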

From: Mansour Shafaei Moghaddam [mailto:mansoor.shaf...@gmail.com]
Sent: Wednesday, June 15, 2016 2:08 PM
To: Somnath Roy
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Fio randwrite does not work on Centos 7.2 VM

It fails at "FileStore.cc: 2761". Here is a more complete log:

-9> 2016-06-15 10:55:13.205014 7fa2dcd85700 -1 dump_open_fds unable to open 
/proc/self/fd
-8> 2016-06-15 10:55:13.205085 7fa2cb402700  2 
filestore(/var/lib/ceph/osd/ceph-0) waiting 51 > 50 ops || 328390 > 104857600
-7> 2016-06-15 10:55:13.205094 7fa2cd406700  2 
filestore(/var/lib/ceph/osd/ceph-0) waiting 51 > 50 ops || 328389 > 104857600
-6> 2016-06-15 10:55:13.205111 7fa2cac01700  2 
filestore(/var/lib/ceph/osd/ceph-0) waiting 51 > 50 ops || 328317 > 104857600
-5> 2016-06-15 10:55:13.205118 7fa2ca400700  2 
filestore(/var/lib/ceph/osd/ceph-0) waiting 51 > 50 ops || 328390 > 104857600
-4> 2016-06-15 10:55:13.205121 7fa2cdc07700  2 
filestore(/var/lib/ceph/osd/ceph-0) waiting 51 > 50 ops || 328390 > 104857600
-3> 2016-06-15 10:55:13.205153 7fa2de588700  5 -- op tracker -- seq: 1476, 
time: 2016-06-15 10:55:13.205153, event: journaled_completion_queued, op: 
osd_op(client.4109.0:1457 rb.0.100a.6b8b4567.6b6c [set-alloc-hint 
object_size 4194304 write_size 4194304,write 1884160~4096] 0.cbe1d8a4 
ack+ondisk+write e9)
-2> 2016-06-15 10:55:13.205183 7fa2de588700  5 -- op tracker -- seq: 1483, 
time: 2016-06-15 10:55:13.205183, event: write_thread_in_journal_buffer, op: 
osd_op(client.4109.0:1464 rb.0.100a.6b8b4567.524d [set-alloc-hint 
object_size 4194304 write_size 4194304,write 3051520~4096] 0.6778c255 
ack+ondisk+write e9)
-1> 2016-06-15 10:55:13.205400 7fa2de588700  5 -- op tracker -- seq: 1483, 
time: 2016-06-15 10:55:13.205400, event: journaled_completion_queued, op: 
osd_op(client.4109.0:1464 rb.0.100a.6b8b4567.524d [set-alloc-hint 
object_size 4194304 write_size 4194304,write 3051520~4096] 0.6778c255 
ack+ondisk+write e9)
 0> 2016-06-15 10:55:13.206559 7fa2dcd85700 -1 os/FileStore.cc: In function 
'unsigned int FileStore::_do_transaction(ObjectStore::Transaction&, uint64_t, 
int, ThreadPool::TPHandle*)' thread 7fa2dcd85700 time 2016-06-15 10:55:13.205018
os/FileStore.cc: 2761: FAILED assert(0 == "unexpected error")

 ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x78) 
[0xacd718]
 2: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, 
ThreadPool::TPHandle*)+0xa24) [0x8b8114]
 3: (FileStore::_do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long, 
ThreadPool::TPHandle*)+0x64) [0x8bcf34]
 4: (FileStore::_do_op(FileStore::OpSequencer*, ThreadPool::TPHandle&)+0x17e) 
[0x8bd0ce]
 5: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa56) [0xabe326]
 6: (ThreadPool::WorkThread::entry()+0x10) [0xabf3d0]
 7: (()+0x7dc5) [0x7fa2e88f3dc5]
 8: (clone()+0x6d) [0x7fa2e73d528d]


On Wed, Jun 15, 2016 at 2:05 PM, Somnath Roy <somnath@sandisk.com> wrote:
There should be a line in the log specifying which assert is failing , post 
that along with say 10 lines from top of that..

Thanks & Regards
Somnath

From: ceph-users 
[mailto:ceph-users-boun...@lists.ceph.com]
 On Behalf Of Mansour Shafaei Moghaddam
Sent: Wednesday, June 15, 2016 1:57 PM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] Fio randwrite does not work on Centos 7.2 VM

Hi All,

Has anyone faced a similar issue? I do not have a problem with random read, 
sequential read, and sequential writes though. Everytime I try running fio for 
random writes, one osd in the cluster crashes. Here is the what I see at the 
tail of the log:

 ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
 1: ceph-osd() [0x9d6334]
 2: (()+0xf100) [0x7fa2e88fb100]
 3: (gsignal()+0x37) [0x7fa2e73145f7]
 4: (abort()+0x148) [0x7fa2e7315ce8]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7fa2e7c189d5]
 6: (()+0x5e946) [0x7fa2e7c16946]
 7: (()+0x5e973) [0x7fa2e7c16973]
 8: (()+0x5eb93) [0x7fa2e7c16b93]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
const*)+0x24a) [0xacd8ea]
 10: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, 
ThreadPool::TPHandle*)+0xa24) [0x8b8114]
 11: (FileStore::_do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long, 
ThreadPool::TPHandle*)+0x64) [0x8bcf34]
 12: (FileStore::_do_op(FileStore::OpSequencer*, ThreadPool::TPHandle&)+0x17e) 
[0x8bd0ce]
 13: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa56) [0xabe326]
 14: (ThreadPool::WorkThread::entry()+0x10) [0xabf3d0]
 15: (()+0x7dc5) [0x7fa2e88f3dc5]
 16: (clone()+0x6d) [0x7fa2e73d528d]


PLEASE NOTE: The information contained in this electronic mail message is 
intended only for the use of the designated recipient(s) named above. If the 
reader of this message is not the intended recipient, you are hereby notified 
that any review, dissemination, distribution, or copying of this message is 
strictly prohibited.

[ceph-users] CephFS Bug found with CentOS 7.2

2016-06-15 Thread Jason Gress
While trying to use CephFS as a clustered filesystem, we stumbled upon a 
reproducible bug that is unfortunately pretty serious, as it leads to data 
loss.  Here is the situation:

We have two systems, named ftp01 and ftp02.  They are both running CentOS 7.2, 
with this kernel release and ceph packages:

kernel-3.10.0-327.18.2.el7.x86_64

[root@ftp01 cron]# rpm -qa | grep ceph
ceph-base-10.2.1-0.el7.x86_64
ceph-deploy-1.5.33-0.noarch
ceph-mon-10.2.1-0.el7.x86_64
libcephfs1-10.2.1-0.el7.x86_64
ceph-selinux-10.2.1-0.el7.x86_64
ceph-mds-10.2.1-0.el7.x86_64
ceph-common-10.2.1-0.el7.x86_64
ceph-10.2.1-0.el7.x86_64
python-cephfs-10.2.1-0.el7.x86_64
ceph-osd-10.2.1-0.el7.x86_64

Mounted like so:
XX.XX.XX.XX:/ftp/cron /var/spool/cron ceph 
_netdev,relatime,name=ftp01,secretfile=/etc/ceph/ftp01.secret 0 0
And:
XX.XX.XX.XX:/ftp/cron /var/spool/cron ceph 
_netdev,relatime,name=ftp02,secretfile=/etc/ceph/ftp02.secret 0 0

This filesystem has 234GB worth of data on it, and I created another 
subdirectory and mounted it, NFS style.

Here were the steps to reproduce:

First, I created a file (I was mounting /var/spool/cron on two systems) on 
ftp01:
(crond is not running right now on either system to keep the variables down)

[root@ftp01 cron]# cp /tmp/root .

Shows up on both fine:
[root@ftp01 cron]# ls -la
total 2
drwx--   1 root root0 Jun 15 15:50 .
drwxr-xr-x. 10 root root  104 May 19 09:34 ..
-rw---   1 root root 2043 Jun 15 15:50 root
[root@ftp01 cron]# md5sum root
0636c8deaeadfea7b9ddaa29652b43ae  root

[root@ftp02 cron]# ls -la
total 2
drwx--   1 root root 2043 Jun 15 15:50 .
drwxr-xr-x. 10 root root  104 May 19 09:34 ..
-rw---   1 root root 2043 Jun 15 15:50 root
[root@ftp02 cron]# md5sum root
0636c8deaeadfea7b9ddaa29652b43ae  root

Now, I vim the file on one of them:
[root@ftp01 cron]# vim root
[root@ftp01 cron]# ls -la
total 2
drwx--   1 root root0 Jun 15 15:51 .
drwxr-xr-x. 10 root root  104 May 19 09:34 ..
-rw---   1 root root 2044 Jun 15 15:50 root
[root@ftp01 cron]# md5sum root
7a0c346bbd2b61c5fe990bb277c00917  root

[root@ftp02 cron]# md5sum root
7a0c346bbd2b61c5fe990bb277c00917  root

So far so good, right?  Then, a few seconds later:

[root@ftp02 cron]# ls -la
total 0
drwx--   1 root root   0 Jun 15 15:51 .
drwxr-xr-x. 10 root root 104 May 19 09:34 ..
-rw---   1 root root   0 Jun 15 15:50 root
[root@ftp02 cron]# cat root
[root@ftp02 cron]# md5sum root
d41d8cd98f00b204e9800998ecf8427e  root

And on ftp01:

[root@ftp01 cron]# ls -la
total 2
drwx--   1 root root0 Jun 15 15:51 .
drwxr-xr-x. 10 root root  104 May 19 09:34 ..
-rw---   1 root root 2044 Jun 15 15:50 root
[root@ftp01 cron]# md5sum root
7a0c346bbd2b61c5fe990bb277c00917  root

I later create a 'root2' on ftp02 and cause a similar issue.  The end results 
are two non-matching files:

[root@ftp01 cron]# ls -la
total 2
drwx--   1 root root0 Jun 15 15:53 .
drwxr-xr-x. 10 root root  104 May 19 09:34 ..
-rw---   1 root root 2044 Jun 15 15:50 root
-rw-r--r--   1 root root0 Jun 15 15:53 root2

[root@ftp02 cron]# ls -la
total 2
drwx--   1 root root0 Jun 15 15:53 .
drwxr-xr-x. 10 root root  104 May 19 09:34 ..
-rw---   1 root root0 Jun 15 15:50 root
-rw-r--r--   1 root root 1503 Jun 15 15:53 root2

We were able to reproduce this on two other systems with the same cephfs 
filesystem.  I have also seen cases where the file would just blank out on both 
as well.

We could not reproduce it with our dev/test cluster running the development 
ceph version:

ceph-10.2.2-1.g502540f.el7.x86_64

Is this a known bug with the current production Jewel release?  If so, will it 
be patched in the next release?

Thank you very much,

Jason Gress



"This message and any attachments may contain confidential information. If you
have received this  message in error, any use or distribution is prohibited. 
Please notify us by reply e-mail if you have mistakenly received this message,
and immediately and permanently delete it and any attachments. Thank you."
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Switches and latency

2016-06-15 Thread Nick Fisk
> -Original Message-
> From: Gandalf Corvotempesta [mailto:gandalf.corvotempe...@gmail.com]
> Sent: 15 June 2016 22:13
> To: n...@fisk.me.uk
> Cc: ceph-us...@ceph.com
> Subject: Re: [ceph-users] Switches and latency
> 
> 2016-06-15 22:59 GMT+02:00 Nick Fisk :
> > Possibly, but by how much? 20GB of bandwidth is a lot to feed 12x7.2k
> disks, particularly if they start doing any sort of non-sequential IO.
> 
> Assuming 100MB/s for each SATA disk, 12 disks are 1200MB/s = 9600mbit/s
> Why are you talking about 20Gb/s ? By using VLANs on the same port for
> both public and cluster traffic, i'll have 10Gb/s to share, but all disks can
> saturate the whole NIC (9600Mbit/s on a 10Gbit/s network)

So this is probably a very optimistic figure; any sort of non-4MB-sequential 
workload will rapidly decrease this number. Are you planning on using SSD 
journals? That will impact the bandwidth you can achieve.

I was assuming each node has 2 NICs in a bond going to separate switches. You 
get 20Gb/s of bandwidth and redundancy.

> 
> I can't aggregate 2 ports, or I have to buy stackable switches with support 
> for
> LAG across both switches, which is much more expensive.
> And obviously I can't use only one switch. Network must be fault tollerance.

As above, check out the Linux bonding options. ALB mode gives both RX and TX 
load balancing, although I think it may have some weird fringe cases you need 
to test before going live with it.
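
For example, on RHEL/CentOS a balance-alb bond might be configured like this 
(file path, interface name and addressing are just examples):

  # /etc/sysconfig/network-scripts/ifcfg-bond0
  DEVICE=bond0
  TYPE=Bond
  BONDING_MASTER=yes
  BONDING_OPTS="mode=balance-alb miimon=100"
  BOOTPROTO=none
  IPADDR=10.0.100.10
  PREFIX=24
  ONBOOT=yes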

> 
> > I think you want to try and keep it simple as possible and make the right
> decision 1st time round. Buy a TOR switch that will accommodate the number
> of servers you wish to put in your rack and you should never have a need to
> change it.
> >
> > I think there are issues when one of networks is down and not the other,
> so stick to keeping each server terminating into the same switch for all its
> connections, otherwise you are just inviting trouble to happen.
> 
> This is not good. A network could fail. In an HA cluster, network failure
> must be taken into consideration.
> What I would like to do is unplug the cable from switch 1 and plug it into
> switch 2 - a couple of seconds max. (Obviously switch 2 will be temporarily
> connected to switch 1.)

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS Bug found with CentOS 7.2

2016-06-15 Thread John Spray
On Wed, Jun 15, 2016 at 10:21 PM, Jason Gress  wrote:
> While trying to use CephFS as a clustered filesystem, we stumbled upon a
> reproducible bug that is unfortunately pretty serious, as it leads to data
> loss.  Here is the situation:
>
> We have two systems, named ftp01 and ftp02.  They are both running CentOS
> 7.2, with this kernel release and ceph packages:
>
> kernel-3.10.0-327.18.2.el7.x86_64

That is an old-ish kernel to be using with cephfs.  It may well be the
source of your issues.

> [root@ftp01 cron]# rpm -qa | grep ceph
> ceph-base-10.2.1-0.el7.x86_64
> ceph-deploy-1.5.33-0.noarch
> ceph-mon-10.2.1-0.el7.x86_64
> libcephfs1-10.2.1-0.el7.x86_64
> ceph-selinux-10.2.1-0.el7.x86_64
> ceph-mds-10.2.1-0.el7.x86_64
> ceph-common-10.2.1-0.el7.x86_64
> ceph-10.2.1-0.el7.x86_64
> python-cephfs-10.2.1-0.el7.x86_64
> ceph-osd-10.2.1-0.el7.x86_64
>
> Mounted like so:
> XX.XX.XX.XX:/ftp/cron /var/spool/cron ceph
> _netdev,relatime,name=ftp01,secretfile=/etc/ceph/ftp01.secret 0 0
> And:
> XX.XX.XX.XX:/ftp/cron /var/spool/cron ceph
> _netdev,relatime,name=ftp02,secretfile=/etc/ceph/ftp02.secret 0 0
>
> This filesystem has 234GB worth of data on it, and I created another
> subdirectory and mounted it, NFS style.
>
> Here were the steps to reproduce:
>
> First, I created a file (I was mounting /var/spool/cron on two systems) on
> ftp01:
> (crond is not running right now on either system to keep the variables down)
>
> [root@ftp01 cron]# cp /tmp/root .
>
> Shows up on both fine:
> [root@ftp01 cron]# ls -la
> total 2
> drwx--   1 root root0 Jun 15 15:50 .
> drwxr-xr-x. 10 root root  104 May 19 09:34 ..
> -rw---   1 root root 2043 Jun 15 15:50 root
> [root@ftp01 cron]# md5sum root
> 0636c8deaeadfea7b9ddaa29652b43ae  root
>
> [root@ftp02 cron]# ls -la
> total 2
> drwx--   1 root root 2043 Jun 15 15:50 .
> drwxr-xr-x. 10 root root  104 May 19 09:34 ..
> -rw---   1 root root 2043 Jun 15 15:50 root
> [root@ftp02 cron]# md5sum root
> 0636c8deaeadfea7b9ddaa29652b43ae  root
>
> Now, I vim the file on one of them:
> [root@ftp01 cron]# vim root
> [root@ftp01 cron]# ls -la
> total 2
> drwx--   1 root root0 Jun 15 15:51 .
> drwxr-xr-x. 10 root root  104 May 19 09:34 ..
> -rw---   1 root root 2044 Jun 15 15:50 root
> [root@ftp01 cron]# md5sum root
> 7a0c346bbd2b61c5fe990bb277c00917  root
>
> [root@ftp02 cron]# md5sum root
> 7a0c346bbd2b61c5fe990bb277c00917  root
>
> So far so good, right?  Then, a few seconds later:
>
> [root@ftp02 cron]# ls -la
> total 0
> drwx--   1 root root   0 Jun 15 15:51 .
> drwxr-xr-x. 10 root root 104 May 19 09:34 ..
> -rw---   1 root root   0 Jun 15 15:50 root
> [root@ftp02 cron]# cat root
> [root@ftp02 cron]# md5sum root
> d41d8cd98f00b204e9800998ecf8427e  root
>
> And on ftp01:
>
> [root@ftp01 cron]# ls -la
> total 2
> drwx--   1 root root0 Jun 15 15:51 .
> drwxr-xr-x. 10 root root  104 May 19 09:34 ..
> -rw---   1 root root 2044 Jun 15 15:50 root
> [root@ftp01 cron]# md5sum root
> 7a0c346bbd2b61c5fe990bb277c00917  root
>
> I later create a 'root2' on ftp02 and cause a similar issue.  The end
> results are two non-matching files:
>
> [root@ftp01 cron]# ls -la
> total 2
> drwx--   1 root root0 Jun 15 15:53 .
> drwxr-xr-x. 10 root root  104 May 19 09:34 ..
> -rw---   1 root root 2044 Jun 15 15:50 root
> -rw-r--r--   1 root root0 Jun 15 15:53 root2
>
> [root@ftp02 cron]# ls -la
> total 2
> drwx--   1 root root0 Jun 15 15:53 .
> drwxr-xr-x. 10 root root  104 May 19 09:34 ..
> -rw---   1 root root0 Jun 15 15:50 root
> -rw-r--r--   1 root root 1503 Jun 15 15:53 root2
>
> We were able to reproduce this on two other systems with the same cephfs
> filesystem.  I have also seen cases where the file would just blank out on
> both as well.
>
> We could not reproduce it with our dev/test cluster running the development
> ceph version:
>
> ceph-10.2.2-1.g502540f.el7.x86_64

Strange.  In that cluster, was the same 3.x kernel in use?  There
aren't a whole lot of changes on the server side in v10.2.2 that I
could imagine affecting this case.

The best thing to do right now is to try using ceph-fuse in your
production environment, to check that it is not exhibiting the same
behaviour as the old kernel client.  Once you confirm that, I would
recommend upgrading your kernel to the most recent 4.x that you are
comfortable with, and confirm that that also does not exhibit the bad
behaviour.
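
For the test, a ceph-fuse mount equivalent to your kernel mounts would look 
roughly like this (monitor address assumed; uses the client keyring from 
/etc/ceph):

  ceph-fuse -m XX.XX.XX.XX:6789 --id ftp01 -r /ftp/cron /var/spool/cron

And a 4.x kernel on CentOS can be installed from ELRepo (as the reporter in
the other thread did), assuming the ELRepo repository is set up:

  yum --enablerepo=elrepo-kernel install kernel-ml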

John

> Is this a known bug with the current production Jewel release?  If so, will
> it be patched in the next release?
>
> Thank you very much,
>
> Jason Gress
>
> "This message and any attachments may contain confidential information. If
> you
> have received this  message in error, any use or distribution is prohibited.
> Please notify us by reply e-mail if you have mistakenly received this
> message,
> and immediately and permanently delete it and any attachments. Thank you."
>
>
> ___
> ceph-users

Re: [ceph-users] CephFS Bug found with CentOS 7.2

2016-06-15 Thread Oliver Dzombic
Hi,

I have an identical setup, except that I run 10.2.2 now.

I cannot reproduce that.

-- 
Mit freundlichen Gruessen / Best regards

Oliver Dzombic
IP-Interactive

mailto:i...@ip-interactive.de

Anschrift:

IP Interactive UG ( haftungsbeschraenkt )
Zum Sonnenberg 1-3
63571 Gelnhausen

HRB 93402 beim Amtsgericht Hanau
Geschäftsführung: Oliver Dzombic

Steuer Nr.: 35 236 3622 1
UST ID: DE274086107


On 15.06.2016 at 23:21, Jason Gress wrote:
> While trying to use CephFS as a clustered filesystem, we stumbled upon a
> reproducible bug that is unfortunately pretty serious, as it leads to
> data loss.  Here is the situation:
> 
> We have two systems, named ftp01 and ftp02.  They are both running
> CentOS 7.2, with this kernel release and ceph packages:
> 
> kernel-3.10.0-327.18.2.el7.x86_64
> 
> [root@ftp01 cron]# rpm -qa | grep ceph
> ceph-base-10.2.1-0.el7.x86_64
> ceph-deploy-1.5.33-0.noarch
> ceph-mon-10.2.1-0.el7.x86_64
> libcephfs1-10.2.1-0.el7.x86_64
> ceph-selinux-10.2.1-0.el7.x86_64
> ceph-mds-10.2.1-0.el7.x86_64
> ceph-common-10.2.1-0.el7.x86_64
> ceph-10.2.1-0.el7.x86_64
> python-cephfs-10.2.1-0.el7.x86_64
> ceph-osd-10.2.1-0.el7.x86_64
> 
> Mounted like so:
> XX.XX.XX.XX:/ftp/cron /var/spool/cron ceph
> _netdev,relatime,name=ftp01,secretfile=/etc/ceph/ftp01.secret 0 0
> And:
> XX.XX.XX.XX:/ftp/cron /var/spool/cron ceph
> _netdev,relatime,name=ftp02,secretfile=/etc/ceph/ftp02.secret 0 0
> 
> This filesystem has 234GB worth of data on it, and I created another
> subdirectory and mounted it, NFS style.
> 
> Here were the steps to reproduce:
> 
> First, I created a file (I was mounting /var/spool/cron on two systems)
> on ftp01:
> (crond is not running right now on either system to keep the variables down)
> 
> [root@ftp01 cron]# cp /tmp/root .
> 
> Shows up on both fine:
> [root@ftp01 cron]# ls -la
> total 2
> drwx--   1 root root0 Jun 15 15:50 .
> drwxr-xr-x. 10 root root  104 May 19 09:34 ..
> -rw---   1 root root 2043 Jun 15 15:50 root
> [root@ftp01 cron]# md5sum root
> 0636c8deaeadfea7b9ddaa29652b43ae  root
> 
> [root@ftp02 cron]# ls -la
> total 2
> drwx--   1 root root 2043 Jun 15 15:50 .
> drwxr-xr-x. 10 root root  104 May 19 09:34 ..
> -rw---   1 root root 2043 Jun 15 15:50 root
> [root@ftp02 cron]# md5sum root
> 0636c8deaeadfea7b9ddaa29652b43ae  root
> 
> Now, I vim the file on one of them:
> [root@ftp01 cron]# vim root
> [root@ftp01 cron]# ls -la
> total 2
> drwx--   1 root root0 Jun 15 15:51 .
> drwxr-xr-x. 10 root root  104 May 19 09:34 ..
> -rw---   1 root root 2044 Jun 15 15:50 root
> [root@ftp01 cron]# md5sum root
> 7a0c346bbd2b61c5fe990bb277c00917  root
> 
> [root@ftp02 cron]# md5sum root
> 7a0c346bbd2b61c5fe990bb277c00917  root
> 
> So far so good, right?  Then, a few seconds later:
> 
> [root@ftp02 cron]# ls -la
> total 0
> drwx--   1 root root   0 Jun 15 15:51 .
> drwxr-xr-x. 10 root root 104 May 19 09:34 ..
> -rw---   1 root root   0 Jun 15 15:50 root
> [root@ftp02 cron]# cat root
> [root@ftp02 cron]# md5sum root
> d41d8cd98f00b204e9800998ecf8427e  root
> 
> And on ftp01:
> 
> [root@ftp01 cron]# ls -la
> total 2
> drwx--   1 root root0 Jun 15 15:51 .
> drwxr-xr-x. 10 root root  104 May 19 09:34 ..
> -rw---   1 root root 2044 Jun 15 15:50 root
> [root@ftp01 cron]# md5sum root
> 7a0c346bbd2b61c5fe990bb277c00917  root
> 
> I later create a 'root2' on ftp02 and cause a similar issue.  The end
> results are two non-matching files:
> 
> [root@ftp01 cron]# ls -la
> total 2
> drwx--   1 root root0 Jun 15 15:53 .
> drwxr-xr-x. 10 root root  104 May 19 09:34 ..
> -rw---   1 root root 2044 Jun 15 15:50 root
> -rw-r--r--   1 root root0 Jun 15 15:53 root2
> 
> [root@ftp02 cron]# ls -la
> total 2
> drwx--   1 root root0 Jun 15 15:53 .
> drwxr-xr-x. 10 root root  104 May 19 09:34 ..
> -rw---   1 root root0 Jun 15 15:50 root
> -rw-r--r--   1 root root 1503 Jun 15 15:53 root2
> 
> We were able to reproduce this on two other systems with the same cephfs
> filesystem.  I have also seen cases where the file would just blank out
> on both as well.
> 
> We could not reproduce it with our dev/test cluster running the
> development ceph version:
> 
> ceph-10.2.2-1.g502540f.el7.x86_64
> 
> Is this a known bug with the current production Jewel release?  If so,
> will it be patched in the next release?
> 
> Thank you very much,
> 
> Jason Gress
> 
> "This message and any attachments may contain confidential information. If you
> have received this  message in error, any use or distribution is prohibited. 
> Please notify us by reply e-mail if you have mistakenly received this message,
> and immediately and permanently delete it and any attachments. Thank you."
> 
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com

Re: [ceph-users] Ceph osd too full

2016-06-15 Thread Kostis Fardelas
Hi Hauke,
you could increase the mon/osd full/nearfull ratios, but at this level
of disk-space scarcity things may need your constant attention,
especially in case of failure, given the risk of cluster IO being
blocked. Modifying CRUSH weights may be of use too.
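
For example (jewel-era syntax; the values and OSD id are hypothetical):

  ceph pg set_nearfull_ratio 0.90
  ceph pg set_full_ratio 0.97
  ceph osd crush reweight osd.12 1.6   # lower an overfull OSD's weight to move data off it
  ceph osd df                          # then watch utilization converge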

Regards,
Kostis

On 15 June 2016 at 21:13, Hauke Homburg  wrote:
>
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
>
> Hello,
>
> I have a Ceph Jewel cluster with 5 servers and 40 OSDs.
> The cluster is very full, but at the moment I cannot use 10 percent of
> the volume because ceph health says some hard disks are too
> full. They are between 75 and 95 percent full.
>
> ceph osd reweight-by-utilization doesn't help. How can I fill the OSDs
> evenly to use the maximum space?
>
> Regards
>
> Hauke
>
> - --
> www.w3-creative.de
>
> www.westchat.de
> -BEGIN PGP SIGNATURE-
> Version: GnuPG v2.0.19 (GNU/Linux)
>
> iQIcBAEBAgAGBQJXYZrdAAoJEEIVizQb/Y0mrM0P/0YqB5Zb69I/HDblqfSmg+26
> 1Io5j/vTz9gs5orHEvvU6wNZiEVnh8jfeczzxMaNQ+zW4MGED/ahrpZoHnJ5xEbb
> a4xqpvrZdFYFYrhgrFDEQEo3cqC3L5E4VjR4aBp77WjH/Q7G9v62IHrNM0uU7Yfg
> RKw7/zxHmZQBWek5Co7AtRmzZdjS7RelaVyEHQ7Vu2nO1aZUNYvjgUvVCHdos/TG
> F3yiwFcXEk7H6EHyHs6dUoTgm0OOVw/MjOD7kLtM/uModEZoxQT5uuvod6iHZ5nE
> eNkV/ipcTbUaDdkBbpBKhfNjsoyYLetNblEWbmrWw8bmorjq0CmtKT229cBrNZW8
> bdPbrbG6/TCkydVm0KHEgU97FsIPI6yqJxSCnsFEBNFjYVvBlysqK1awXHK+tTjV
> v3arQFFEIRC8salEoIWaGx97M3S/HuqcTV3zlZ+OrfXblrB5h3YJTonnxyi4Z1c7
> 7imsMneNAYhlVcZtcWxNxKB8/wu0sX8yvjkwYMh1bIF3H/pt0JhoyJsWvEcKgEbH
> s37nJ6I3hFZc9okefLK6uz9zIkZ1CLzYdTSnZS0pIDufHZVvuJe3nN1PSOAZ24JI
> H4eV5INWS81f0EzOfUXRkfq86uDEtNIpLa3J+CHuYcnNYOc3TA/vBTB3QOBXaIcF
> tT3jp+p3+DiDmvuynICc
> =AU62
> -END PGP SIGNATURE-
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Fio randwrite does not work on Centos 7.2 VM

2016-06-15 Thread Mansour Shafaei Moghaddam
Thanks a lot. Got solved.
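
For the archive: it was indeed the process fd limit. One way to raise it
(the value below is just an example) is via ceph.conf before restarting
the OSDs:

    [global]
        max open files = 131072

or system-wide through /etc/security/limits.conf.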

On Wed, Jun 15, 2016 at 2:12 PM, Samuel Just  wrote:

> I think you hit the os process fd limit.  You need to adjust it.
> -Sam
>
> On Wed, Jun 15, 2016 at 2:07 PM, Mansour Shafaei Moghaddam
>  wrote:
> > It fails at "FileStore.cc: 2761". Here is a more complete log:
> >
> > -9> 2016-06-15 10:55:13.205014 7fa2dcd85700 -1 dump_open_fds unable
> to
> > open /proc/self/fd
> > -8> 2016-06-15 10:55:13.205085 7fa2cb402700  2
> > filestore(/var/lib/ceph/osd/ceph-0) waiting 51 > 50 ops || 328390 >
> > 104857600
> > -7> 2016-06-15 10:55:13.205094 7fa2cd406700  2
> > filestore(/var/lib/ceph/osd/ceph-0) waiting 51 > 50 ops || 328389 >
> > 104857600
> > -6> 2016-06-15 10:55:13.205111 7fa2cac01700  2
> > filestore(/var/lib/ceph/osd/ceph-0) waiting 51 > 50 ops || 328317 >
> > 104857600
> > -5> 2016-06-15 10:55:13.205118 7fa2ca400700  2
> > filestore(/var/lib/ceph/osd/ceph-0) waiting 51 > 50 ops || 328390 >
> > 104857600
> > -4> 2016-06-15 10:55:13.205121 7fa2cdc07700  2
> > filestore(/var/lib/ceph/osd/ceph-0) waiting 51 > 50 ops || 328390 >
> > 104857600
> > -3> 2016-06-15 10:55:13.205153 7fa2de588700  5 -- op tracker -- seq:
> > 1476, time: 2016-06-15 10:55:13.205153, event:
> journaled_completion_queued,
> > op: osd_op(client.4109.0:1457 rb.0.100a.6b8b4567.6b6c
> > [set-alloc-hint object_size 4194304 write_size 4194304,write
> 1884160~4096]
> > 0.cbe1d8a4 ack+ondisk+write e9)
> > -2> 2016-06-15 10:55:13.205183 7fa2de588700  5 -- op tracker -- seq:
> > 1483, time: 2016-06-15 10:55:13.205183, event:
> > write_thread_in_journal_buffer, op: osd_op(client.4109.0:1464
> > rb.0.100a.6b8b4567.524d [set-alloc-hint object_size 4194304
> > write_size 4194304,write 3051520~4096] 0.6778c255 ack+ondisk+write e9)
> > -1> 2016-06-15 10:55:13.205400 7fa2de588700  5 -- op tracker -- seq:
> > 1483, time: 2016-06-15 10:55:13.205400, event:
> journaled_completion_queued,
> > op: osd_op(client.4109.0:1464 rb.0.100a.6b8b4567.524d
> > [set-alloc-hint object_size 4194304 write_size 4194304,write
> 3051520~4096]
> > 0.6778c255 ack+ondisk+write e9)
> >  0> 2016-06-15 10:55:13.206559 7fa2dcd85700 -1 os/FileStore.cc: In
> > function 'unsigned int
> FileStore::_do_transaction(ObjectStore::Transaction&,
> > uint64_t, int, ThreadPool::TPHandle*)' thread 7fa2dcd85700 time
> 2016-06-15
> > 10:55:13.205018
> > os/FileStore.cc: 2761: FAILED assert(0 == "unexpected error")
> >
> >  ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
> >  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> > const*)+0x78) [0xacd718]
> >  2: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long,
> > int, ThreadPool::TPHandle*)+0xa24) [0x8b8114]
> >  3: (FileStore::_do_transactions(std::list<ObjectStore::Transaction*,
> > std::allocator<ObjectStore::Transaction*> >&, unsigned long,
> > ThreadPool::TPHandle*)+0x64) [0x8bcf34]
> >  4: (FileStore::_do_op(FileStore::OpSequencer*,
> > ThreadPool::TPHandle&)+0x17e) [0x8bd0ce]
> >  5: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa56) [0xabe326]
> >  6: (ThreadPool::WorkThread::entry()+0x10) [0xabf3d0]
> >  7: (()+0x7dc5) [0x7fa2e88f3dc5]
> >  8: (clone()+0x6d) [0x7fa2e73d528d]
> >
> >
> > On Wed, Jun 15, 2016 at 2:05 PM, Somnath Roy 
> > wrote:
> >>
> >> There should be a line in the log specifying which assert is failing;
> >> post that along with say 10 lines from the top of that.
> >>
> >>
> >>
> >> Thanks & Regards
> >>
> >> Somnath
> >>
> >>
> >>
> >> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
> Of
> >> Mansour Shafaei Moghaddam
> >> Sent: Wednesday, June 15, 2016 1:57 PM
> >> To: ceph-users@lists.ceph.com
> >> Subject: [ceph-users] Fio randwrite does not work on Centos 7.2 VM
> >>
> >>
> >>
> >> Hi All,
> >>
> >>
> >>
> >> Has anyone faced a similar issue? I do not have a problem with random
> >> read, sequential read, and sequential writes though. Every time I try
> >> running fio for random writes, one osd in the cluster crashes. Here is
> >> what I see at the tail of the log:
> >>
> >>
> >>
> >>  ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
> >>
> >>  1: ceph-osd() [0x9d6334]
> >>
> >>  2: (()+0xf100) [0x7fa2e88fb100]
> >>
> >>  3: (gsignal()+0x37) [0x7fa2e73145f7]
> >>
> >>  4: (abort()+0x148) [0x7fa2e7315ce8]
> >>
> >>  5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7fa2e7c189d5]
> >>
> >>  6: (()+0x5e946) [0x7fa2e7c16946]
> >>
> >>  7: (()+0x5e973) [0x7fa2e7c16973]
> >>
> >>  8: (()+0x5eb93) [0x7fa2e7c16b93]
> >>
> >>  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> >> const*)+0x24a) [0xacd8ea]
> >>
> >>  10: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned
> long,
> >> int, ThreadPool::TPHandle*)+0xa24) [0x8b8114]
> >>
> >>  11: (FileStore::_do_transactions(std::list<ObjectStore::Transaction*,
> >> std::allocator<ObjectStore::Transaction*> >&, unsigned long,
> >> ThreadPool::TPHandle*)+0x64) [0x8bcf34]
> >>
> >>  12: (FileStore::_do_op(FileStore::OpSequencer*,
> >> ThreadPool::TPHandle&)+0x17e) [0x8bd0ce]

Re: [ceph-users] Ceph osd too full

2016-06-15 Thread Christian Balzer


Hello,

what Kostis said, in particular with regard to changing crush weights (NOT
re-weight).

Also post the output of "ceph -s" if you please; insufficient PGs can make
OSD imbalances worse.

Look at your output of "ceph df detail" and "ceph osd tree".
Find the worst outliers and carefully (a few % at most) adjust their weight
up and down respectively. 
Keep an eye on your host weight (in the tree output); you want your hosts
to stay at the same weight ultimately.

This is the output for one of my storage nodes after all the juggling;
now all OSDs are within 100GB or 2% of each other, but as you can see
some OSDs needed a LOT of nudging (default weight was 5):

ID WEIGHT  REWEIGHT SIZE  USE   AVAIL  %USE  VAR 
11 5.0  1.0 5411G   759G  4652G 14.04 0.80 
12 5.0  1.0 5411G   725G  4686G 13.40 0.77 
13 4.7  1.0 5411G   797G  4614G 14.74 0.84 
14 4.7  1.0 5411G   786G  4625G 14.53 0.83 
15 5.5  1.0 5411G   752G  4658G 13.91 0.80 
16 4.7  1.0 5411G   801G  4610G 14.81 0.85 
17 5.2  1.0 5411G   734G  4677G 13.57 0.78 
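
The nudging itself is plain crush reweighting, e.g. (weights as in the
table above):

    ceph osd crush reweight osd.13 4.7
    ceph osd crush reweight osd.15 5.5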

Christian

On Thu, 16 Jun 2016 00:57:00 +0300 Kostis Fardelas wrote:

> Hi Hauke,
> you could increase the mon/osd full/near full ratios but at this level
> of disk space scarcity, things may need your constant attention
> especially in case of failure given the risk of closing down the
> cluster IO. Modifying crush weights may be of use too.
> 
> Regards,
> Kostis
> 
> On 15 June 2016 at 21:13, Hauke Homburg  wrote:
> >
> > -BEGIN PGP SIGNED MESSAGE-
> > Hash: SHA1
> >
> > Hello,
> >
> > I have a ceph jewel Cluster with 5 Server and 40 OSD.
> > The Cluster is very full, but at this moment I cannot use 10 percent of
> > the volume because ceph health says some hard disks are too full. They
> > are between 75 and 95 percent full.
> >
> > ceph osd reweight-by-utilization doesn't help. How can I fill the osds
> > evenly to use the maximum space?
> >
> > Regards
> >
> > Hauke
> >
> > - --
> > www.w3-creative.de
> >
> > www.westchat.de
> > -BEGIN PGP SIGNATURE-
> > Version: GnuPG v2.0.19 (GNU/Linux)
> >
> > iQIcBAEBAgAGBQJXYZrdAAoJEEIVizQb/Y0mrM0P/0YqB5Zb69I/HDblqfSmg+26
> > 1Io5j/vTz9gs5orHEvvU6wNZiEVnh8jfeczzxMaNQ+zW4MGED/ahrpZoHnJ5xEbb
> > a4xqpvrZdFYFYrhgrFDEQEo3cqC3L5E4VjR4aBp77WjH/Q7G9v62IHrNM0uU7Yfg
> > RKw7/zxHmZQBWek5Co7AtRmzZdjS7RelaVyEHQ7Vu2nO1aZUNYvjgUvVCHdos/TG
> > F3yiwFcXEk7H6EHyHs6dUoTgm0OOVw/MjOD7kLtM/uModEZoxQT5uuvod6iHZ5nE
> > eNkV/ipcTbUaDdkBbpBKhfNjsoyYLetNblEWbmrWw8bmorjq0CmtKT229cBrNZW8
> > bdPbrbG6/TCkydVm0KHEgU97FsIPI6yqJxSCnsFEBNFjYVvBlysqK1awXHK+tTjV
> > v3arQFFEIRC8salEoIWaGx97M3S/HuqcTV3zlZ+OrfXblrB5h3YJTonnxyi4Z1c7
> > 7imsMneNAYhlVcZtcWxNxKB8/wu0sX8yvjkwYMh1bIF3H/pt0JhoyJsWvEcKgEbH
> > s37nJ6I3hFZc9okefLK6uz9zIkZ1CLzYdTSnZS0pIDufHZVvuJe3nN1PSOAZ24JI
> > H4eV5INWS81f0EzOfUXRkfq86uDEtNIpLa3J+CHuYcnNYOc3TA/vBTB3QOBXaIcF
> > tT3jp+p3+DiDmvuynICc
> > =AU62
> > -END PGP SIGNATURE-
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 


-- 
Christian BalzerNetwork/Systems Engineer
ch...@gol.com   Global OnLine Japan/Rakuten Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Switches and latency

2016-06-15 Thread Christian Balzer

Hello,

Gandalf, first read:
https://www.mail-archive.com/ceph-users@lists.ceph.com/msg29546.html

And this thread by Nick:
https://www.mail-archive.com/ceph-users@lists.ceph.com/msg29708.html

More comments inline.

On Wed, 15 Jun 2016 22:26:51 +0100 Nick Fisk wrote:

> > -Original Message-
> > From: Gandalf Corvotempesta [mailto:gandalf.corvotempe...@gmail.com]
> > Sent: 15 June 2016 22:13
> > To: n...@fisk.me.uk
> > Cc: ceph-us...@ceph.com
> > Subject: Re: [ceph-users] Switches and latency
> > 
> > 2016-06-15 22:59 GMT+02:00 Nick Fisk :
> > > Possibly, but by how much? 20GB of bandwidth is a lot to feed 12x7.2k
> > disks, particularly if they start doing any sort of non-sequential IO.
> > 
> > Assuming 100MB/s for each SATA disk, 12 disks are 1200MB/s = 9600mbit/s
> > Why are you talking about 20Gb/s ? By using VLANs on the same port for
> > both public and cluster traffic, I'll have 10Gb/s to share, but all
> > disks can saturate the whole nic (9600mbit/s on a 10Gb/s network)
> 
> So this is probably a very optimistic figure; any sort of non-4MB
> sequential workload will rapidly decrease this number. Are you planning
> on using SSD journals? This will impact the possible bandwidth you will
> achieve.
> 
Overly optimistic. 
In an idle cluster with synthetic tests you might get sequential reads
that are around 150MB/s per HDD.
As for writes, think 80MB/s, again in an idle cluster.

Any realistic, random I/O and you're looking at 50MB/s at most either way.

So your storage nodes can't really saturate even a single 10Gb/s link in
real life situations. 

Journal SSDs can improve on things, but that's mostly for IOPS.
In fact they easily become the bottleneck bandwidth-wise and are so on
most of my storage nodes, because you'd need at least two 400GB DC S3710
SSDs to get around 1GB/s of writes, or one link's worth.

Splitting things in cluster and public networks ONLY makes sense when your
storage node can saturate ALL the network bandwidth, which usually is only
the case when it comes to very expensive SSD/NVMe only nodes.
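
For reference, the split itself is just two lines in ceph.conf (the
subnets below are placeholders):

    [global]
        public network  = 192.168.1.0/24
        cluster network = 192.168.2.0/24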

Going back to your original post, with a split network the latency in both
networks counts the same, as a client write will NOT be acknowledged until
it has reached the journal of all replicas, so having a higher latency
cluster network is counterproductive.

And again, in real life you'll run out of IOPS long before you run out of
bandwidth, I/O or network wise.
 
> I was assuming each node has 2 NICs in a bond going to separate
> switches. You get 20Gb/s of bandwidth and redundancy.
> 
> > 
> > I can't aggregate 2 ports, or I have to buy stackable switches with
> > support for LAG across both switches, much more expensive.
> > And obviously I can't use only one switch. The network must be fault
> > tolerant.
> 
> As above, check out the linux bonding options. ALB mode gives both RX
> and TX load balancing, although I think it may have some weird fringe
> cases you need to test before going live with it.
> 
Look at alternative MC-LAG capable switches from Penguin, Quanta, etc.
These tend to be half the price of similar offerings from Brocade or Cisco.

Or if you can start with a clean slate (including the clients), look at
Infiniband. 
All my production clusters are running entirely IB (IPoIB currently) and
I'm very happy with the performance, latency and cost.

> > 
> > > I think you want to try and keep it as simple as possible and make the
> > > right decision first time round. Buy a TOR switch that will accommodate
> > > the number of servers you wish to put in your rack and you should never
> > > have a need to change it.
> > >
> > > I think there are issues when one of the networks is down and not the
> > > other, so stick to keeping each server terminating into the same switch
> > > for all its connections, otherwise you are just inviting trouble to
> > > happen.
> > 
> > This is not good. A network could fail. In an HA cluster, network
> > failure must be taken into consideration.
> > What I would like to do is to unplug the cable from switch 1 and plug
> > it into switch 2; a couple of seconds max. (Obviously switch2 will be
> > temporarily connected to switch1.)
> 

You will want 2 switches and 2 ports on each host.
Just use one network or if you feel ambitious, use VLANs for public and
cluster, your choice.

If anyhow possible/affordable, run LACP on your host ports to your MC-LAG
capable switches.

If you can't afford this, spend time (to learn and test) instead of money
on running OSPF equal-cost multi-path on your storage nodes and get the
same benefits, fully redundant and load-balanced links.

Lastly, if you can't do either of these, run your things in ALB (may not
work) or simple fail-over mode. 10Gb/s is going to be fast enough in
nearly all situations you'll encounter with these storage nodes.
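
For example, an ALB bond is just a bonding module setting; a minimal
sketch (names and values are examples):

    # /etc/modprobe.d/bonding.conf
    options bonding mode=balance-alb miimon=100

with the two 10Gb/s ports enslaved to bond0 via the usual distro network
config; mode=active-backup gives you the plain fail-over variant.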

Christian
-- 
Christian BalzerNetwork/Systems Engineer
ch...@gol.com   Global OnLine Japan/Rakuten Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] CephFS Bug found with CentOS 7.2

2016-06-15 Thread Yan, Zheng
On Thu, Jun 16, 2016 at 5:21 AM, Jason Gress  wrote:
> While trying to use CephFS as a clustered filesystem, we stumbled upon a
> reproducible bug that is unfortunately pretty serious, as it leads to data
> loss.  Here is the situation:
>
> We have two systems, named ftp01 and ftp02.  They are both running CentOS
> 7.2, with this kernel release and ceph packages:
>
> kernel-3.10.0-327.18.2.el7.x86_64
>
> [root@ftp01 cron]# rpm -qa | grep ceph
> ceph-base-10.2.1-0.el7.x86_64
> ceph-deploy-1.5.33-0.noarch
> ceph-mon-10.2.1-0.el7.x86_64
> libcephfs1-10.2.1-0.el7.x86_64
> ceph-selinux-10.2.1-0.el7.x86_64
> ceph-mds-10.2.1-0.el7.x86_64
> ceph-common-10.2.1-0.el7.x86_64
> ceph-10.2.1-0.el7.x86_64
> python-cephfs-10.2.1-0.el7.x86_64
> ceph-osd-10.2.1-0.el7.x86_64
>
> Mounted like so:
> XX.XX.XX.XX:/ftp/cron /var/spool/cron ceph
> _netdev,relatime,name=ftp01,secretfile=/etc/ceph/ftp01.secret 0 0
> And:
> XX.XX.XX.XX:/ftp/cron /var/spool/cron ceph
> _netdev,relatime,name=ftp02,secretfile=/etc/ceph/ftp02.secret 0 0
>
> This filesystem has 234GB worth of data on it, and I created another
> subdirectory and mounted it, NFS style.
>
> Here were the steps to reproduce:
>
> First, I created a file (I was mounting /var/spool/cron on two systems) on
> ftp01:
> (crond is not running right now on either system to keep the variables down)
>
> [root@ftp01 cron]# cp /tmp/root .
>
> Shows up on both fine:
> [root@ftp01 cron]# ls -la
> total 2
> drwx--   1 root root0 Jun 15 15:50 .
> drwxr-xr-x. 10 root root  104 May 19 09:34 ..
> -rw---   1 root root 2043 Jun 15 15:50 root
> [root@ftp01 cron]# md5sum root
> 0636c8deaeadfea7b9ddaa29652b43ae  root
>
> [root@ftp02 cron]# ls -la
> total 2
> drwx--   1 root root 2043 Jun 15 15:50 .
> drwxr-xr-x. 10 root root  104 May 19 09:34 ..
> -rw---   1 root root 2043 Jun 15 15:50 root
> [root@ftp02 cron]# md5sum root
> 0636c8deaeadfea7b9ddaa29652b43ae  root
>
> Now, I vim the file on one of them:
> [root@ftp01 cron]# vim root
> [root@ftp01 cron]# ls -la
> total 2
> drwx--   1 root root0 Jun 15 15:51 .
> drwxr-xr-x. 10 root root  104 May 19 09:34 ..
> -rw---   1 root root 2044 Jun 15 15:50 root
> [root@ftp01 cron]# md5sum root
> 7a0c346bbd2b61c5fe990bb277c00917  root
>
> [root@ftp02 cron]# md5sum root
> 7a0c346bbd2b61c5fe990bb277c00917  root
>
> So far so good, right?  Then, a few seconds later:
>
> [root@ftp02 cron]# ls -la
> total 0
> drwx--   1 root root   0 Jun 15 15:51 .
> drwxr-xr-x. 10 root root 104 May 19 09:34 ..
> -rw---   1 root root   0 Jun 15 15:50 root
> [root@ftp02 cron]# cat root
> [root@ftp02 cron]# md5sum root
> d41d8cd98f00b204e9800998ecf8427e  root

Please enable mds debugging (add "debug mds = 20" to ceph.conf) and
kernel dynamic debug (echo "module ceph +p" >
/sys/kernel/debug/dynamic_debug/control). Repeat these steps and send
both logs to us.
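
A minimal sketch of where each setting goes (assuming a stock install):

    # ceph.conf on the MDS node:
    [mds]
        debug mds = 20

    # on each client using the kernel driver:
    echo 'module ceph +p' > /sys/kernel/debug/dynamic_debug/control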

Regards
Yan, Zheng


>
> And on ftp01:
>
> [root@ftp01 cron]# ls -la
> total 2
> drwx--   1 root root0 Jun 15 15:51 .
> drwxr-xr-x. 10 root root  104 May 19 09:34 ..
> -rw---   1 root root 2044 Jun 15 15:50 root
> [root@ftp01 cron]# md5sum root
> 7a0c346bbd2b61c5fe990bb277c00917  root
>
> I later create a 'root2' on ftp02 and cause a similar issue.  The end
> results are two non-matching files:
>
> [root@ftp01 cron]# ls -la
> total 2
> drwx--   1 root root0 Jun 15 15:53 .
> drwxr-xr-x. 10 root root  104 May 19 09:34 ..
> -rw---   1 root root 2044 Jun 15 15:50 root
> -rw-r--r--   1 root root0 Jun 15 15:53 root2
>
> [root@ftp02 cron]# ls -la
> total 2
> drwx--   1 root root0 Jun 15 15:53 .
> drwxr-xr-x. 10 root root  104 May 19 09:34 ..
> -rw---   1 root root0 Jun 15 15:50 root
> -rw-r--r--   1 root root 1503 Jun 15 15:53 root2
>
> We were able to reproduce this on two other systems with the same cephfs
> filesystem.  I have also seen cases where the file would just blank out on
> both as well.
>
> We could not reproduce it with our dev/test cluster running the development
> ceph version:
>
> ceph-10.2.2-1.g502540f.el7.x86_64
>
> Is this a known bug with the current production Jewel release?  If so, will
> it be patched in the next release?
>
> Thank you very much,
>
> Jason Gress
>
> "This message and any attachments may contain confidential information. If
> you
> have received this  message in error, any use or distribution is prohibited.
> Please notify us by reply e-mail if you have mistakenly received this
> message,
> and immediately and permanently delete it and any attachments. Thank you."
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Is Dynamic Cache tiering supported in Jewel

2016-06-15 Thread Venkata Manojawa Paritala
Thanks, Oliver. Will check the links and get back.

Best Regards,
Manoj

On Wed, Jun 15, 2016 at 11:14 PM, Oliver Dzombic 
wrote:

> Hi,
>
> Yes, that's no problem.
>
> In addition to what christian told you, these two links are helpful to
> understand the stuff:
>
>
> https://software.intel.com/en-us/blogs/2015/03/03/ceph-cache-tiering-introduction
>
> http://docs.ceph.com/docs/jewel/rados/operations/cache-tiering/
>
>
> http://docs.ceph.com/docs/jewel/rados/operations/crush-map/#placing-different-pools-on-different-osds
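>
> The basic sequence from the cache-tiering doc, which also works against
> a pool that already has IO, is roughly (pool names are examples):
>
>     ceph osd tier add cold-pool hot-pool
>     ceph osd tier cache-mode hot-pool writeback
>     ceph osd tier set-overlay cold-pool hot-pool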
>
> Good luck !
>
> --
> Mit freundlichen Gruessen / Best regards
>
> Oliver Dzombic
> IP-Interactive
>
> mailto:i...@ip-interactive.de
>
> Anschrift:
>
> IP Interactive UG ( haftungsbeschraenkt )
> Zum Sonnenberg 1-3
> 63571 Gelnhausen
>
> HRB 93402 beim Amtsgericht Hanau
> Geschäftsführung: Oliver Dzombic
>
> Steuer Nr.: 35 236 3622 1
> UST ID: DE274086107
>
>
> Am 15.06.2016 um 15:09 schrieb Venkata Manojawa Paritala:
> > Hi,
> >
> > We are working on trying cache tiering in Ceph and would like to know
> > if this can be done dynamically - basically adding a cache pool to
> > another pool which already has IO.
> >
> > Thank you for your response in advance.
> >
> > - Manoj
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to select particular OSD to act as primary OSD.

2016-06-15 Thread Kanchana. P
When one OSD's primary affinity is set to 1 and all the rest to zero, the
OSD which is set to 1 is taken as the primary OSD every time. Can someone
let me know if this is the correct way of setting primary affinity.

1. Have a crush rule set to take osds 4,9,14,18
2. Changed primary affinity weightage of osd 4 to 1 and all the rest to
zero.
3. Placed objects in the pool. Every time it takes 4 as the primary osd
as expected (tried by placing 15 objects in the pool; every time it took
osd 4; need to check how it behaves when an IO tool is used).
4. Changed primary affinity weightage of osd 4 to 1 and osd.9->0.75,
osd.14->0.70, osd.18->0.6799
5. When an object is placed it takes osds in a different order; it is not
considering 4 as the primary osd.
6. Changed primary affinity weightage of osd 4 to 0.90 and all the rest
to zero.
7. It is taking osds in a different order, not considering 4 as primary.
8. Only when one OSD is set to 1 and all others to 0 does it take the osd
with weight 1 as primary.
9. If we use different osds in each ruleset and don't repeat the osds in
other rules, point 8 will meet our requirement.
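
In other words, the only combination that behaved as expected (point 8)
was:

    ceph osd primary-affinity osd.4 1
    ceph osd primary-affinity osd.9 0
    ceph osd primary-affinity osd.14 0
    ceph osd primary-affinity osd.18 0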

On Tue, Jun 14, 2016 at 6:06 PM, Kanchana. P 
wrote:

> Thanks for the reply shylesh, but the procedure is not working. On
> ceph.com it is mentioned that we can make a particular osd the primary
> osd by setting its primary affinity weight between 0 and 1, but it is
> not working.
> On 14 Jun 2016 16:15, "shylesh kumar"  wrote:
>
>> Hi,
>>
>> I think you can edit the crush rule something like below
>>
>> rule another_replicated_ruleset {
>> ruleset 1
>> type replicated
>> min_size 1
>> max_size 10
>> step take default
>> step take osd1
>> step choose firstn 1 type osd
>> step emit
>> step take osd2
>> step choose firstn 1 type osd
>> step emit
>> step take osd5
>> step choose firstn 1 type osd
>> step emit
>> step take osd4
>> step choose firstn 1 type osd
>> step emit
>> }
>>
>> and create pool using this rule.
>>
>> It might work, though I am not 100% sure.
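>>
>> e.g. (rule name as above; pg counts are just an example):
>>
>>     ceph osd pool create poolA 128 128 replicated another_replicated_ruleset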
>>
>> Thanks,
>> Shylesh
>>
>> On Tue, Jun 14, 2016 at 4:05 PM, Kanchana. P 
>> wrote:
>>
>>> Hi,
>>>
>>> How to select particular OSD to act as primary OSD.
>>> I modified the ceph.conf file and added
>>> [mon]
>>> ...
>>> mon osd allow primary affinity = true
>>> Restarted ceph target, now primary affinity is set to true in all
>>> monitor nodes.
>>> Using the below commands set some weights to the osds.
>>>
>>> $ ceph osd primary-affinity osd.1 0.25
>>> $ ceph osd primary-affinity osd.6 0.50
>>> $ ceph osd primary-affinity osd.11 0.75
>>> $ ceph osd primary-affinity osd.16 1
>>>
>>> Created a pool "poolA" and set a crush_ruleset so that it takes OSDs in
>>> order 16,11,6,1
>>> Even after setting the primary affinity weights, it took OSDs in a
>>> different order.
>>> Can we select the primary OSD? If so, how can we do that? Please let me
>>> know what I am missing here to set an OSD as the primary OSD.
>>>
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>>
>>
>>
>> --
>> Thanks & Regards
>> Shylesh Kumar M
>>
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] How can I make daemon for ceph-dash

2016-06-15 Thread 한승진
I am using ceph-dash as a dashboard for my ceph clusters.

There is a contrib directory with apache, nginx and wsgi configs in the
ceph-dash sources.

However, I cannot adapt those files to start ceph-dash as an apache
daemon or any other daemon.

How do I run ceph-dash as a daemon?

thanks.
John Haan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com