Re: [ceph-users] FAILED assert(i.first <= i.last)

2017-06-19 Thread Wido den Hollander

> Op 18 juni 2017 om 16:21 schreef Peter Rosell :
> 
> 
> Hi,
> I have a small cluster with only three nodes, 4 OSDs + 3 OSDs. I have been
> running version 0.87.2 (Giant) for over 2.5 years, but a couple of days ago I
> upgraded to 0.94.10 (Hammer) and then up to 10.2.7 (Jewel). Both
> upgrades went great. I started with the monitors, then the OSDs and finally the MDS.
> The log shows all 448 PGs active+clean. I'm running all daemons inside docker, and
> ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185)
> 
> Today I had a power outage and I had to take down the servers. When I now
> start the servers again one OSD daemon doesn't start properly. It keeps
> crashing.
> 
> I noticed that the first two restarts of the OSD daemon crashed with this
> error:
> FAILED assert(rollback_info_trimmed_to_riter == log.rbegin())
> 
> After that it always fails with "FAILED assert(i.first <= i.last)"
> 
> I have 15 logs like this one:
> Jun 18 08:56:18 island sh[27991]: 2017-06-18 08:56:18.300641 7f5c5e0ff8c0
> -1 log_channel(cluster) log [ERR] : 2.38 log bound mismatch, info
> (19544'666742,19691'671046] actual [19499'665843,19691'671046]
> 
> I removed the directories _head, but that just removed these error
> logs. It crashes anyway.
> 
> Does anyone have suggestions on what to do to make it start up correctly? Of
> course I can remove the OSD from the cluster and re-add it, but it feels
> like a bug.

Are you sure? Since you had a power failure it could be that certain parts 
weren't committed to disk/FS properly when the power failed. That really 
depends on the hardware and configuration.

Please, do not try to repair this OSD. Wipe it and re-add it to the cluster.
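For reference, the usual wipe-and-re-add sequence on Jewel looks roughly like this
(the OSD id and device are placeholders; adapt it to your docker-based deployment
and stop the OSD container first):

$ ceph osd out <id>
$ ceph osd crush remove osd.<id>
$ ceph auth del osd.<id>
$ ceph osd rm <id>
$ ceph-disk zap /dev/sdX        # wipes the disk
$ ceph-disk prepare /dev/sdX    # recreates the OSD, which then backfills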

Wido

> A small snippet from the logs is added below. I didn't include the event
> list. If it will help I can send it too.
> 
> Jun 18 13:52:23 island sh[7068]: osd/osd_types.cc: In function 'static bool
> pg_interval_t::check_new_interval(int, int, const std::vector&, const
> std::vector&, int, int, const std::vector&, const
> std::vector&, epoch_t, epoch_t, OSDMapRef, OSDMapRef, pg_t,
> IsPGRecoverablePredicate*, std::map*,
> std::ostream*)' thread 7f4fc2500700 time 2017-06-18 13:52:23.593991
> Jun 18 13:52:23 island sh[7068]: osd/osd_types.cc: 3132: FAILED
> assert(i.first <= i.last)
> Jun 18 13:52:23 island sh[7068]:  ceph version 10.2.7
> (50e863e0f4bc8f4b9e31156de690d765af245185)
> Jun 18 13:52:23 island sh[7068]:  1: (ceph::__ceph_assert_fail(char const*,
> char const*, int, char const*)+0x80) [0x559fe4c14360]
> Jun 18 13:52:23 island sh[7068]:  2:
> (pg_interval_t::check_new_interval(int, int, std::vector std::allocator > const&, std::vector >
> const&, int, int, std::vector > const&,
> std::vector > const&, unsigned int, unsigned int,
> std::shared_ptr, std::shared_ptr, pg_t,
> IsPGRecoverablePredicate*, std::map std::less, std::allocator pg_interval_t> > >*, std::ostream*)+0x72c) [0x559fe47f723c]
> Jun 18 13:52:23 island sh[7068]:  3:
> (PG::start_peering_interval(std::shared_ptr, std::vector std::allocator > const&, int, std::vector >
> const&, int, ObjectStore::Transaction*)+0x3ff) [0x559fe461439f]
> Jun 18 13:52:23 island sh[7068]:  4:
> (PG::RecoveryState::Reset::react(PG::AdvMap const&)+0x478) [0x559fe4615828]
> Jun 18 13:52:23 island sh[7068]:  5:
> (boost::statechart::simple_state PG::RecoveryState::RecoveryMachine, boost::mpl::list mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
> mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
> mpl_::na, mpl_::na, mpl_::na, mpl_::na>,
> (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base
> const&, void const*)+0x176) [0x559fe4645b86]
> Jun 18 13:52:23 island sh[7068]:  6:
> (boost::statechart::state_machine PG::RecoveryState::Initial, std::allocator,
> boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base
> const&)+0x69) [0x559fe4626d49]
> Jun 18 13:52:23 island sh[7068]:  7:
> (PG::handle_advance_map(std::shared_ptr,
> std::shared_ptr, std::vector >&,
> int, std::vector >&, int, PG::RecoveryCtx*)+0x49e)
> [0x559fe45fa5ae]
> Jun 18 13:52:23 island sh[7068]:  8: (OSD::advance_pg(unsigned int, PG*,
> ThreadPool::TPHandle&, PG::RecoveryCtx*, std::set,
> std::less >,
> std::allocator > >*)+0x2f2) [0x559fe452c042]
> Jun 18 13:52:23 island sh[7068]:  9:
> (OSD::process_peering_events(std::__cxx11::list >
> const&, ThreadPool::TPHandle&)+0x214) [0x559fe4546d34]
> Jun 18 13:52:23 island sh[7068]:  10:
> (ThreadPool::BatchWorkQueue::_void_process(void*,
> ThreadPool::TPHandle&)+0x25) [0x559fe458f8e5]
> Jun 18 13:52:23 island sh[7068]:  11:
> (ThreadPool::worker(ThreadPool::WorkThread*)+0xdb1) [0x559fe4c06531]
> Jun 18 13:52:23 island sh[7068]:  12:
> (ThreadPool::WorkThread::entry()+0x10) [0x559fe4c07630]
> Jun 18 13:52:23 island sh[7068]:  13: (()+0x76fa) [0x7f4fe256b6fa]
> Jun 18 13:52:23 island sh[7068]:  14: (clone()+0x6d) [0x7f4fe05e3b5d]
> Jun 18 13:52:23 island sh[7068]:  NOTE: a copy of the executable, or
>

Re: [ceph-users] Kernel RBD client talking to multiple storage clusters

2017-06-19 Thread Wido den Hollander

> Op 19 juni 2017 om 5:15 schreef Alex Gorbachev :
> 
> 
> Has anyone run into such a config where a single client consumes storage from
> several ceph clusters, unrelated to each other (different MONs, OSDs,
> and keys)?
> 

Should be possible. You can simply supply a different ceph.conf using the "-c" 
flag of the 'rbd' command and thus point it at a different cluster.
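For example (the config and keyring paths here are just illustrative):

$ rbd -c /etc/ceph/clusterA.conf -k /etc/ceph/clusterA.client.admin.keyring ls
$ rbd -c /etc/ceph/clusterB.conf -k /etc/ceph/clusterB.client.admin.keyring map myimage

The --cluster option works too; by convention it looks for /etc/ceph/<cluster>.conf
and the matching keyring.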

Wido

> We have a Hammer and a Jewel cluster now, and this may be a way to have
> very clean migrations.
> 
> Best regards,
> Alex
> Storcium
> -- 
> --
> Alex Gorbachev
> Storcium
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] FAILED assert(i.first <= i.last)

2017-06-19 Thread Peter Rosell
I have my servers on a UPS and shut them down manually the way I usually turn
them off. There was enough power left in the UPS after the servers were
shut down, because it continued to beep. Anyway, I will wipe it and re-add
it. Thanks for your reply.

/Peter

mån 19 juni 2017 kl 09:11 skrev Wido den Hollander :

>
> > Op 18 juni 2017 om 16:21 schreef Peter Rosell :
> >
> >
> > Hi,
> > I have a small cluster with only three nodes, 4 OSDs + 3 OSDs. I have
> been
> > running version 0.87.2 (Giant) for over 2.5 year, but a couple of day
> ago I
> > upgraded to 0.94.10 (Hammer) and then up to 10.2.7 (Jewel). Both the
> > upgrades went great. Started with monitors, osd and finally mds. The log
> > shows all 448 pgs active+clean. I'm running all daemons inside docker and
> > ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185)
> >
> > Today I had a power outage and I had to take down the servers. When I now
> > start the servers again one OSD daemon doesn't start properly. It keeps
> > crashing.
> >
> > I noticed that the two first restarts of the osd daemon crashed with this
> > error:
> > FAILED assert(rollback_info_trimmed_to_riter == log.rbegin())
> >
> > After that it always fails with "FAILED assert(i.first <= i.last)"
> >
> > I have 15 logs like this one:
> > Jun 18 08:56:18 island sh[27991]: 2017-06-18 08:56:18.300641 7f5c5e0ff8c0
> > -1 log_channel(cluster) log [ERR] : 2.38 log bound mismatch, info
> > (19544'666742,19691'671046] actual [19499'665843,19691'671046]
> >
> > I removed the directories _head, but that just removed these error
> > logs. It crashes anyway.
> >
> > Anyone has any suggestions what to do to make it start up correct. Of
> > course I can remove the OSD from the cluster and re-add it, but it feels
> > like a bug.
>
> Are you sure? Since you had a power failure it could be that certain parts
> weren't committed to disk/FS properly when the power failed. That really
> depends on the hardware and configuration.
>
> Please, do not try to repair this OSD. Wipe it and re-add it to the
> cluster.
>
> Wido
>
> > A small snippet from the logs is added below. I didn't include the event
> > list. If it will help I can send it too.
> >
> > Jun 18 13:52:23 island sh[7068]: osd/osd_types.cc: In function 'static
> bool
> > pg_interval_t::check_new_interval(int, int, const std::vector&,
> const
> > std::vector&, int, int, const std::vector&, const
> > std::vector&, epoch_t, epoch_t, OSDMapRef, OSDMapRef, pg_t,
> > IsPGRecoverablePredicate*, std::map*,
> > std::ostream*)' thread 7f4fc2500700 time 2017-06-18 13:52:23.593991
> > Jun 18 13:52:23 island sh[7068]: osd/osd_types.cc: 3132: FAILED
> > assert(i.first <= i.last)
> > Jun 18 13:52:23 island sh[7068]:  ceph version 10.2.7
> > (50e863e0f4bc8f4b9e31156de690d765af245185)
> > Jun 18 13:52:23 island sh[7068]:  1: (ceph::__ceph_assert_fail(char
> const*,
> > char const*, int, char const*)+0x80) [0x559fe4c14360]
> > Jun 18 13:52:23 island sh[7068]:  2:
> > (pg_interval_t::check_new_interval(int, int, std::vector > std::allocator > const&, std::vector >
> > const&, int, int, std::vector > const&,
> > std::vector > const&, unsigned int, unsigned
> int,
> > std::shared_ptr, std::shared_ptr, pg_t,
> > IsPGRecoverablePredicate*, std::map > std::less, std::allocator > pg_interval_t> > >*, std::ostream*)+0x72c) [0x559fe47f723c]
> > Jun 18 13:52:23 island sh[7068]:  3:
> > (PG::start_peering_interval(std::shared_ptr,
> std::vector > std::allocator > const&, int, std::vector >
> > const&, int, ObjectStore::Transaction*)+0x3ff) [0x559fe461439f]
> > Jun 18 13:52:23 island sh[7068]:  4:
> > (PG::RecoveryState::Reset::react(PG::AdvMap const&)+0x478)
> [0x559fe4615828]
> > Jun 18 13:52:23 island sh[7068]:  5:
> > (boost::statechart::simple_state > PG::RecoveryState::RecoveryMachine, boost::mpl::list > mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
> > mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
> > mpl_::na, mpl_::na, mpl_::na, mpl_::na>,
> >
> (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base
> > const&, void const*)+0x176) [0x559fe4645b86]
> > Jun 18 13:52:23 island sh[7068]:  6:
> > (boost::statechart::state_machine > PG::RecoveryState::Initial, std::allocator,
> >
> boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base
> > const&)+0x69) [0x559fe4626d49]
> > Jun 18 13:52:23 island sh[7068]:  7:
> > (PG::handle_advance_map(std::shared_ptr,
> > std::shared_ptr, std::vector >&,
> > int, std::vector >&, int,
> PG::RecoveryCtx*)+0x49e)
> > [0x559fe45fa5ae]
> > Jun 18 13:52:23 island sh[7068]:  8: (OSD::advance_pg(unsigned int, PG*,
> > ThreadPool::TPHandle&, PG::RecoveryCtx*,
> std::set,
> > std::less >,
> > std::allocator > >*)+0x2f2) [0x559fe452c042]
> > Jun 18 13:52:23 island sh[7068]:  9:
> > (OSD::process_peering_events(std::__cxx11::list
> >
> > const&, ThreadPool::TPHandle&)+0x214) [0x559fe4546d34]
> > Jun 18 13:52:23 island sh[7068]:  10:
> > (Thr

Re: [ceph-users] FAILED assert(i.first <= i.last)

2017-06-19 Thread Wido den Hollander

> Op 19 juni 2017 om 9:55 schreef Peter Rosell :
> 
> 
> I have my servers on UPS and shutdown them manually the way I use to turn
> them off. There where enough power in the UPS after the servers were
> shutdown because is continued to beep. Anyway, I will wipe it and re-add
> it. Thanks for your reply.
> 

Ok, you didn't mention that in the first post. I assumed a sudden power failure.

My general recommendation is to wipe a single OSD if it has issues. The reason 
is that I've seen many cases where people ran XFS repair, played with the files 
on the disk and then had data corruption.

That's why I'd say that you should try to avoid fixing single OSDs when you 
don't need to.

Wido

> /Peter
> 
> mån 19 juni 2017 kl 09:11 skrev Wido den Hollander :
> 
> >
> > > Op 18 juni 2017 om 16:21 schreef Peter Rosell :
> > >
> > >
> > > Hi,
> > > I have a small cluster with only three nodes, 4 OSDs + 3 OSDs. I have
> > been
> > > running version 0.87.2 (Giant) for over 2.5 year, but a couple of day
> > ago I
> > > upgraded to 0.94.10 (Hammer) and then up to 10.2.7 (Jewel). Both the
> > > upgrades went great. Started with monitors, osd and finally mds. The log
> > > shows all 448 pgs active+clean. I'm running all daemons inside docker and
> > > ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185)
> > >
> > > Today I had a power outage and I had to take down the servers. When I now
> > > start the servers again one OSD daemon doesn't start properly. It keeps
> > > crashing.
> > >
> > > I noticed that the two first restarts of the osd daemon crashed with this
> > > error:
> > > FAILED assert(rollback_info_trimmed_to_riter == log.rbegin())
> > >
> > > After that it always fails with "FAILED assert(i.first <= i.last)"
> > >
> > > I have 15 logs like this one:
> > > Jun 18 08:56:18 island sh[27991]: 2017-06-18 08:56:18.300641 7f5c5e0ff8c0
> > > -1 log_channel(cluster) log [ERR] : 2.38 log bound mismatch, info
> > > (19544'666742,19691'671046] actual [19499'665843,19691'671046]
> > >
> > > I removed the directories _head, but that just removed these error
> > > logs. It crashes anyway.
> > >
> > > Anyone has any suggestions what to do to make it start up correct. Of
> > > course I can remove the OSD from the cluster and re-add it, but it feels
> > > like a bug.
> >
> > Are you sure? Since you had a power failure it could be that certain parts
> > weren't committed to disk/FS properly when the power failed. That really
> > depends on the hardware and configuration.
> >
> > Please, do not try to repair this OSD. Wipe it and re-add it to the
> > cluster.
> >
> > Wido
> >
> > > A small snippet from the logs is added below. I didn't include the event
> > > list. If it will help I can send it too.
> > >
> > > Jun 18 13:52:23 island sh[7068]: osd/osd_types.cc: In function 'static
> > bool
> > > pg_interval_t::check_new_interval(int, int, const std::vector&,
> > const
> > > std::vector&, int, int, const std::vector&, const
> > > std::vector&, epoch_t, epoch_t, OSDMapRef, OSDMapRef, pg_t,
> > > IsPGRecoverablePredicate*, std::map*,
> > > std::ostream*)' thread 7f4fc2500700 time 2017-06-18 13:52:23.593991
> > > Jun 18 13:52:23 island sh[7068]: osd/osd_types.cc: 3132: FAILED
> > > assert(i.first <= i.last)
> > > Jun 18 13:52:23 island sh[7068]:  ceph version 10.2.7
> > > (50e863e0f4bc8f4b9e31156de690d765af245185)
> > > Jun 18 13:52:23 island sh[7068]:  1: (ceph::__ceph_assert_fail(char
> > const*,
> > > char const*, int, char const*)+0x80) [0x559fe4c14360]
> > > Jun 18 13:52:23 island sh[7068]:  2:
> > > (pg_interval_t::check_new_interval(int, int, std::vector > > std::allocator > const&, std::vector >
> > > const&, int, int, std::vector > const&,
> > > std::vector > const&, unsigned int, unsigned
> > int,
> > > std::shared_ptr, std::shared_ptr, pg_t,
> > > IsPGRecoverablePredicate*, std::map > > std::less, std::allocator > > pg_interval_t> > >*, std::ostream*)+0x72c) [0x559fe47f723c]
> > > Jun 18 13:52:23 island sh[7068]:  3:
> > > (PG::start_peering_interval(std::shared_ptr,
> > std::vector > > std::allocator > const&, int, std::vector >
> > > const&, int, ObjectStore::Transaction*)+0x3ff) [0x559fe461439f]
> > > Jun 18 13:52:23 island sh[7068]:  4:
> > > (PG::RecoveryState::Reset::react(PG::AdvMap const&)+0x478)
> > [0x559fe4615828]
> > > Jun 18 13:52:23 island sh[7068]:  5:
> > > (boost::statechart::simple_state > > PG::RecoveryState::RecoveryMachine, boost::mpl::list > > mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
> > > mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
> > > mpl_::na, mpl_::na, mpl_::na, mpl_::na>,
> > >
> > (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base
> > > const&, void const*)+0x176) [0x559fe4645b86]
> > > Jun 18 13:52:23 island sh[7068]:  6:
> > > (boost::statechart::state_machine > > PG::RecoveryState::Initial, std::allocator,
> > >
> > boost::statechart::null_exception_translator>::process_event(boost::statechart::eve

Re: [ceph-users] RadosGW not working after upgrade to Hammer

2017-06-19 Thread Gerson Jamal
Hi,

Has anyone found a solution for this issue?
I upgraded from Firefly to Hammer, and I'm facing this problem.

Thanks in advance

On Mon, Jun 19, 2017 at 10:32 AM, Gerson Jamal 
wrote:

> Hi,
>
> Anyone can found the solution for this issue.
> I upgrade from Firefly do Hammer, and i'm facing with this problem.
>
> Thanks in advance
>
> --
> Regards
>
> Gerson Razaque Jamal
>



-- 
Cumprimentos

Gerson Razaque Jamal
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] What is "up:standby"? in ceph mds stat => e5: 1/1/1 up {0=ceph-test-3=up:active}, 2 up:standby

2017-06-19 Thread John Spray
On Fri, Jun 16, 2017 at 12:14 PM, Stéphane Klein
 wrote:
> 2017-06-16 13:07 GMT+02:00 Daniel Carrasco :
>>
>> On MDS nodes, by default only the first one you add is active: the others
>> join the cluster as standby MDS daemons. When the active one fails, a
>> standby MDS becomes active and continues with the work.
>>
>
> Thanks. Is it possible to add this information here
> http://docs.ceph.com/docs/master/cephfs/createfs/ to improve the
> documentation?

Yes, you can do that by cloning the ceph source code, making the edit
in doc/, and then submitting a pull request.  For more details see:
http://docs.ceph.com/docs/hammer/start/documenting-ceph/
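A rough sketch of that workflow (the branch name and the exact file under doc/
are assumptions based on the URL above):

$ git clone https://github.com/ceph/ceph.git
$ cd ceph
$ git checkout -b doc-mds-standby
# edit doc/cephfs/createfs.rst, then:
$ git commit -a -m "doc: explain standby MDS daemons"
# push to your fork and open a pull request against ceph/ceph on GitHub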

John

>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] FAILED assert(i.first <= i.last)

2017-06-19 Thread Peter Rosell
That sounds like an easy rule to follow. Thanks again for your reply.

/Peter

mån 19 juni 2017 kl 10:19 skrev Wido den Hollander :

>
> > Op 19 juni 2017 om 9:55 schreef Peter Rosell :
> >
> >
> > I have my servers on UPS and shutdown them manually the way I use to turn
> > them off. There where enough power in the UPS after the servers were
> > shutdown because is continued to beep. Anyway, I will wipe it and re-add
> > it. Thanks for your reply.
> >
>
> Ok, you didn't mention that in the first post. I assumed a sudden power
> failure.
>
> My general recommendation is to wipe a single OSD if it has issues. The
> reason is that I've seen many cases where people ran XFS repair, played
> with the files on the disk and then had data corruption.
>
> That's why I'd say that you should try to avoid fixing single OSDs when
> you don't need to.
>
> Wido
>
> > /Peter
> >
> > mån 19 juni 2017 kl 09:11 skrev Wido den Hollander :
> >
> > >
> > > > Op 18 juni 2017 om 16:21 schreef Peter Rosell <
> peter.ros...@gmail.com>:
> > > >
> > > >
> > > > Hi,
> > > > I have a small cluster with only three nodes, 4 OSDs + 3 OSDs. I have
> > > been
> > > > running version 0.87.2 (Giant) for over 2.5 year, but a couple of day
> > > ago I
> > > > upgraded to 0.94.10 (Hammer) and then up to 10.2.7 (Jewel). Both the
> > > > upgrades went great. Started with monitors, osd and finally mds. The
> log
> > > > shows all 448 pgs active+clean. I'm running all daemons inside
> docker and
> > > > ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185)
> > > >
> > > > Today I had a power outage and I had to take down the servers. When
> I now
> > > > start the servers again one OSD daemon doesn't start properly. It
> keeps
> > > > crashing.
> > > >
> > > > I noticed that the two first restarts of the osd daemon crashed with
> this
> > > > error:
> > > > FAILED assert(rollback_info_trimmed_to_riter == log.rbegin())
> > > >
> > > > After that it always fails with "FAILED assert(i.first <= i.last)"
> > > >
> > > > I have 15 logs like this one:
> > > > Jun 18 08:56:18 island sh[27991]: 2017-06-18 08:56:18.300641
> 7f5c5e0ff8c0
> > > > -1 log_channel(cluster) log [ERR] : 2.38 log bound mismatch, info
> > > > (19544'666742,19691'671046] actual [19499'665843,19691'671046]
> > > >
> > > > I removed the directories _head, but that just removed these
> error
> > > > logs. It crashes anyway.
> > > >
> > > > Anyone has any suggestions what to do to make it start up correct. Of
> > > > course I can remove the OSD from the cluster and re-add it, but it
> feels
> > > > like a bug.
> > >
> > > Are you sure? Since you had a power failure it could be that certain
> parts
> > > weren't committed to disk/FS properly when the power failed. That
> really
> > > depends on the hardware and configuration.
> > >
> > > Please, do not try to repair this OSD. Wipe it and re-add it to the
> > > cluster.
> > >
> > > Wido
> > >
> > > > A small snippet from the logs is added below. I didn't include the
> event
> > > > list. If it will help I can send it too.
> > > >
> > > > Jun 18 13:52:23 island sh[7068]: osd/osd_types.cc: In function
> 'static
> > > bool
> > > > pg_interval_t::check_new_interval(int, int, const std::vector&,
> > > const
> > > > std::vector&, int, int, const std::vector&, const
> > > > std::vector&, epoch_t, epoch_t, OSDMapRef, OSDMapRef, pg_t,
> > > > IsPGRecoverablePredicate*, std::map*,
> > > > std::ostream*)' thread 7f4fc2500700 time 2017-06-18 13:52:23.593991
> > > > Jun 18 13:52:23 island sh[7068]: osd/osd_types.cc: 3132: FAILED
> > > > assert(i.first <= i.last)
> > > > Jun 18 13:52:23 island sh[7068]:  ceph version 10.2.7
> > > > (50e863e0f4bc8f4b9e31156de690d765af245185)
> > > > Jun 18 13:52:23 island sh[7068]:  1: (ceph::__ceph_assert_fail(char
> > > const*,
> > > > char const*, int, char const*)+0x80) [0x559fe4c14360]
> > > > Jun 18 13:52:23 island sh[7068]:  2:
> > > > (pg_interval_t::check_new_interval(int, int, std::vector > > > std::allocator > const&, std::vector >
> > > > const&, int, int, std::vector > const&,
> > > > std::vector > const&, unsigned int, unsigned
> > > int,
> > > > std::shared_ptr, std::shared_ptr, pg_t,
> > > > IsPGRecoverablePredicate*, std::map > > > std::less, std::allocator > > > pg_interval_t> > >*, std::ostream*)+0x72c) [0x559fe47f723c]
> > > > Jun 18 13:52:23 island sh[7068]:  3:
> > > > (PG::start_peering_interval(std::shared_ptr,
> > > std::vector > > > std::allocator > const&, int, std::vector std::allocator >
> > > > const&, int, ObjectStore::Transaction*)+0x3ff) [0x559fe461439f]
> > > > Jun 18 13:52:23 island sh[7068]:  4:
> > > > (PG::RecoveryState::Reset::react(PG::AdvMap const&)+0x478)
> > > [0x559fe4615828]
> > > > Jun 18 13:52:23 island sh[7068]:  5:
> > > > (boost::statechart::simple_state > > > PG::RecoveryState::RecoveryMachine, boost::mpl::list mpl_::na,
> > > > mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
> > > > mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, m

Re: [ceph-users] ceph packages on stretch from eu.ceph.com

2017-06-19 Thread Ronny Aasen
Thanks for the suggestions. I did do a trial with the Proxmox ones, on a 
single-node machine though.
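For apt, that repository corresponds to something like the following sources.list
entry (the component name is an assumption based on the usual repo layout):

deb http://download.proxmox.com/debian/ceph-luminous stretch main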


But now that Debian 9 is released and stable, I hope that the ceph repos 
will include stretch soon...  Hint hint :)


I am itching to try to upgrade my testing cluster. :)

kind regards
Ronny Aasen



On 26. april 2017 19:46, Alexandre DERUMIER wrote:

you can try the proxmox stretch repository if you want

http://download.proxmox.com/debian/ceph-luminous/dists/stretch/



- Mail original -
De: "Wido den Hollander" 
À: "ceph-users" , "Ronny Aasen" 

Envoyé: Mercredi 26 Avril 2017 16:58:04
Objet: Re: [ceph-users] ceph packages on stretch from eu.ceph.com


Op 25 april 2017 om 20:07 schreef Ronny Aasen :


Hello

I am trying to install ceph on debian stretch from

http://eu.ceph.com/debian-jewel/dists/

but there is no stretch repo there.

Now with stretch being frozen, it is a good time to be testing ceph on
stretch. Is it possible to get packages for stretch on jewel, kraken,
and luminous?


Afaik packages are only built for stable releases. As Stretch isn't out yet, there 
are no packages.

You can try if the Ubuntu 16.04 (Xenial) packages work.

Wido





kind regards

Ronny Aasen




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] EXT: ceph-lvm - a tool to deploy OSDs from LVM volumes

2017-06-19 Thread John Spray
On Fri, Jun 16, 2017 at 7:23 PM, Alfredo Deza  wrote:
> On Fri, Jun 16, 2017 at 2:11 PM, Warren Wang - ISD
>  wrote:
>> I would prefer that this is something more generic, to possibly support 
>> other backends one day, like ceph-volume. Creating one tool per backend 
>> seems silly.
>>
>> Also, ceph-lvm seems to imply that ceph itself has something to do with lvm, 
>> which it really doesn’t. This is simply to deal with the underlying disk. If 
>> there’s resistance to something more generic like ceph-volume, then it 
>> should at least be called something like ceph-disk-lvm.
>
> Sage, you had mentioned the need for "composable" tools for this, and
> I think that if we go with `ceph-volume` we could allow plugins for
> each strategy. We are starting with `lvm` support so that would look
> like: `ceph-volume lvm`
>
> The `lvm` functionality could be implemented as a plugin itself, and
> when we start working with supporting regular disks, then `ceph-volume
> disk` can come along, etc...
>
> It would also open the door for anyone to be able to write a plugin to
> `ceph-volume` to implement their own logic, while at the same time
> re-using most of what we are implementing today: logging, reporting,
> systemd support, OSD metadata, etc...
>
> If we were to separate these into single-purpose tools, all those
> would need to be re-done.

Couple of thoughts:
 - let's keep this in the Ceph repository unless there's a strong
reason not to -- it'll enable the tool's branching to automatically
happen in line with Ceph's.
 - I agree with others that a single entrypoint (i.e. executable) will
be more manageable than having conspicuously separate tools, but we
shouldn't worry too much about making things "plugins" as such -- they
can just be distinct code inside one tool, sharing as much or as
little as they need.

What if we delivered this set of LVM functionality as "ceph-disk lvm
..." commands to minimise the impression that the tooling is changing,
even if internally it's all new/distinct code?

At the risk of being a bit picky about language, I don't like calling
this anything with "volume" in the name, because afaik we've never
ever called OSDs or the drives they occupy "volumes", so we're
introducing a whole new noun, and a widely used (to mean different
things) one at that.

John

>
>
>>
>> 2 cents from one of the LVM for Ceph users,
>> Warren Wang
>> Walmart ✻
>>
>> On 6/16/17, 10:25 AM, "ceph-users on behalf of Alfredo Deza" 
>>  wrote:
>>
>> Hello,
>>
>> At the last CDM [0] we talked about `ceph-lvm` and the ability to
>> deploy OSDs from logical volumes. We have now an initial draft for the
>> documentation [1] and would like some feedback.
>>
>> The important features for this new tool are:
>>
>> * parting ways with udev (new approach will rely on LVM functionality
>> for discovery)
>> * compatibility/migration for existing LVM volumes deployed as 
>> directories
>> * dmcache support
>>
>> By documenting the API and workflows first we are making sure that
>> those look fine before starting on actual development.
>>
>> It would be great to get some feedback, specially if you are currently
>> using LVM with ceph (or planning to!).
>>
>> Please note that the documentation is not complete and is missing
>> content on some parts.
>>
>> [0] http://tracker.ceph.com/projects/ceph/wiki/CDM_06-JUN-2017
>> [1] http://docs.ceph.com/ceph-lvm/
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] EXT: ceph-lvm - a tool to deploy OSDs from LVM volumes

2017-06-19 Thread Alfredo Deza
On Mon, Jun 19, 2017 at 9:27 AM, John Spray  wrote:
> On Fri, Jun 16, 2017 at 7:23 PM, Alfredo Deza  wrote:
>> On Fri, Jun 16, 2017 at 2:11 PM, Warren Wang - ISD
>>  wrote:
>>> I would prefer that this is something more generic, to possibly support 
>>> other backends one day, like ceph-volume. Creating one tool per backend 
>>> seems silly.
>>>
>>> Also, ceph-lvm seems to imply that ceph itself has something to do with 
>>> lvm, which it really doesn’t. This is simply to deal with the underlying 
>>> disk. If there’s resistance to something more generic like ceph-volume, 
>>> then it should at least be called something like ceph-disk-lvm.
>>
>> Sage, you had mentioned the need for "composable" tools for this, and
>> I think that if we go with `ceph-volume` we could allow plugins for
>> each strategy. We are starting with `lvm` support so that would look
>> like: `ceph-volume lvm`
>>
>> The `lvm` functionality could be implemented as a plugin itself, and
>> when we start working with supporting regular disks, then `ceph-volume
>> disk` can come along, etc...
>>
>> It would also open the door for anyone to be able to write a plugin to
>> `ceph-volume` to implement their own logic, while at the same time
>> re-using most of what we are implementing today: logging, reporting,
>> systemd support, OSD metadata, etc...
>>
>> If we were to separate these into single-purpose tools, all those
>> would need to be re-done.
>
> Couple of thoughts:
>  - let's keep this in the Ceph repository unless there's a strong
> reason not to -- it'll enable the tool's branching to automatically
> happen in line with Ceph's.

For initial development this is easier to have as a separate tool from
the Ceph source tree. There are some niceties about being in-source,
like
not being required to deal with what features we are supporting on what version.

Although there is no code yet, I consider the project in an "unstable"
state, it will move incredibly fast (it has to!) and that puts it at
odds with the cadence
of Ceph. Specifically, these two things are very important right now:

* faster release cycles
* easier and faster to test

I am not ruling out going into Ceph at some point though, ideally when
things slow down and become stable.

Is your argument only to have parity in Ceph's branching? That was
never a problem with out-of-tree tools like ceph-deploy for example.

>  - I agree with others that a single entrypoint (i.e. executable) will
> be more manageable than having conspicuously separate tools, but we
> shouldn't worry too much about making things "plugins" as such -- they
> can just be distinct code inside one tool, sharing as much or as
> little as they need.
>
> What if we delivered this set of LVM functionality as "ceph-disk lvm
> ..." commands to minimise the impression that the tooling is changing,
> even if internally it's all new/distinct code?

That sounded appealing initially, but because we are introducing a
very different API, it would look odd to interact
with other subcommands without a normalized interaction. For example,
for 'prepare' this would be:

ceph-disk prepare [...]

And for LVM it would possibly be

ceph-disk lvm prepare [...]

The level at which these similar actions are presented implies that one
may be the preferred (or even default) one, while the other one
isn't.
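Purely to illustrate the proposed layout (nothing here exists yet and the flags
are made up for the example), the ceph-volume form would read more like:

$ ceph-volume lvm prepare --data <vg/lv>
$ ceph-volume lvm activate <osd-id> <osd-uuid>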

At one point we are going to add regular disk workflows (replacing
ceph-disk functionality) and then it would become even more
confusing to keep it there (or do you think at that point we could split?)

>
> At the risk of being a bit picky about language, I don't like calling
> this anything with "volume" in the name, because afaik we've never
> ever called OSDs or the drives they occupy "volumes", so we're
> introducing a whole new noun, and a widely used (to mean different
> things) one at that.
>

We have never called them 'volumes' because there was never anything
to support something other than regular disks; the approach
has always been disks and partitions.

A "volume" can be a physical volume (e.g. a disk) or a logical one
(lvm, dmcache). It is an all-encompassing name that allows working with
different device-like things.


> John
>
>>
>>
>>>
>>> 2 cents from one of the LVM for Ceph users,
>>> Warren Wang
>>> Walmart ✻
>>>
>>> On 6/16/17, 10:25 AM, "ceph-users on behalf of Alfredo Deza" 
>>>  wrote:
>>>
>>> Hello,
>>>
>>> At the last CDM [0] we talked about `ceph-lvm` and the ability to
>>> deploy OSDs from logical volumes. We have now an initial draft for the
>>> documentation [1] and would like some feedback.
>>>
>>> The important features for this new tool are:
>>>
>>> * parting ways with udev (new approach will rely on LVM functionality
>>> for discovery)
>>> * compatibility/migration for existing LVM volumes deployed as 
>>> directories
>>> * dmcache support
>>>
>>> By documenting the API and workflows first we are making sure that
>>> those look fine before starting on actual development.

Re: [ceph-users] Luminous: ETA on LTS production release?

2017-06-19 Thread Lars Marowsky-Bree
On 2017-06-16T20:09:04, Gregory Farnum  wrote:

> There's a lot going into this release, and things keep popping up. I
> suspect it'll be another month or two, but I doubt anybody is capable of
> giving a more precise date. :/ The downside of giving up on train
> releases...

What's still outstanding though? When will be the time at which features
will be frozen, so that stabilization can happen?

(I wish we hadn't moved away from schedule-driven releases and stuck to
what the kernel does ;-)



Regards,
Lars

-- 
Architect SDS, Distinguished Engineer
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] CephFS | flapping OSD locked up NFS

2017-06-19 Thread David
Hi All

We had a faulty OSD that was going up and down for a few hours until Ceph
marked it out. During this time Cephfs was accessible, however, for about
10 mins all NFS processes (kernel NFSv3) on a server exporting Cephfs were
hung, locking up all the NFS clients. The cluster was healthy before the
faulty OSD. I'm trying to understand if this is expected behaviour, a bug
or something else. Any insights would be appreciated.

MDS active/passive
Jewel 10.2.2
Ceph client 3.10.0-514.6.1.el7.x86_64
Cephfs mount: (rw,relatime,name=admin,secret=,acl)
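(For context, a kernel CephFS mount with those options is typically created with
something like the following; the monitor address and secret file path are
placeholders:)

$ mount -t ceph <mon-host>:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret,acl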

I can see some slow requests in the MDS log during the time the NFS
processes were hung, some for setattr calls:

2017-06-15 04:29:37.081175 7f889401f700  0 log_channel(cluster) log [WRN] :
slow request 60.974528 seconds old, received at 2017-06-15 04:
28:36.106598: client_request(client.2622511:116375892 setattr size=0
#100025b3554 2017-06-15 04:28:36.104928) currently acquired locks

and some for getattr:

2017-06-15 04:29:42.081224 7f889401f700  0 log_channel(cluster) log [WRN] :
slow request 32.225883 seconds old, received at 2017-06-15 04:
29:09.855302: client_request(client.2622511:116380541 getattr pAsLsXsFs
#100025b4d37 2017-06-15 04:29:09.853772) currently failed to rdloc
k, waiting

And a "client not responding to mclientcaps revoke" warning:

2017-06-15 04:31:12.084561 7f889401f700  0 log_channel(cluster) log [WRN] :
client.2344872 isn't responding to mclientcaps(revoke), ino 100025b4d37
pending pAsxLsXsxFcb issued pAsxLsXsxFsxcrwb, sent 122.229172 seconds ag

These issues seemed to have cleared once the faulty OSD was marked out.

In general I have noticed the NFS processes exporting Cephfs do seem to
spend a lot of time in 'D' state, with WCHAN as 'lock_page', compared with
a NFS server exporting a local file system. Also, NFS performance hasn't
been great with small reads/writes, particularly writes with the default
sync export option, I've had to export with async for the time-being. I
haven't had a chance to troubleshoot this in any depth yet, just mentioning
in case it's relevant.

Thanks,
David
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS | flapping OSD locked up NFS

2017-06-19 Thread John Petrini
Hi David,

While I have no personal experience with this, from what I've been told, if
you're going to export CephFS over NFS it's recommended that you use a
userspace implementation of NFS (like nfs-ganesha) rather than
nfs-kernel-server. This may be the source of your issues and might be worth
testing. I'd be interested to hear the results if you do.
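For reference, a minimal nfs-ganesha export of CephFS through the Ceph FSAL looks
roughly like this (export id, paths and squash setting are assumptions; auth and
secret settings for your cluster would still need to be added):

EXPORT {
    Export_ID = 1;            # any unique id
    Path = "/";               # path inside CephFS to export
    Pseudo = "/cephfs";       # NFSv4 pseudo path that clients mount
    Access_Type = RW;
    Squash = No_Root_Squash;
    FSAL {
        Name = CEPH;          # userspace Ceph client instead of the kernel mount
    }
}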

___

John Petrini
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw: scrub causing slow requests in the md log

2017-06-19 Thread Dan van der Ster
On Thu, Jun 15, 2017 at 7:56 PM, Casey Bodley  wrote:
>
> On 06/14/2017 05:59 AM, Dan van der Ster wrote:
>>
>> Dear ceph users,
>>
>> Today we had O(100) slow requests which were caused by deep-scrubbing
>> of the metadata log:
>>
>> 2017-06-14 11:07:55.373184 osd.155
>> [2001:1458:301:24::100:d]:6837/3817268 7387 : cluster [INF] 24.1d
>> deep-scrub starts
>> ...
>> 2017-06-14 11:22:04.143903 osd.155
>> [2001:1458:301:24::100:d]:6837/3817268 8276 : cluster [WRN] slow
>> request 480.140904 seconds old, received at 2017-06-14
>> 11:14:04.002913: osd_op(client.3192010.0:11872455 24.be8b305d
>> meta.log.8d4fcb63-c314-4f9a-b3b3-0e61719ec258.54 [call log.add] snapc
>> 0=[] ondisk+write+known_if_redirected e7752) currently waiting for
>> scrub
>> ...
>> 2017-06-14 11:22:06.729306 osd.155
>> [2001:1458:301:24::100:d]:6837/3817268 8277 : cluster [INF] 24.1d
>> deep-scrub ok
>>
>> We have log_meta: true, log_data: false on this (our only) region [1],
>> which IIRC we setup to enable indexless buckets.
>>
>> I'm obviously unfamiliar with rgw meta and data logging, and have a
>> few questions:
>>
>>   1. AFAIU, it is used by the rgw multisite feature. Is it safe to turn
>> it off when not using multisite?
>
>
> It's a good idea to turn that off, yes.
>
> First, make sure that you have configured a default realm/zonegroup/zone:
>
> $ radosgw-admin realm default --rgw-realm   (you can determine
> realm name from 'radosgw-admin realm list')
> $ radosgw-admin zonegroup default --rgw-zonegroup default
> $ radosgw-admin zone default --rgw-zone default
>

Thanks. This had already been done, as confirmed with radosgw-admin
realm get-default.

> Then you can modify the zonegroup (aka region):
>
> $ radosgw-admin zonegroup get > zonegroup.json
> $ sed -i 's/log_meta": "true/log_meta":"false/' zonegroup.json
> $ radosgw-admin zonegroup set < zonegroup.json
>
> Then commit the updated period configuration:
>
> $ radosgw-admin period update --commit
>
> Verify that the resulting period contains "log_meta": "false". Take care
> with future radosgw-admin commands on the zone/zonegroup, as they may revert
> log_meta back to true [1].
>

Great, this worked. FYI (and for others trying this in future), the
period update --commit blocks all rgws for ~30s while they reload the
realm.
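A quick way to double-check the result, as Casey suggested, is something like:

$ radosgw-admin period get | grep log_meta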

>>
>>   2. I started dumping the output of radosgw-admin mdlog list, and
>> cancelled it after a few minutes. It had already dumped 3GB of json
>> and I don't know how much more it would have written. Is something
>> supposed to be trimming the mdlog automatically?
>
>
> There is automated mdlog trimming logic in master, but not jewel/kraken. And
> this logic won't be triggered if there is only one zone [2].
>
>>
>>   3. ceph df doesn't show the space occupied by omap objects -- is
>> there an indirect way to see how much space these are using?
>
>
> You can inspect the osd's omap directory: du -sh
> /var/lib/ceph/osd/osd0/current/omap
>

Cool. osd.155 (holding shard 54) has 3.3GB of omap, compared with
~100-300MB on other OSDs.

>>   4. mdlog status has markers going back to 2016-10, see [2]. I suppose
>> we're not using this feature correctly? :-/
>>
>>   5. Suppose I were to set log_meta: false -- how would I delete these
>> log entries now that they are not needed?
>
>
> There is a 'radosgw-admin mdlog trim' command that can be used to trim them
> one --shard-id (from 0 to 63) at a time. An entire log shard can be trimmed
> with:
>
> $ radosgw-admin mdlog trim --shard-id 0 --period
> 8d4fcb63-c314-4f9a-b3b3-0e61719ec258 --end-time 2020-1-1
>
> *However*, there is a risk that bulk operations on large omaps will affect
> cluster health by taking down OSDs. Not only can this bulk deletion take
> long enough to trigger the osd/filestore suicide timeouts, the resulting
> leveldb compaction after deletion is likely to block other omap operations
> and hit the timeouts as well. This seems likely in your case, based on the
> fact that you're already having issues with scrub.

We did this directly on shard 54, and indeed the command is taking a
looong time (but with no slow requests or osds being marked down).
After 45 minutes, du is still 3.3GB, so I can't tell if it's
progressing. I see ~1000 _omap_rmkeys messages every ~2 seconds:

2017-06-19 16:57:34.347222 7fc602640700 15
filestore(/var/lib/ceph/osd/ceph-155) _omap_rmkeys
24.1d_head/#24:ba0cd17d:::met
a.log.8d4fcb63-c314-4f9a-b3b3-0e61719ec258.54:head#
2017-06-19 16:57:34.347319 7fc602640700 10 filestore oid:
#24:ba0cd17d:::meta.log.8d4fcb63-c314-4f9a-b3b3-0e61719ec258.54:h
ead# not skipping op, *spos 67765185.0.0
2017-06-19 16:57:34.347326 7fc602640700 10 filestore  > header.spos 0.0.0
2017-06-19 16:57:34.347351 7fc602640700 15
filestore(/var/lib/ceph/osd/ceph-155) _omap_rmkeys
24.1d_head/#24:ba0cd17d:::met
a.log.8d4fcb63-c314-4f9a-b3b3-0e61719ec258.54:head#
2017-06-19 16:57:34.347373 7fc602640700 10 filestore oid:
#24:ba0cd17d:::meta.log.8d4fcb63-c314-4f9a-b3b3-0e61719ec258.54:h
ead# not skipping op, *spo

Re: [ceph-users] radosgw: scrub causing slow requests in the md log

2017-06-19 Thread Casey Bodley

Hi Dan,

That's good news that it can remove 1000 keys at a time without hitting 
timeouts. The output of 'du' will depend on when the leveldb compaction 
runs. If you do find that compaction leads to suicide timeouts on this 
osd (you would see a lot of 'leveldb:' output in the log), consider 
running offline compaction by adding 'leveldb compact on mount = true' 
to the osd config and restarting.
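For example, something along these lines in ceph.conf (scoping it to the affected
OSD is just one option; an [osd] section would apply it to all OSDs):

[osd.155]
leveldb compact on mount = true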


Casey

On 06/19/2017 11:01 AM, Dan van der Ster wrote:

On Thu, Jun 15, 2017 at 7:56 PM, Casey Bodley  wrote:

On 06/14/2017 05:59 AM, Dan van der Ster wrote:

Dear ceph users,

Today we had O(100) slow requests which were caused by deep-scrubbing
of the metadata log:

2017-06-14 11:07:55.373184 osd.155
[2001:1458:301:24::100:d]:6837/3817268 7387 : cluster [INF] 24.1d
deep-scrub starts
...
2017-06-14 11:22:04.143903 osd.155
[2001:1458:301:24::100:d]:6837/3817268 8276 : cluster [WRN] slow
request 480.140904 seconds old, received at 2017-06-14
11:14:04.002913: osd_op(client.3192010.0:11872455 24.be8b305d
meta.log.8d4fcb63-c314-4f9a-b3b3-0e61719ec258.54 [call log.add] snapc
0=[] ondisk+write+known_if_redirected e7752) currently waiting for
scrub
...
2017-06-14 11:22:06.729306 osd.155
[2001:1458:301:24::100:d]:6837/3817268 8277 : cluster [INF] 24.1d
deep-scrub ok

We have log_meta: true, log_data: false on this (our only) region [1],
which IIRC we setup to enable indexless buckets.

I'm obviously unfamiliar with rgw meta and data logging, and have a
few questions:

   1. AFAIU, it is used by the rgw multisite feature. Is it safe to turn
it off when not using multisite?


It's a good idea to turn that off, yes.

First, make sure that you have configured a default realm/zonegroup/zone:

$ radosgw-admin realm default --rgw-realm   (you can determine
realm name from 'radosgw-admin realm list')
$ radosgw-admin zonegroup default --rgw-zonegroup default
$ radosgw-admin zone default --rgw-zone default


Thanks. This had already been done, as confirmed with radosgw-admin
realm get-default.


Then you can modify the zonegroup (aka region):

$ radosgw-admin zonegroup get > zonegroup.json
$ sed -i 's/log_meta": "true/log_meta":"false/' zonegroup.json
$ radosgw-admin zonegroup set < zonegroup.json

Then commit the updated period configuration:

$ radosgw-admin period update --commit

Verify that the resulting period contains "log_meta": "false". Take care
with future radosgw-admin commands on the zone/zonegroup, as they may revert
log_meta back to true [1].


Great, this worked. FYI (and for others trying this in future), the
period update --commit blocks all rgws for ~30s while they reload the
realm.


   2. I started dumping the output of radosgw-admin mdlog list, and
cancelled it after a few minutes. It had already dumped 3GB of json
and I don't know how much more it would have written. Is something
supposed to be trimming the mdlog automatically?


There is automated mdlog trimming logic in master, but not jewel/kraken. And
this logic won't be triggered if there is only one zone [2].


   3. ceph df doesn't show the space occupied by omap objects -- is
there an indirect way to see how much space these are using?


You can inspect the osd's omap directory: du -sh
/var/lib/ceph/osd/osd0/current/omap


Cool. osd.155 (holding shard 54) has 3.3GB of omap, compared with
~100-300MB on other OSDs.


   4. mdlog status has markers going back to 2016-10, see [2]. I suppose
we're not using this feature correctly? :-/

   5. Suppose I were to set log_meta: false -- how would I delete these
log entries now that they are not needed?


There is a 'radosgw-admin mdlog trim' command that can be used to trim them
one --shard-id (from 0 to 63) at a time. An entire log shard can be trimmed
with:

$ radosgw-admin mdlog trim --shard-id 0 --period
8d4fcb63-c314-4f9a-b3b3-0e61719ec258 --end-time 2020-1-1

*However*, there is a risk that bulk operations on large omaps will affect
cluster health by taking down OSDs. Not only can this bulk deletion take
long enough to trigger the osd/filestore suicide timeouts, the resulting
leveldb compaction after deletion is likely to block other omap operations
and hit the timeouts as well. This seems likely in your case, based on the
fact that you're already having issues with scrub.

We did this directly on shard 54, and indeed the command is taking a
looong time (but with no slow requests or osds being marked down).
After 45 minutes, du is still 3.3GB, so I can't tell if it's
progressing. I see ~1000 _omap_rmkeys messages every ~2 seconds:

2017-06-19 16:57:34.347222 7fc602640700 15
filestore(/var/lib/ceph/osd/ceph-155) _omap_rmkeys
24.1d_head/#24:ba0cd17d:::met
a.log.8d4fcb63-c314-4f9a-b3b3-0e61719ec258.54:head#
2017-06-19 16:57:34.347319 7fc602640700 10 filestore oid:
#24:ba0cd17d:::meta.log.8d4fcb63-c314-4f9a-b3b3-0e61719ec258.54:h
ead# not skipping op, *spos 67765185.0.0
2017-06-19 16:57:34.347326 7fc602640700 10 filestore  > header.spos 0.0.0
2017-06-19 16:57:34.347351 7fc602640700 15
filestore(/var/lib

Re: [ceph-users] EXT: ceph-lvm - a tool to deploy OSDs from LVM volumes

2017-06-19 Thread Willem Jan Withagen
On 19-6-2017 16:13, Alfredo Deza wrote:
> On Mon, Jun 19, 2017 at 9:27 AM, John Spray  wrote:
>> On Fri, Jun 16, 2017 at 7:23 PM, Alfredo Deza  wrote:
>>> On Fri, Jun 16, 2017 at 2:11 PM, Warren Wang - ISD
>>>  wrote:
 I would prefer that this is something more generic, to possibly support 
 other backends one day, like ceph-volume. Creating one tool per backend 
 seems silly.

 Also, ceph-lvm seems to imply that ceph itself has something to do with 
 lvm, which it really doesn’t. This is simply to deal with the underlying 
 disk. If there’s resistance to something more generic like ceph-volume, 
 then it should at least be called something like ceph-disk-lvm.
>>>
>>> Sage, you had mentioned the need for "composable" tools for this, and
>>> I think that if we go with `ceph-volume` we could allow plugins for
>>> each strategy. We are starting with `lvm` support so that would look
>>> like: `ceph-volume lvm`
>>>
>>> The `lvm` functionality could be implemented as a plugin itself, and
>>> when we start working with supporting regular disks, then `ceph-volume
>>> disk` can come along, etc...
>>>
>>> It would also open the door for anyone to be able to write a plugin to
>>> `ceph-volume` to implement their own logic, while at the same time
>>> re-using most of what we are implementing today: logging, reporting,
>>> systemd support, OSD metadata, etc...
>>>
>>> If we were to separate these into single-purpose tools, all those
>>> would need to be re-done.
>>
>> Couple of thoughts:
>>  - let's keep this in the Ceph repository unless there's a strong
>> reason not to -- it'll enable the tool's branching to automatically
>> happen in line with Ceph's.
> 
> For initial development this is easier to have as a separate tool from
> the Ceph source tree. There are some niceties about being in-source,
> like
> not being required to deal with what features we are supporting on what 
> version.

Just my observation, need not be true at all, but ...

As long as you do not have it interact with the other tools, that is
true. But as soon as you start depending on ceph-{disk-new,volume} in
other parts of the mainstream ceph code, you have created a tie-in with
the versioning and it will have to be maintained in the same way.


> Although there is no code yet, I consider the project in an "unstable"
> state, it will move incredibly fast (it has to!) and that puts it at
> odds with the cadence
> of Ceph. Specifically, these two things are very important right now:
> 
> * faster release cycles
> * easier and faster to test
> 
> I am not ruling out going into Ceph at some point though, ideally when
> things slow down and become stable.
> 
> Is your argument only to have parity in Ceph's branching? That was
> never a problem with out-of-tree tools like ceph-deploy for example.

Some of the external targets move so fast (ceph-ansible) that I have
given up on trying to see what is going on. For this tool I'd like it to
do the ZFS/FreeBSD stuff as a plugin module,
in the expectation that it will supersede the current ceph-disk;
otherwise there are two places to maintain this type of code.

>>  - I agree with others that a single entrypoint (i.e. executable) will
>> be more manageable than having conspicuously separate tools, but we
>> shouldn't worry too much about making things "plugins" as such -- they
>> can just be distinct code inside one tool, sharing as much or as
>> little as they need.
>>
>> What if we delivered this set of LVM functionality as "ceph-disk lvm
>> ..." commands to minimise the impression that the tooling is changing,
>> even if internally it's all new/distinct code?
> 
> That sounded appealing initially, but because we are introducing a
> very different API, it would look odd to interact
> with other subcommands without a normalized interaction. For example,
> for 'prepare' this would be:
> 
> ceph-disk prepare [...]
> 
> And for LVM it would possible be
> 
> ceph-disk lvm prepare [...]
> 
> The level at which these similar actions are presented imply that one
> may be a preferred (or even default) one, while the other one
> isn't.

Is this about API "cosmetics"? Because there are a lot of examples,
suggestions and other stuff out there that use the old syntax.

And why not do a hybrid? It will require a bit more command-line parsing,
but that is not a major dealbreaker.

So the line would look like:
ceph-disk [lvm,zfs,disk,partition] prepare [...]
and the first parameter is optional, reverting to the currently supported
systems.

You can always start warning users that their API usage is old style,
and that it is going to go away in a next release.

> At one point we are going to add regular disk worfklows (replacing
> ceph-disk functionality) and then it would become even more
> confusing to keep it there (or do you think at that point we could split?)

The more separate you go, the more awkward it is going to be when things
start to melt together.

>> At the risk of being a bit 

Re: [ceph-users] EXT: ceph-lvm - a tool to deploy OSDs from LVM volumes

2017-06-19 Thread John Spray
On Mon, Jun 19, 2017 at 3:13 PM, Alfredo Deza  wrote:
> On Mon, Jun 19, 2017 at 9:27 AM, John Spray  wrote:
>> On Fri, Jun 16, 2017 at 7:23 PM, Alfredo Deza  wrote:
>>> On Fri, Jun 16, 2017 at 2:11 PM, Warren Wang - ISD
>>>  wrote:
 I would prefer that this is something more generic, to possibly support 
 other backends one day, like ceph-volume. Creating one tool per backend 
 seems silly.

 Also, ceph-lvm seems to imply that ceph itself has something to do with 
 lvm, which it really doesn’t. This is simply to deal with the underlying 
 disk. If there’s resistance to something more generic like ceph-volume, 
 then it should at least be called something like ceph-disk-lvm.
>>>
>>> Sage, you had mentioned the need for "composable" tools for this, and
>>> I think that if we go with `ceph-volume` we could allow plugins for
>>> each strategy. We are starting with `lvm` support so that would look
>>> like: `ceph-volume lvm`
>>>
>>> The `lvm` functionality could be implemented as a plugin itself, and
>>> when we start working with supporting regular disks, then `ceph-volume
>>> disk` can come along, etc...
>>>
>>> It would also open the door for anyone to be able to write a plugin to
>>> `ceph-volume` to implement their own logic, while at the same time
>>> re-using most of what we are implementing today: logging, reporting,
>>> systemd support, OSD metadata, etc...
>>>
>>> If we were to separate these into single-purpose tools, all those
>>> would need to be re-done.
>>
>> Couple of thoughts:
>>  - let's keep this in the Ceph repository unless there's a strong
>> reason not to -- it'll enable the tool's branching to automatically
>> happen in line with Ceph's.
>
> For initial development this is easier to have as a separate tool from
> the Ceph source tree. There are some niceties about being in-source,
> like
> not being required to deal with what features we are supporting on what 
> version.
>
> Although there is no code yet, I consider the project in an "unstable"
> state, it will move incredibly fast (it has to!) and that puts it at
> odds with the cadence
> of Ceph. Specifically, these two things are very important right now:
>
> * faster release cycles
> * easier and faster to test

I think having one part of Ceph on a different release cycle to the
rest of Ceph is an even more dramatic thing than having it in a
separate git repository.

It seems like there is some dissatisfaction with how the Ceph project
as whole is doing things that is driving you to try and do work
outside of the repo where the rest of the project lives -- if the
release cycles or test infrastructure within Ceph are not adequate for
the tool that formats drives for OSDs, what can we do to fix them?

> I am not ruling out going into Ceph at some point though, ideally when
> things slow down and become stable.

I think that the decision about where this code lives needs to be made
before it is released -- moving it later is rather awkward.  If you'd
rather not have the code in Ceph master until you're happy with it,
then a branch would be the natural way to do that.

> Is your argument only to have parity in Ceph's branching? That was
> never a problem with out-of-tree tools like ceph-deploy for example.

I guess my argument isn't so much an argument as it is an assertion
that if you want to go your own way then you need to have a really
strong clear reason.

Put a bit bluntly: if CephFS, RBD, RGW, the mon and the OSD can all
successfully co-habit in one git repository, what makes the CLI that
formats drives so special that it needs its own?

>>  - I agree with others that a single entrypoint (i.e. executable) will
>> be more manageable than having conspicuously separate tools, but we
>> shouldn't worry too much about making things "plugins" as such -- they
>> can just be distinct code inside one tool, sharing as much or as
>> little as they need.
>>
>> What if we delivered this set of LVM functionality as "ceph-disk lvm
>> ..." commands to minimise the impression that the tooling is changing,
>> even if internally it's all new/distinct code?
>
> That sounded appealing initially, but because we are introducing a
> very different API, it would look odd to interact
> with other subcommands without a normalized interaction. For example,
> for 'prepare' this would be:
>
> ceph-disk prepare [...]
>
> And for LVM it would possibly be
>
> ceph-disk lvm prepare [...]
>
> The level at which these similar actions are presented implies that one
> may be a preferred (or even default) one, while the other one
> isn't.
>
> At one point we are going to add regular disk workflows (replacing
> ceph-disk functionality) and then it would become even more
> confusing to keep it there (or do you think at that point we could split?)
>
>>
>> At the risk of being a bit picky about language, I don't like calling
>> this anything with "volume" in the name, because afaik we've never
>> ever called OSDs or the drives they occupy "vo

Re: [ceph-users] Mon Create currently at the state of probing

2017-06-19 Thread Jim Forde
No, I don’t think Ubuntu 14.04 has it enabled by default.
Double checked.
Sudo ufw status
Status: inactive.
No other symptoms of a firewall.

From: Sasha Litvak [mailto:alexander.v.lit...@gmail.com]
Sent: Sunday, June 18, 2017 11:10 PM
To: Jim Forde 
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Mon Create currently at the state of probing

Do you have firewall on on new server by any chance?

On Sun, Jun 18, 2017 at 8:18 PM, Jim Forde 
mailto:j...@mninc.net>> wrote:
I have an eight node ceph cluster running Jewel 10.2.5.
One Ceph-Deploy node. Four OSD nodes and three Monitor nodes.
Ceph-Deploy node is r710T
OSD’s are r710a, r710b, r710c, and r710d.
Mon’s are r710e, r710f, and r710g.
Name resolution is in Hosts file on each node.

Successfully removed Monitor r710e from cluster
Upgraded ceph-deploy node r710T to Kraken 11.2.0 (ceph -v returns 11.2.0; all 
other nodes are still 10.2.5)
Ceph -s is HEALTH_OK 2 mons
Rebuilt r710e with the same OS (Ubuntu 14.04 LTS) and same IP address.
“Ceph-deploy install --release kraken r710e” is successful with ceph -v 
returning 11.2.0 on node r710e
“ceph-deploy admin r710e” is successful and puts the keyring in 
/etc/ceph/ceph.client.admin.keyring
“sudo chmod +r /etc/ceph/ceph.client.admin.keyring”

Everything seems successful to this point.
Then I run
“ceph-deploy mon create r710e” and I get the following:

[r710e][DEBUG ] 

[r710e][INFO  ] monitor: mon.r710e is currently at the state of probing
[r710e][INFO  ] Running command: sudo ceph --cluster=ceph --admin-daemon 
/var/run/ceph/ceph-mon.r710e.asok mon_status
[r710e][WARNIN] r710e is not defined in `mon initial members`
[r710e][WARNIN] monitor r710e does not exist in monmap

R710e is in the ‘mon initial members’.
It is in the ceph.conf file correctly (it was running before and the parameters 
have not changed) Public and Cluster networks are defined.
It is the same physical server with the same (but freshly installed) OS and 
same IP address.
Looking at the local daemon mon_status on all three monitors I see.
R710f and r710g see r710e as an “extra_probe_peers”
R710e sees r710f and r710g as “extra_probe_peers”
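
For reference, a few commands that show what each mon actually has in its map
(a sketch only; the daemon names are the ones from this thread, and the first
command has to run on the host carrying that mon):

sudo ceph daemon mon.r710f mon_status    # same data as the --admin-daemon call above
ceph mon getmap -o /tmp/monmap           # grab the current monmap from the quorum
monmaptool --print /tmp/monmap           # list the monitors the cluster really knows about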

“ceph-deploy purge r710e” and “ceph-deploy purgedata r710e” with a reboot of 
the 2 mon’s brings cluster back to HEALTH_OK

Not sure what is going on. Is Ceph allergic to single node upgrades? Afraid to 
push the upgrade on all mon’s.

What I have done:
Rebuilt r710e with different hardware. Rebuilt with different OS. Rebuilt with 
different name and IP address. Same result.
I have also restructured the NTP server. R710T is my NTP server on the cluster. 
(HEALTH_OK prior to updating) I reset all Mon nodes to get time from Ubuntu 
default NTP sources. Same error.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



Re: [ceph-users] EXT: ceph-lvm - a tool to deploy OSDs from LVM volumes

2017-06-19 Thread Alfredo Deza
On Mon, Jun 19, 2017 at 12:55 PM, John Spray  wrote:
> On Mon, Jun 19, 2017 at 3:13 PM, Alfredo Deza  wrote:
>> On Mon, Jun 19, 2017 at 9:27 AM, John Spray  wrote:
>>> On Fri, Jun 16, 2017 at 7:23 PM, Alfredo Deza  wrote:
 On Fri, Jun 16, 2017 at 2:11 PM, Warren Wang - ISD
  wrote:
> I would prefer that this is something more generic, to possibly support 
> other backends one day, like ceph-volume. Creating one tool per backend 
> seems silly.
>
> Also, ceph-lvm seems to imply that ceph itself has something to do with 
> lvm, which it really doesn’t. This is simply to deal with the underlying 
> disk. If there’s resistance to something more generic like ceph-volume, 
> then it should at least be called something like ceph-disk-lvm.

 Sage, you had mentioned the need for "composable" tools for this, and
 I think that if we go with `ceph-volume` we could allow plugins for
 each strategy. We are starting with `lvm` support so that would look
 like: `ceph-volume lvm`

 The `lvm` functionality could be implemented as a plugin itself, and
 when we start working with supporting regular disks, then `ceph-volume
 disk` can come along, etc...

 It would also open the door for anyone to be able to write a plugin to
 `ceph-volume` to implement their own logic, while at the same time
 re-using most of what we are implementing today: logging, reporting,
 systemd support, OSD metadata, etc...

 If we were to separate these into single-purpose tools, all those
 would need to be re-done.
>>>
>>> Couple of thoughts:
>>>  - let's keep this in the Ceph repository unless there's a strong
>>> reason not to -- it'll enable the tool's branching to automatically
>>> happen in line with Ceph's.
>>
>> For initial development this is easier to have as a separate tool from
>> the Ceph source tree. There are some niceties about being in-source,
>> like
>> not being required to deal with what features we are supporting on what 
>> version.
>>
>> Although there is no code yet, I consider the project in an "unstable"
>> state, it will move incredibly fast (it has to!) and that puts it at
>> odds with the cadence
>> of Ceph. Specifically, these two things are very important right now:
>>
>> * faster release cycles
>> * easier and faster to test
>
> I think having one part of Ceph on a different release cycle to the
> rest of Ceph is an even more dramatic thing than having it in a
> separate git repository.
>
> It seems like there is some dissatisfaction with how the Ceph project
> as whole is doing things that is driving you to try and do work
> outside of the repo where the rest of the project lives -- if the
> release cycles or test infrastructure within Ceph are not adequate for
> the tool that formats drives for OSDs, what can we do to fix them?

It isn't Ceph the project :)

Not every tool about Ceph has to come from ceph.git, in which case the
argument could be flipped around: why aren't ceph-installer,
ceph-ansible, ceph-deploy, radosgw-agent, etc. all coming from
within ceph.git?

They don't necessarily need to be tied in. In the case of
ceph-installer: there is nothing ceph-specific it needs from ceph.git
to run, why force it in?

>
>> I am not ruling out going into Ceph at some point though, ideally when
>> things slow down and become stable.
>
> I think that the decision about where this code lives needs to be made
> before it is released -- moving it later is rather awkward.  If you'd
> rather not have the code in Ceph master until you're happy with it,
> then a branch would be the natural way to do that.
>

The decision was made a few weeks ago, and I really don't think we
should be in ceph.git, but I am OK to keep
discussing the reasoning.

>> Is your argument only to have parity in Ceph's branching? That was
>> never a problem with out-of-tree tools like ceph-deploy for example.
>
> I guess my argument isn't so much an argument as it is an assertion
> that if you want to go your own way then you need to have a really
> strong clear reason.

Many! Like I mentioned: easier testing, faster release cycle, can
publish in any package index, doesn't need anything in ceph.git to
operate, etc..

>
> Put a bit bluntly: if CephFS, RBD, RGW, the mon and the OSD can all
> successfully co-habit in one git repository, what makes the CLI that
> formats drives so special that it needs its own?

Sure. Again, there is nothing some of our tooling needs from ceph.git
so I don't see the need to have them in-tree. I am sure RGW and
other
components do need to consume Ceph code in some way? I don't even
think ceph-disk should be in tree for the same reason. I believe that
in the very
beginning it was just so easy to have everything be built from ceph.git

Even in some cases like pybind, it has been requested numerous times
to get them on separate package indexes like PyPI, but that has always
been
*tremendously* difficult: http://tra

Re: [ceph-users] EXT: ceph-lvm - a tool to deploy OSDs from LVM volumes

2017-06-19 Thread Alfredo Deza
On Mon, Jun 19, 2017 at 11:37 AM, Willem Jan Withagen  wrote:
> On 19-6-2017 16:13, Alfredo Deza wrote:
>> On Mon, Jun 19, 2017 at 9:27 AM, John Spray  wrote:
>>> On Fri, Jun 16, 2017 at 7:23 PM, Alfredo Deza  wrote:
 On Fri, Jun 16, 2017 at 2:11 PM, Warren Wang - ISD
  wrote:
> I would prefer that this is something more generic, to possibly support 
> other backends one day, like ceph-volume. Creating one tool per backend 
> seems silly.
>
> Also, ceph-lvm seems to imply that ceph itself has something to do with 
> lvm, which it really doesn’t. This is simply to deal with the underlying 
> disk. If there’s resistance to something more generic like ceph-volume, 
> then it should at least be called something like ceph-disk-lvm.

 Sage, you had mentioned the need for "composable" tools for this, and
 I think that if we go with `ceph-volume` we could allow plugins for
 each strategy. We are starting with `lvm` support so that would look
 like: `ceph-volume lvm`

 The `lvm` functionality could be implemented as a plugin itself, and
 when we start working with supporting regular disks, then `ceph-volume
 disk` can come along, etc...

 It would also open the door for anyone to be able to write a plugin to
 `ceph-volume` to implement their own logic, while at the same time
 re-using most of what we are implementing today: logging, reporting,
 systemd support, OSD metadata, etc...

 If we were to separate these into single-purpose tools, all those
 would need to be re-done.
>>>
>>> Couple of thoughts:
>>>  - let's keep this in the Ceph repository unless there's a strong
>>> reason not to -- it'll enable the tool's branching to automatically
>>> happen in line with Ceph's.
>>
>> For initial development this is easier to have as a separate tool from
>> the Ceph source tree. There are some niceties about being in-source,
>> like
>> not being required to deal with what features we are supporting on what 
>> version.
>
> Just my observation, need not be true at all, but ...
>
> As long as you do not have it interact with the other tools, that is
> true. But as soon as you start depending on ceph-{disk-new,volume} in
other parts of the mainstream ceph code you have created a tie-in with
> the versioning and will require it to be maintained in the same way.
>
>
>> Although there is no code yet, I consider the project in an "unstable"
>> state, it will move incredibly fast (it has to!) and that puts it at
>> odds with the cadence
>> of Ceph. Specifically, these two things are very important right now:
>>
>> * faster release cycles
>> * easier and faster to test
>>
>> I am not ruling out going into Ceph at some point though, ideally when
>> things slow down and become stable.
>>
>> Is your argument only to have parity in Ceph's branching? That was
>> never a problem with out-of-tree tools like ceph-deploy for example.
>
> Some of the external targets move so fast (ceph-ansible) that I have
> given up on trying to see what is going on. For this tool I'd like it to
> do the ZFS/FreeBSD stuff as a plugin-module.
> In the expectation that it will supersede the current ceph-disk,
> otherwise there are two places to maintain this type of code.

Yes, the idea is that it will be pluggable from the start, and that it
will supersede the current ceph-disk (but not immediately).
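
As a rough illustration only (not actual ceph-volume code, and the backend
names are hypothetical), a single pluggable entry point could look something
like this, with each device strategy registered as a subcommand:

import argparse

PLUGINS = {}                      # subcommand name -> handler

def register(name):
    def wrap(func):
        PLUGINS[name] = func
        return func
    return wrap

@register("lvm")
def lvm(args):
    # the lvm strategy (prepare/activate OSDs on logical volumes) would live here
    print("lvm backend called with:", args)

@register("disk")
def disk(args):
    # a future regular-disk workflow, registered the same way
    print("disk backend called with:", args)

def main():
    parser = argparse.ArgumentParser(prog="ceph-volume")
    parser.add_argument("plugin", choices=sorted(PLUGINS))
    parser.add_argument("rest", nargs=argparse.REMAINDER)
    ns = parser.parse_args()
    PLUGINS[ns.plugin](ns.rest)

if __name__ == "__main__":
    main()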

>
>>>  - I agree with others that a single entrypoint (i.e. executable) will
>>> be more manageable than having conspicuously separate tools, but we
>>> shouldn't worry too much about making things "plugins" as such -- they
>>> can just be distinct code inside one tool, sharing as much or as
>>> little as they need.
>>>
>>> What if we delivered this set of LVM functionality as "ceph-disk lvm
>>> ..." commands to minimise the impression that the tooling is changing,
>>> even if internally it's all new/distinct code?
>>
>> That sounded appealing initially, but because we are introducing a
>> very different API, it would look odd to interact
>> with other subcommands without a normalized interaction. For example,
>> for 'prepare' this would be:
>>
>> ceph-disk prepare [...]
>>
>> And for LVM it would possible be
>>
>> ceph-disk lvm prepare [...]
>>
>> The level at which these similar actions are presented imply that one
>> may be a preferred (or even default) one, while the other one
>> isn't.
>
> Is this about API "cosmetics"? Because there are a lot of examples,
> suggestions and other stuff out there that use the old syntax.
>
> And why not do a hybrid? It will require a bit more command-line parsing,
> but that is not a major dealbreaker.
>
> so the line would look like
> ceph-disk [lvm,zfs,disk,partition] prepare [...]
> and the first parameter is optional reverting to the current supported
> systems.
>
> You can always start warning users that their API usage is old style,
> and that it is going to go away in a next release.
>
>> At one point we are go

Re: [ceph-users] Mon Create currently at the state of probing

2017-06-19 Thread David Turner
Question... Why are you reinstalling the node, removing the mon from the
cluster, and adding it back into the cluster to upgrade to Kraken?  The
upgrade path from 10.2.5 to 11.2.0 is an acceptable upgrade path.  If you
just needed to reinstall the OS for some reason, then you can keep the
/var/lib/ceph/mon/r710e/ folder intact and not need to remove/re-add the
mon to reinstall the OS.  Even if you upgraded from 14.04 to 16.04, this
would work.  You would want to change the upstart file in the daemon's
folder to systemd and make sure it works with systemctl just fine, but the
daemon itself would be fine.
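
A rough sketch of that reinstall-in-place approach (the mon data path is
assumed to be the default ceph-r710e; verify your own paths before touching
anything):

sudo systemctl stop ceph-mon@r710e            # or "sudo stop ceph-mon id=r710e" under upstart
# preserve /var/lib/ceph/mon/ceph-r710e and /etc/ceph across the OS reinstall
sudo rm /var/lib/ceph/mon/ceph-r710e/upstart  # the init-system marker mentioned above
sudo touch /var/lib/ceph/mon/ceph-r710e/systemd
sudo chown -R ceph:ceph /var/lib/ceph/mon/ceph-r710e /etc/ceph
sudo systemctl enable ceph-mon@r710e && sudo systemctl start ceph-mon@r710e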

If you are hell-bent on doing this the hardest way I've ever heard of, then
you might want to check out this Note from the docs for adding/removing a
mon.  Since you are far enough removed from the initial ceph-deploy and have
removed r710e from your configuration, if you don't have a public
network statement in your ceph.conf file, that could be your problem with
the probing.

http://docs.ceph.com/docs/kraken/rados/deployment/ceph-deploy-mon/
"

Note


When adding a monitor on a host that was not in hosts initially defined
with the ceph-deploy new command, a public network statement needs to be
added to the ceph.conf file."
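
A minimal ceph.conf fragment of the kind that note refers to (a sketch; the
addresses and subnets below are placeholders, not values from this thread):

[global]
mon initial members = r710e, r710f, r710g
mon host = 192.168.10.15, 192.168.10.16, 192.168.10.17
public network = 192.168.10.0/24
cluster network = 192.168.20.0/24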


On Mon, Jun 19, 2017 at 1:09 PM Jim Forde  wrote:

> No, I don’t think Ubuntu 14.04 has it enabled by default.
>
> Double checked.
>
> Sudo ufw status
>
> Status: inactive.
>
> No other symptoms of a firewall.
>
>
>
> *From:* Sasha Litvak [mailto:alexander.v.lit...@gmail.com]
> *Sent:* Sunday, June 18, 2017 11:10 PM
> *To:* Jim Forde 
> *Cc:* ceph-users@lists.ceph.com
> *Subject:* Re: [ceph-users] Mon Create currently at the state of probing
>
>
>
> Do you have firewall on on new server by any chance?
>
>
>
> On Sun, Jun 18, 2017 at 8:18 PM, Jim Forde  wrote:
>
> I have an eight node ceph cluster running Jewel 10.2.5.
>
> One Ceph-Deploy node. Four OSD nodes and three Monitor nodes.
>
> Ceph-Deploy node is r710T
>
> OSD’s are r710a, r710b, r710c, and r710d.
>
> Mon’s are r710e, r710f, and r710g.
>
> Name resolution is in Hosts file on each node.
>
>
>
> Successfully removed Monitor r710e from cluster
>
> Upgraded ceph-deploy node r710T to Kraken 11.2.0 (ceph -v returns 11.2.0
> all other nodes are still 10.2.5)
>
> Ceph -s is HEALTH_OK 2 mons
>
> Rebuilt r710e with same OS (ubutnu 14.04 LTS) and same IP address.
>
> “Ceph-deploy install –release kraken r710e” is successful with ceph -v
> returning 11.2.0 on node r710e
>
> “ceph-deploy admin r710e” is successful and puts the keyring in
> /etc/ceph/ceph.client.admin.keyring
>
> “sudo chmod +r /etc/ceph/ceph.client.admin.keyring”
>
>
>
> Everything seems successful to this point.
>
> Then I run
>
> “ceph-deploy mon create r710e” and I get the following:
>
>
>
> [r710e][DEBUG ]
> 
>
> [r710e][INFO  ] monitor: mon.r710e is currently at the state of probing
>
> [r710e][INFO  ] Running command: sudo ceph --cluster=ceph --admin-daemon
> /var/run/ceph/ceph-mon.r710e.asok mon_status
>
> [r710e][WARNIN] r710e is not defined in `mon initial members`
>
> [r710e][WARNIN] monitor r710e does not exist in monmap
>
>
>
> R710e is in the ‘mon initial members’.
>
> It is in the ceph.conf file correctly (it was running before and the
> parameters have not changed) Public and Cluster networks are defined.
>
> It is the same physical server with the same (but freshly installed) OS
> and same IP address.
>
> Looking at the local daemon mon_status on all three monitors I see.
>
> R710f and r710g see r710e as an “extra_probe_peers”
>
> R710e sees r710f and r710g as “extra_probe_peers”
>
>
>
> “ceph-deploy purge r710e” and “ceph-deploy purgedata r710e” with a reboot
> of the 2 mon’s brings cluster back to HEALTH_OK
>
>
>
> Not sure what is going on. Is Ceph allergic to single node upgrades?
> Afraid to push the upgrade on all mon’s.
>
>
>
> What I have done:
>
> Rebuilt r710e with different hardware. Rebuilt with different OS. Rebuilt
> with different name and IP address. Same result.
>
> I have also restructured the NTP server. R710T is my NTP server on the
> cluster. (HEALTH_OK prior to updating) I reset all Mon nodes to get time
> from Ubuntu default NTP sources. Same error.
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] EXT: ceph-lvm - a tool to deploy OSDs from LVM volumes

2017-06-19 Thread Andrew Schoen
>>
>> I think having one part of Ceph on a different release cycle to the
>> rest of Ceph is an even more dramatic thing than having it in a
>> separate git repository.
>>
>> It seems like there is some dissatisfaction with how the Ceph project
>> as whole is doing things that is driving you to try and do work
>> outside of the repo where the rest of the project lives -- if the
>> release cycles or test infrastructure within Ceph are not adequate for
>> the tool that formats drives for OSDs, what can we do to fix them?
>
> It isn't Ceph the project :)

I think there needs to be a distinction between things that *are* ceph
(CephFS, RBD, RGW, MON, OSD) and things that might leverage ceph or
help with its installation / usage (ceph-ansible, ceph-deploy,
ceph-installer, ceph-volume). I don't think the latter group needs to
be in ceph.git.


>>
>> I guess my argument isn't so much an argument as it is an assertion
>> that if you want to go your own way then you need to have a really
>> strong clear reason.
>
> Many! Like I mentioned: easier testing, faster release cycle, can
> publish in any package index, doesn't need anything in ceph.git to
> operate, etc..

I agree with all these points. I would add that having ceph-volume in
a separate git repo greatly simplifies the CI interaction with the
project. When I submit a PR to ceph-volume.git I'd want all our unit
tests run and any new docs automatically published. If this lived in
ceph.git it would be very clumsy (maybe not possible) to have Jenkins react
and start jobs that only pertain to the code being changed. If I
submit new ceph-volume code, why would I need make check run or ceph
packages built?

Having ceph-disk tied to ceph.git (and its release cycle) has caused
problems with ceph-docker in the past. We've had a race condition (in
ceph-disk) that exposes itself in our CI present for quite some time
even though the patch was merged to master upstream. I think the fix
missed the 2.3 downstream release as well because it wasn't backported
quickly enough. Keeping tools like ceph-disk or ceph-volume
outside of ceph.git would allow us to merge those fixes back into
ceph-docker more efficiently.

Maybe I don't understand the strong reasons for keeping ceph-volume in
ceph.git? If it's only for parity in branches, I never thought of
ceph-volume having a branch per version of ceph supported anyway. I'd
expect it to have numbered releases that support a documented number
of ceph releases; ceph-ansible works similarly here and I believe
ceph-deploy did as well.

- Andrew
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Erasure Coding: Determine location of data and coding chunks

2017-06-19 Thread Jonas Jaszkowic
Hello all, I have a simple question:

I have an erasure coded pool with k = 2 data chunks and m = 3 coding chunks, 
how can I determine the location of the data and coding chunks? Given an object 
A
that is stored on n = k + m different OSDs I want to find out where (i.e. on 
which OSDs)
the data chunks are stored and where the coding chunks are stored.

Thank you! 

- Jonas
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] EXT: ceph-lvm - a tool to deploy OSDs from LVM volumes

2017-06-19 Thread Willem Jan Withagen



Op 19-6-2017 om 19:57 schreef Alfredo Deza:

On Mon, Jun 19, 2017 at 11:37 AM, Willem Jan Withagen  wrote:

On 19-6-2017 16:13, Alfredo Deza wrote:

On Mon, Jun 19, 2017 at 9:27 AM, John Spray  wrote:

On Fri, Jun 16, 2017 at 7:23 PM, Alfredo Deza  wrote:

On Fri, Jun 16, 2017 at 2:11 PM, Warren Wang - ISD
 wrote:


I would just try to glue it into ceph-disk in the most flexible way

We can't "glue it into ceph-disk" because we are proposing a
completely new way of doing things that
goes against how ceph-disk works.


'mmm,

Not really a valid argument if you want the two to become equal.

I have limited Python knowledge, but I can envision an outer wrapper that just calls
the old version of ceph-disk as an external executable. User impact is thus reduced
to a bare minimum.

Got to admit that it is not very elegant, but it would work.
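
A bare-bones sketch of that wrapper idea (purely illustrative, not real
ceph-disk code; the backend names are hypothetical):

import subprocess
import sys

NEW_BACKENDS = {"lvm", "zfs"}          # hypothetical new strategies

def main(argv):
    if argv and argv[0] in NEW_BACKENDS:
        # new code path for the new backends
        print("dispatching to the '%s' backend with %r" % (argv[0], argv[1:]))
        return 0
    # anything else is passed through to the existing ceph-disk unchanged,
    # so current users see no difference
    return subprocess.call(["ceph-disk"] + argv)

if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))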

But I'll see what you guys come up with.
Best proof is always the code.

--WjW

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph on raspberry pi - unable to locate package ceph-osd and ceph-mon

2017-06-19 Thread Gregory Farnum
On Sat, Jun 17, 2017 at 10:11 AM Craig Wilson  wrote:

> Hi ceph-users
>
> I'm looking at ceph for a new storage cluster at work and have been trying to
> build a test cluster using an old laptop and 2 Raspberry Pis to evaluate
> it in something other than VirtualBox. I've got the laptop set up with
> Ubuntu LTS but no luck on the Pis.
>
> When I run ceph-deploy install pi2 it errors on the following:
>
> [pi2][DEBUG ] Reading state information...
> [pi2][WARNIN] E: Unable to locate package ceph-osd
> [pi2][WARNIN] E: Unable to locate package ceph-mon
> [pi2][ERROR ] RuntimeError: command returned non-zero exit status: 100
> [ceph_deploy][ERROR ] RuntimeError: Failed to execute command: env
> DEBIAN_FRONTEND=noninteractive DEBIAN_PRIORITY=critical apt-get
> --assume-yes -q --no-install-recommends install -o
> Dpkg::Options::=--force-confnew ceph-osd ceph-mds ceph-mon radosgw
>
>
> I've tried looking into installing it manually but not been able to find
> any concise instructions that don't just use the same repo's that are
> already missing the packages.
>
> I've tried using different releases of ceph, hammer, jewel and kraken but
> all return the same. Not sure how to get this up and running so I can get
> some hands on experience with it at home before making any recommendations
> at work.
>
>
Unfortunately I don't think Ceph is generally available for Raspberry Pi
platforms. They use a sufficiently-old instruction set that some of our
dependencies (for atomic memory operations and ordering) didn't work and
you have to configure it to use very slow fallbacks. Developers have run it
but I think the compromises required were significant enough nobody ever
thought it was a good idea to package. :/
-Greg
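
For what it's worth, a quick way to confirm what the configured repo actually
carries for the Pi's architecture (the package names are the ones ceph-deploy
tried above; this is a sketch, run on the Pi itself):

dpkg --print-architecture                  # armhf on Raspbian and most Pi images
apt-cache policy ceph ceph-osd ceph-mon    # "Candidate: (none)" means no build for that arch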
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Errors connecting cinder-volume to ceph

2017-06-19 Thread T. Nichole Williams
Hello,

I’m having trouble connecting Ceph to OpenStack Cinder following the guide in 
docs.ceph.com & I can’t figure out what’s wrong. I’ve 
confirmed auth connectivity for both root & ceph users on my openstack 
controller node, but the RBDDriver is not initializing. I’ve dug through every 
related Google article I can find with no results. Anyone have any tips?

Here’s output of a few sample errors, auth list from controller, & 
cinder-manage config list. Please let me know if you need any further info from 
my side.
https://gist.githubusercontent.com/OGtrilliams/ed7642358a113ab7d908f4240427ad2e/raw/282de9ce1756670fe8c3071d1613e3c64d6e5b2f/cinder-conf
 


T. Nichole Williams
tribe...@tribecc.us



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Errors connecting cinder-volume to ceph

2017-06-19 Thread Marko Sluga
Hi Nicole,



I can help; I have been working on my own OpenStack connected to Ceph. Can you 
send over the config in your /etc/cinder/cinder.conf file, especially the 
rbd-relevant section starting with:



volume_driver = cinder.volume.drivers.rbd.RBDDriver



Also, make sure your rbd_secret_uuid matches the client volume secret you 
created.
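
One quick way to cross-check that (a sketch, assuming libvirt/KVM compute
nodes; the uuid and user name below are placeholders for whatever your
cinder.conf sets):

virsh secret-list                          # the rbd_secret_uuid should appear here
virsh secret-get-value <rbd_secret_uuid>   # should print the same key as the next command
ceph auth get-key client.<rbd_user>        # e.g. client.cinder or client.volumes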



Regards,


Marko Sluga

Independent Trainer







W: http://markocloud.com

T: +1 (647) 546-4365



L + M Consulting Inc.

Ste 212, 2121 Lake Shore Blvd W

M8E 4E9, Etobicoke, ON






 On Mon, 19 Jun 2017 17:25:59 -0400 T. Nichole Williams 
 wrote 




Hello,



I’m having trouble connecting Ceph to OpenStack Cinder following the guide in 
docs.ceph.com & I can’t figure out what’s wrong. I’ve confirmed auth 
connectivity for both root & ceph users on my openstack controller node, 
but the RBDDriver is not initializing. I’ve dug through every related Google 
article I can find with no results. Any one have any tips?



Here’s output of a few sample errors, auth list from controller, & 
cinder-manage config list. Please let me know if you need any further info from 
my side.

https://gist.githubusercontent.com/OGtrilliams/ed7642358a113ab7d908f4240427ad2e/raw/282de9ce1756670fe8c3071d1613e3c64d6e5b2f/cinder-conf



T. Nichole Williams

tribe...@tribecc.us









___ 

ceph-users mailing list 

ceph-users@lists.ceph.com 

http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 






___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Errors connecting cinder-volume to ceph

2017-06-19 Thread Alejandro Comisario
you might want to configure cinder.conf with

verbose = true
debug = true

and see /var/log/cinder/cinder-volume.log after a "systemctl restart
cinder-volume" to see the real cause.

best.
alejandrito

On Mon, Jun 19, 2017 at 6:25 PM, T. Nichole Williams 
wrote:

> Hello,
>
> I’m having trouble connecting Ceph to OpenStack Cinder following the guide
> in docs.ceph.com & I can’t figure out what’s wrong. I’ve confirmed auth
> connectivity for both root & ceph users on my openstack controller node,
> but the RBDDriver is not initializing. I’ve dug through every related
> Google article I can find with no results. Any one have any tips?
>
> Here’s output of a few sample errors, auth list from controller, &
> cinder-manage config list. Please let me know if you need any further info
> from my side.
> https://gist.githubusercontent.com/OGtrilliams/ed7642358a113ab7d908f4240427ad2e/raw/282de9ce1756670fe8c3071d1613e3c64d6e5b2f/cinder-conf
>
> T. Nichole Williams
> tribe...@tribecc.us
>
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] OSDs are not mounting on startup

2017-06-19 Thread Alex Gorbachev
We are seeing the same problem as http://tracker.ceph.com/issues/18945
where OSDs are not activating, with the lockbox error as below.
--
Alex Gorbachev
Storcium

Jun 19 17:11:56 roc03r-sca070 ceph-osd6804: starting osd.75 at :/0
osd_data /var/lib/ceph/osd/ceph-75 /var/lib/ceph/osd/ceph-75/journal
Jun 19 17:11:56 roc03r-sca070 sh3519: main_trigger:
Jun 19 17:11:56 roc03r-sca070 sh3519: main_trigger: main_activate:
path = /dev/sdj1
Jun 19 17:11:56 roc03r-sca070 sh3519: get_dm_uuid: get_dm_uuid
/dev/sdj1 uuid path is /sys/dev/block/8:145/dm/uuid
Jun 19 17:11:56 roc03r-sca070 sh3519: command: Running command:
/sbin/blkid -o udev -p /dev/sdj1
Jun 19 17:11:56 roc03r-sca070 sh3519: message repeated 3 times: [
command: Running command: /sbin/blkid -o udev -p /dev/sdj1]
Jun 19 17:11:56 roc03r-sca070 sh3519: Traceback (most recent call last):
Jun 19 17:11:56 roc03r-sca070 sh3519: File "/usr/sbin/ceph-disk", line
9, in 
Jun 19 17:11:56 roc03r-sca070 sh3519:
load_entry_point('ceph-disk==1.0.0', 'console_scripts', 'ceph-disk')()
Jun 19 17:11:56 roc03r-sca070 sh3519: File
"/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 5048, in
run
Jun 19 17:11:56 roc03r-sca070 sh3519: main(sys.argv[1:])
Jun 19 17:11:56 roc03r-sca070 sh3519: File
"/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 4999, in
main
Jun 19 17:11:56 roc03r-sca070 sh3519: args.func(args)
Jun 19 17:11:56 roc03r-sca070 sh3519: File
"/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 3357, in
main_activate
Jun 19 17:11:56 roc03r-sca070 sh3519: reactivate=args.reactivate,
Jun 19 17:11:56 roc03r-sca070 sh3519: File
"/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 3059, in
mount_activate
Jun 19 17:11:56 roc03r-sca070 sh3519: dev = dmcrypt_map(dev, dmcrypt_key_dir)
Jun 19 17:11:56 roc03r-sca070 sh3519: File
"/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 3037, in
dmcrypt_map
Jun 19 17:11:56 roc03r-sca070 sh3519: dmcrypt_key =
get_dmcrypt_key(part_uuid, dmcrypt_key_dir, luks)
Jun 19 17:11:56 roc03r-sca070 sh3519: File
"/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 1156, in
get_dmcrypt_key
Jun 19 17:11:56 roc03r-sca070 sh3519: raise Error('unknown
key-management-mode ' + str(mode))
Jun 19 17:11:56 roc03r-sca070 sh3519: ceph_disk.main.Error: Error:
unknown key-management-mode None
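
In case it helps anyone hitting the same trace, a few hedged, read-only
diagnostics around the "unknown key-management-mode None" path (device names
are taken from the log above; adjust for your disks):

sudo ceph-disk list                      # how ceph-disk classifies each partition
sudo sgdisk -i 1 /dev/sdj                # partition type GUID (plain data vs dmcrypt/luks)
sudo /sbin/blkid -o udev -p /dev/sdj1    # the udev keys ceph-disk parses, as in the trace
ls /etc/ceph/dmcrypt-keys/               # locally stored dmcrypt keys, if that mode is in use
ceph config-key list | grep dm-crypt     # keys kept on the monitors for lockbox setups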
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Erasure Coding: Determine location of data and coding chunks

2017-06-19 Thread Marko Sluga
Hi Jonas,



ceph osd map [poolname] [objectname] 



should provide you with more information about where the object and chunks are 
stored on the cluster.
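
As a concrete (made-up) example: with the default jerasure profile the acting
set of an EC placement group is ordered by shard id, so with k = 2 and m = 3
the first two OSDs listed hold the data shards (0 and 1) and the remaining
three hold the coding shards (2-4). The pool and object names below are
invented:

ceph osd map ecpool objA
# osdmap e220 pool 'ecpool' (5) object 'objA' -> pg 5.1f4e8c88 (5.8)
#  -> up ([4,1,7,3,9], p4) acting ([4,1,7,3,9], p4)
# here osd.4 and osd.1 carry the data chunks; osd.7, osd.3 and osd.9 carry the coding chunks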



Regards,


Marko Sluga

Independent Trainer







W: http://markocloud.com

T: +1 (647) 546-4365



L + M Consulting Inc.

Ste 212, 2121 Lake Shore Blvd W

M8E 4E9, Etobicoke, ON






 On Mon, 19 Jun 2017 14:56:57 -0400 Jonas Jaszkowic 
 wrote 




Hello all, I have a simple question: 

 

I have an erasure coded pool with k = 2 data chunks and m = 3 coding chunks, 

how can I determine the location of the data and coding chunks? Given an object 
A 

that is stored on n = k + m different OSDs I want to find out where (i.e. on 
which OSDs) 

the data chunks are stored and where the coding chunks are stored. 

 

Thank you! 

 

- Jonas 

___ 

ceph-users mailing list 

ceph-users@lists.ceph.com 

http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 






___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Errors connecting cinder-volume to ceph

2017-06-19 Thread T. Nichole Williams
Hi Marko,

Here’s’ my ceph config:

[ceph]
volume_driver = cinder.volume.drivers.rbd.RBDDriver
volume_backend_name = ceph
rbd_pool = volumes
rbd_ceph_conf = /etc/ceph/ceph.conf
rbd_flatten_volume_from_snapshot = false
rbd_max_clone_depth = 5
rbd_store_chunk_size = 4
rados_connect_timeout = -1
rbd_user = cinder
rbd_secret_uuid = c80d6505-260c-48c1-a248-7144cd5d5aab 
filter_function = "volume.size >= 2"

Setting logging to “debug” doesn’t seem to produce any new information. Here’s 
a snippet of /var/log/cinder/volume.log:
2017-06-19 16:54:45.056 9797 INFO cinder.volume.manager 
[req-e556d559-c484-4edf-a458-5afbafcb8e39 - - - - -] Initializing RPC dependent 
components of volume driver RBDDriver (1.2.0)
2017-06-19 16:54:45.056 9797 ERROR cinder.utils 
[req-e556d559-c484-4edf-a458-5afbafcb8e39 - - - - -] Volume driver RBDDriver 
not initialized
2017-06-19 16:54:45.057 9797 ERROR cinder.volume.manager 
[req-e556d559-c484-4edf-a458-5afbafcb8e39 - - - - -] Cannot complete RPC 
initialization because driver isn't initialized properly.
2017-06-19 16:54:55.063 9797 ERROR cinder.service [-] Manager for service 
cinder-volume controller.trilliams.info@ceph is reporting problems, not sending 
heartbeat. Service will appear "down".
2017-06-19 16:56:34.065 9797 WARNING cinder.volume.manager 
[req-1309a49a-d5c9-45dd-b277-36cb4ac09dd8 - - - - -] Update driver status 
failed: (config name ceph) is uninitialized.

I’ve added the entirety of /etc/cinder/cinder.conf to my gist, & thank you all 
for any help you can provide.


T. Nichole Williams
tribe...@tribecc.us



> On Jun 19, 2017, at 4:39 PM, Marko Sluga  wrote:
> 
> Hi Nicole,
> 
> I can help, I have been working on my own openstack connected to ceph - can 
> you send over the config in your /etc/cinder/cinder.conf file - especially 
> the rbd relevant section starting with:
> 
> volume_driver = cinder.volume.drivers.rbd.RBDDriver
> 
> Also, make sure your rbd_secret_uuid matches the client volume secret you 
> created.
> 
> Regards,
> 
> Marko Sluga
> Independent Trainer
> 
> 
> W: http://markocloud.com 
> T: +1 (647) 546-4365
> 
> L + M Consulting Inc.
> Ste 212, 2121 Lake Shore Blvd W
> M8E 4E9, Etobicoke, ON
> 
> 
>  On Mon, 19 Jun 2017 17:25:59 -0400 T. Nichole Williams 
>  wrote 
> 
> Hello,
> 
> I’m having trouble connecting Ceph to OpenStack Cinder following the guide in 
> docs.ceph.com  & I can’t figure out what’s wrong. I’ve 
> confirmed auth connectivity for both root & ceph users on my openstack 
> controller node, but the RBDDriver is not initializing. I’ve dug through 
> every related Google article I can find with no results. Any one have any 
> tips?
> 
> Here’s output of a few sample errors, auth list from controller, & 
> cinder-manage config list. Please let me know if you need any further info 
> from my side.
> https://gist.githubusercontent.com/OGtrilliams/ed7642358a113ab7d908f4240427ad2e/raw/282de9ce1756670fe8c3071d1613e3c64d6e5b2f/cinder-conf
>  
> 
> 
> T. Nichole Williams
> tribe...@tribecc.us 
> 
> 
> 
> ___ 
> ceph-users mailing list 
> ceph-users@lists.ceph.com  
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
>  
> 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Packages for Luminous RC 12.1.0?

2017-06-19 Thread Linh Vu
No worries, thanks a lot, look forward to testing it :)


From: Abhishek Lekshmanan 
Sent: Monday, 19 June 2017 10:03:15 PM
To: Linh Vu; ceph-users
Subject: Re: [ceph-users] Packages for Luminous RC 12.1.0?

Linh Vu  writes:

> Hi all,
>
>
> I saw that Luminous RC 12.1.0 has been mentioned in the latest release notes 
> here: http://docs.ceph.com/docs/master/release-notes/

the PR mentioning the release generally goes in around the same time we
announce the release, this time due to another issue we had to hold off
on the release. Sorry about that. we should be announcing the release
this week and have the packages soon
--
Abhishek Lekshmanan
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
HRB 21284 (AG Nürnberg)

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph packages for Debian Stretch?

2017-06-19 Thread Christian Balzer

Hello,

can we have the status, projected release date of the Ceph packages for
Debian Stretch?

Christian
-- 
Christian BalzerNetwork/Systems Engineer
ch...@gol.com   Rakuten Communications
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Errors connecting cinder-volume to ceph

2017-06-19 Thread Marko Sluga

Hi Nichole,

Since your config is ok.

I'm going to need more details on the OpenStack release, the hypervisor, linux 
and librados versions.

You could also test whether you can mount a volume from your OS and/or 
hypervisor, and from the machine that runs the cinder-volume service, to start with.

Regards, Marko


 On Mon, 19 Jun 2017 17:59:48 -0400 tribe...@tribecc.us wrote 

Hi Marko,

Here’s’ my ceph config:

[ceph]
volume_driver = cinder.volume.drivers.rbd.RBDDriver
volume_backend_name = ceph
rbd_pool = volumes
rbd_ceph_conf = /etc/ceph/ceph.conf
rbd_flatten_volume_from_snapshot = false
rbd_max_clone_depth = 5
rbd_store_chunk_size = 4
rados_connect_timeout = -1
rbd_user = cinder
rbd_secret_uuid = c80d6505-260c-48c1-a248-7144cd5d5aab 
filter_function = "volume.size >= 2"

Setting logging to “debug” doesn’t seem to produce any new information. Here’s 
a snippet of /var/log/cinder/volume.log:
2017-06-19 16:54:45.056 9797 INFO cinder.volume.manager 
[req-e556d559-c484-4edf-a458-5afbafcb8e39 - - - - -] Initializing RPC dependent 
components of volume driver RBDDriver (1.2.0)
2017-06-19 16:54:45.056 9797 ERROR cinder.utils 
[req-e556d559-c484-4edf-a458-5afbafcb8e39 - - - - -] Volume driver RBDDriver 
not initialized
2017-06-19 16:54:45.057 9797 ERROR cinder.volume.manager 
[req-e556d559-c484-4edf-a458-5afbafcb8e39 - - - - -] Cannot complete RPC 
initialization because driver isn't initialized properly.
2017-06-19 16:54:55.063 9797 ERROR cinder.service [-] Manager for service 
cinder-volume controller.trilliams.info@ceph is reporting problems, not sending 
heartbeat. Service will appear "down".
2017-06-19 16:56:34.065 9797 WARNING cinder.volume.manager 
[req-1309a49a-d5c9-45dd-b277-36cb4ac09dd8 - - - - -] Update driver status 
failed: (config name ceph) is uninitialized.

I’ve added the entirety of /etc/cinder/cinder.conf to my gist, & thank you all 
for any help you can provide.


T. Nichole Williams
tribe...@tribecc.us



On Jun 19, 2017, at 4:39 PM, Marko Sluga  wrote:

Hi Nicole,

I can help, I have been working on my own openstack connected to ceph - can you 
send over the config in your /etc/cinder/cinder.conf file - especially the rbd 
relevant section starting with:

volume_driver = cinder.volume.drivers.rbd.RBDDriver

Also, make sure your rbd_secret_uuid  matches the client volume secret you 
created.

Regards,

Marko Sluga
Independent Trainer


W: http://markocloud.com
T: +1 (647) 546-4365

L + M Consulting Inc.
Ste 212, 2121 Lake Shore Blvd W
M8E 4E9, Etobicoke, ON


 On Mon, 19 Jun 2017 17:25:59 -0400 T. Nichole Williams 
 wrote 

Hello,

I’m having trouble connecting Ceph to OpenStack Cinder following the guide in 
docs.ceph.com & I can’t figure out what’s wrong. I’ve confirmed auth 
connectivity for both root & ceph users on my openstack controller node, but 
the RBDDriver is not initializing. I’ve dug through every related Google 
article I can find with no results. Any one have any tips?

Here’s output of a few sample errors, auth list from controller, & 
cinder-manage config list. Please let me know if you need any further info from 
my side.
https://gist.githubusercontent.com/OGtrilliams/ed7642358a113ab7d908f4240427ad2e/raw/282de9ce1756670fe8c3071d1613e3c64d6e5b2f/cinder-conf

T. Nichole Williams
tribe...@tribecc.us



___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Introduction

2017-06-19 Thread Marko Sluga
Hi Everyone,

My name is Marko, I'm an independent consultant and trainer on cloud solutions 
and I work a lot with OpenStack. 

Recently my clients have started asking about Ceph so I went on the docs and 
learned how to use it and feel pretty comfortable using it now, especially in 
connection with OpenStack.

I joined the mailing lists this morning and I see there is a lot of activity, so 
I'm here to help, if everyone is OK with that of course.

Well, just wanted to say hi!

Regards, Marko
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Errors connecting cinder-volume to ceph

2017-06-19 Thread T. Nichole Williams
Hi Marko!

Here’s my details:

OpenStack Newton deployed with PackStack [controller + network node}
Ceph Kraken 3-node setup deployed with ceph-ansible

# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 7.3 (Maipo)

# ceph --version
ceph version 11.2.0 (f223e27eeb35991352ebc1f67423d4ebc252adb7)

# rpm -qa | grep librados
libradosstriper1-11.2.0-0.el7.x86_64
librados2-11.2.0-0.el7.x86_64

# cinder --version
1.9.0

# nova-manage --version
14.0.3

If it matters, both glance & nova connected without a hitch. It’s just cinder 
that’s causing a headache.

T. Nichole Williams
tribe...@tribecc.us



> On Jun 19, 2017, at 7:34 PM, Marko Sluga  wrote:
> 
> Hi Nichole,
> 
> Since your config is ok.
> 
> I'm going to need more details on the OpenStack release, the hypervisor, 
> linux and librados versions.
> 
> You could also test if you can try and monut a volume from your os and/or 
> hypervisor and the machine that runs the cinder volume service to start with.
> 
> Regards, Marko
> 
> 
>  On Mon, 19 Jun 2017 17:59:48 -0400 tribe...@tribecc.us 
>  wrote 
> 
> Hi Marko,
> 
> Here’s’ my ceph config:
> 
> [ceph]
> volume_driver = cinder.volume.drivers.rbd.RBDDriver
> volume_backend_name = ceph
> rbd_pool = volumes
> rbd_ceph_conf = /etc/ceph/ceph.conf
> rbd_flatten_volume_from_snapshot = false
> rbd_max_clone_depth = 5
> rbd_store_chunk_size = 4
> rados_connect_timeout = -1
> rbd_user = cinder
> rbd_secret_uuid = c80d6505-260c-48c1-a248-7144cd5d5aab 
> filter_function = "volume.size >= 2"
> 
> Setting logging to “debug” doesn’t seem to produce any new information. 
> Here’s a snippet of /var/log/cinder/volume.log:
> 2017-06-19 16:54:45.056 9797 INFO cinder.volume.manager 
> [req-e556d559-c484-4edf-a458-5afbafcb8e39 - - - - -] Initializing RPC 
> dependent components of volume driver RBDDriver (1.2.0)
> 2017-06-19 16:54:45.056 9797 ERROR cinder.utils 
> [req-e556d559-c484-4edf-a458-5afbafcb8e39 - - - - -] Volume driver RBDDriver 
> not initialized
> 2017-06-19 16:54:45.057 9797 ERROR cinder.volume.manager 
> [req-e556d559-c484-4edf-a458-5afbafcb8e39 - - - - -] Cannot complete RPC 
> initialization because driver isn't initialized properly.
> 2017-06-19 16:54:55.063 9797 ERROR cinder.service [-] Manager for service 
> cinder-volume controller.trilliams.info 
> @ceph is reporting problems, not sending 
> heartbeat. Service will appear "down".
> 2017-06-19 16:56:34.065 9797 WARNING cinder.volume.manager 
> [req-1309a49a-d5c9-45dd-b277-36cb4ac09dd8 - - - - -] Update driver status 
> failed: (config name ceph) is uninitialized.
> 
> I’ve added the entirety of /etc/cinder/cinder.conf to my gist, & thank you 
> all for any help you can provide.
> 
> 
> T. Nichole Williams
> tribe...@tribecc.us 
> 
> 
> 
> On Jun 19, 2017, at 4:39 PM, Marko Sluga  > wrote:
> 
> Hi Nicole,
> 
> I can help, I have been working on my own openstack connected to ceph - can 
> you send over the config in your /etc/cinder/cinder.conf file - especially 
> the rbd relevant section starting with:
> 
> volume_driver = cinder.volume.drivers.rbd.RBDDriver
> 
> Also, make sure your rbd_secret_uuid matches the client volume secret you 
> created.
> 
> Regards,
> 
> Marko Sluga
> Independent Trainer
> 
> 
> W:  http://markocloud.com 
> T: +1 (647) 546-4365
> 
> L + M Consulting Inc.
> Ste 212, 2121 Lake Shore Blvd W
> M8E 4E9, Etobicoke, ON
> 
> 
>  On Mon, 19 Jun 2017 17:25:59 -0400 T. Nichole Williams 
> mailto:tribe...@tribecc.us>> wrote 
> 
> Hello,
> 
> I’m having trouble connecting Ceph to OpenStack Cinder following the guide in 
> docs.ceph.com  & I can’t figure out what’s wrong. I’ve 
> confirmed auth connectivity for both root & ceph users on my openstack 
> controller node, but the RBDDriver is not initializing. I’ve dug through 
> every related Google article I can find with no results. Any one have any 
> tips?
> 
> Here’s output of a few sample errors, auth list from controller, & 
> cinder-manage config list. Please let me know if you need any further info 
> from my side.
> https://gist.githubusercontent.com/OGtrilliams/ed7642358a113ab7d908f4240427ad2e/raw/282de9ce1756670fe8c3071d1613e3c64d6e5b2f/cinder-conf
>  
> 
> 
> T. Nichole Williams
> tribe...@tribecc.us 
> 
> 
> 
> ___ 
> ceph-users mailing list 
> ceph-users@lists.ceph.com  
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
>  
> 
> 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://list

Re: [ceph-users] Errors connecting cinder-volume to ceph

2017-06-19 Thread Marko Sluga


Hi Nichole,

Yeah, your setup looks OK, so the only thing here could be an auth issue. So 
I went through the config again and I see you have set the client.volumes ceph 
user with rwx permissions on the volumes pool.

In your cinder.conf the setup is:

rbd_user = cinder

Unless the cinder ceph user also exists, this is probably incorrectly set and I 
would say you would need to change that setting to:

rbd_user = client.volumes

Regards, Marko

 On Mon, 19 Jun 2017 20:50:47 -0400 tribe...@tribecc.us wrote 

Hi Marko!

Here’s my details:

OpenStack Newton deployed with PackStack [controller + network node}
Ceph Kraken 3-node setup deployed with ceph-ansible

# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 7.3 (Maipo)

# ceph --version
ceph version 11.2.0 (f223e27eeb35991352ebc1f67423d4ebc252adb7)

# rpm -qa | grep librados
libradosstriper1-11.2.0-0.el7.x86_64
librados2-11.2.0-0.el7.x86_64

# cinder --version
1.9.0

# nova-manage --version
14.0.3

If it matters, both glance & nova connected without a hitch. It’s just cinder 
that’s causing a headache.

T. Nichole Williams
tribe...@tribecc.us



On Jun 19, 2017, at 7:34 PM, Marko Sluga  wrote:

Hi Nichole,

Since your config is ok.

I'm going to need more details on the OpenStack release, the hypervisor, linux 
and librados versions.

You could also test if you can try and monut a volume from your os and/or 
hypervisor and the machine that runs the cinder volume service to start with.

Regards, Marko


 On Mon, 19 Jun 2017 17:59:48 -0400 tribe...@tribecc.us wrote 

Hi Marko,

Here’s’ my ceph config:

[ceph]
volume_driver = cinder.volume.drivers.rbd.RBDDriver
volume_backend_name = ceph
rbd_pool = volumes
rbd_ceph_conf = /etc/ceph/ceph.conf
rbd_flatten_volume_from_snapshot = false
rbd_max_clone_depth = 5
rbd_store_chunk_size = 4
rados_connect_timeout = -1
rbd_user = cinder
rbd_secret_uuid = c80d6505-260c-48c1-a248-7144cd5d5aab 
filter_function = "volume.size >= 2"

Setting logging to “debug” doesn’t seem to produce any new information. Here’s 
a snippet of /var/log/cinder/volume.log:
2017-06-19 16:54:45.056 9797 INFO cinder.volume.manager 
[req-e556d559-c484-4edf-a458-5afbafcb8e39 - - - - -] Initializing RPC dependent 
components of volume driver RBDDriver (1.2.0)
2017-06-19 16:54:45.056 9797 ERROR cinder.utils 
[req-e556d559-c484-4edf-a458-5afbafcb8e39 - - - - -] Volume driver RBDDriver 
not initialized
2017-06-19 16:54:45.057 9797 ERROR cinder.volume.manager 
[req-e556d559-c484-4edf-a458-5afbafcb8e39 - - - - -] Cannot complete RPC 
initialization because driver isn't initialized properly.
2017-06-19 16:54:55.063 9797 ERROR cinder.service [-] Manager for service 
cinder-volume controller.trilliams.info@ceph is reporting problems, not sending 
heartbeat. Service will appear "down".
2017-06-19 16:56:34.065 9797 WARNING cinder.volume.manager 
[req-1309a49a-d5c9-45dd-b277-36cb4ac09dd8 - - - - -] Update driver status 
failed: (config name ceph) is uninitialized.

I’ve added the entirety of /etc/cinder/cinder.conf to my gist, & thank you all 
for any help you can provide.


T. Nichole Williams
tribe...@tribecc.us



On Jun 19, 2017, at 4:39 PM, Marko Sluga  wrote:

Hi Nicole,

I can help, I have been working on my own openstack connected to ceph - can you 
send over the config in your /etc/cinder/cinder.conf file - especially the rbd 
relevant section starting with:

volume_driver = cinder.volume.drivers.rbd.RBDDriver

Also, make sure your rbd_secret_uuid matches the client volume secret you 
created.

Regards,

Marko Sluga
Independent Trainer


W: http://markocloud.com
T: +1 (647) 546-4365

L + M Consulting Inc.
Ste 212, 2121 Lake Shore Blvd W
M8E 4E9, Etobicoke, ON


 On Mon, 19 Jun 2017 17:25:59 -0400 T. Nichole Williams 
 wrote 

Hello,

I’m having trouble connecting Ceph to OpenStack Cinder following the guide in 
docs.ceph.com & I can’t figure out what’s wrong. I’ve confirmed auth 
connectivity for both root & ceph users on my openstack controller node, but 
the RBDDriver is not initializing. I’ve dug through every related Google 
article I can find with no results. Any one have any tips?

Here’s output of a few sample errors, auth list from controller, & 
cinder-manage config list. Please let me know if you need any further info from 
my side.
https://gist.githubusercontent.com/OGtrilliams/ed7642358a113ab7d908f4240427ad2e/raw/282de9ce1756670fe8c3071d1613e3c64d6e5b2f/cinder-conf

T. Nichole Williams
tribe...@tribecc.us



___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Errors connecting cinder-volume to ceph

2017-06-19 Thread Marko Sluga
Sorry, 

rbd_user = volumes

Not client.volumes
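
A couple of hedged sanity checks after changing rbd_user (run on the
cinder-volume host; the keyring for client.volumes is assumed to be readable
there, and the service name below is the RDO/PackStack one):

ceph auth get client.volumes                # the caps cinder will authenticate with
rbd --id volumes -p volumes ls              # confirms that key can actually reach the pool
systemctl restart openstack-cinder-volume   # then watch /var/log/cinder/volume.log again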


 On Mon, 19 Jun 2017 21:09:38 -0400 ma...@markocloud.com wrote 

Hi Nichole,

Yeah, your setup looks is ok, so the only thing here could be an auth issue. So 
I went through the config again and I see you have set the client.volumes ceph 
user with rwx permissions on the volumes pool.

In your cinder.conf the setup is:

rbd_user = cinder

Unless the cinder ceph user also exists, this is probably incorrectly set and I 
would say you would need to change that setting to:

rbd_user = client.volumes

Regards, Marko

 On Mon, 19 Jun 2017 20:50:47 -0400 tribe...@tribecc.us wrote 

Hi Marko!

Here’s my details:

OpenStack Newton deployed with PackStack [controller + network node}
Ceph Kraken 3-node setup deployed with ceph-ansible

# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 7.3 (Maipo)

# ceph --version
ceph version 11.2.0 (f223e27eeb35991352ebc1f67423d4ebc252adb7)

# rpm -qa | grep librados
libradosstriper1-11.2.0-0.el7.x86_64
librados2-11.2.0-0.el7.x86_64

# cinder --version
1.9.0

# nova-manage --version
14.0.3

If it matters, both glance & nova connected without a hitch. It’s just cinder 
that’s causing a headache.

T. Nichole Williams
tribe...@tribecc.us



On Jun 19, 2017, at 7:34 PM, Marko Sluga  wrote:

Hi Nichole,

Since your config is ok.

I'm going to need more details on the OpenStack release, the hypervisor, linux 
and librados versions.

You could also test if you can try and monut a volume from your os and/or 
hypervisor and the machine that runs the cinder volume service to start with.

Regards, Marko


 On Mon, 19 Jun 2017 17:59:48 -0400 tribe...@tribecc.us wrote 

Hi Marko,

Here’s’ my ceph config:

[ceph]
volume_driver = cinder.volume.drivers.rbd.RBDDriver
volume_backend_name = ceph
rbd_pool = volumes
rbd_ceph_conf = /etc/ceph/ceph.conf
rbd_flatten_volume_from_snapshot = false
rbd_max_clone_depth = 5
rbd_store_chunk_size = 4
rados_connect_timeout = -1
rbd_user = cinder
rbd_secret_uuid = c80d6505-260c-48c1-a248-7144cd5d5aab 
filter_function = "volume.size >= 2"

Setting logging to “debug” doesn’t seem to produce any new information. Here’s 
a snippet of /var/log/cinder/volume.log:
2017-06-19 16:54:45.056 9797 INFO cinder.volume.manager 
[req-e556d559-c484-4edf-a458-5afbafcb8e39 - - - - -] Initializing RPC dependent 
components of volume driver RBDDriver (1.2.0)
2017-06-19 16:54:45.056 9797 ERROR cinder.utils 
[req-e556d559-c484-4edf-a458-5afbafcb8e39 - - - - -] Volume driver RBDDriver 
not initialized
2017-06-19 16:54:45.057 9797 ERROR cinder.volume.manager 
[req-e556d559-c484-4edf-a458-5afbafcb8e39 - - - - -] Cannot complete RPC 
initialization because driver isn't initialized properly.
2017-06-19 16:54:55.063 9797 ERROR cinder.service [-] Manager for service 
cinder-volume controller.trilliams.info@ceph is reporting problems, not sending 
heartbeat. Service will appear "down".
2017-06-19 16:56:34.065 9797 WARNING cinder.volume.manager 
[req-1309a49a-d5c9-45dd-b277-36cb4ac09dd8 - - - - -] Update driver status 
failed: (config name ceph) is uninitialized.

I’ve added the entirety of /etc/cinder/cinder.conf to my gist, & thank you all 
for any help you can provide.


T. Nichole Williams
tribe...@tribecc.us



On Jun 19, 2017, at 4:39 PM, Marko Sluga  wrote:

Hi Nicole,

I can help, I have been working on my own openstack connected to ceph - can you 
send over the config in your /etc/cinder/cinder.conf file - especially the rbd 
relevant section starting with:

volume_driver = cinder.volume.drivers.rbd.RBDDriver

Also, make sure your rbd_secret_uuid matches the client volume secret you 
created.

Regards,

Marko Sluga
Independent Trainer


W: http://markocloud.com
T: +1 (647) 546-4365

L + M Consulting Inc.
Ste 212, 2121 Lake Shore Blvd W
M8E 4E9, Etobicoke, ON


 On Mon, 19 Jun 2017 17:25:59 -0400 T. Nichole Williams 
 wrote 

Hello,

I’m having trouble connecting Ceph to OpenStack Cinder following the guide in 
docs.ceph.com & I can’t figure out what’s wrong. I’ve confirmed auth 
connectivity for both root & ceph users on my openstack controller node, but 
the RBDDriver is not initializing. I’ve dug through every related Google 
article I can find with no results. Any one have any tips?

Here’s output of a few sample errors, auth list from controller, & 
cinder-manage config list. Please let me know if you need any further info from 
my side.
https://gist.githubusercontent.com/OGtrilliams/ed7642358a113ab7d908f4240427ad2e/raw/282de9ce1756670fe8c3071d1613e3c64d6e5b2f/cinder-conf

T. Nichole Williams
tribe...@tribecc.us



___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Introduction

2017-06-19 Thread Brad Hubbard
On Tue, Jun 20, 2017 at 10:40 AM, Marko Sluga  wrote:
> Hi Everyone,
>
> My name is Marko, I'm an independent consultant and trainer on cloud
> solutions and I work a lot with OpenStack.
>
> Recently my clients have started asking about Ceph so I went on the docs and
> learned how to use it and feel pretty comfortable using it now, especially
> in connection with OpenStack.
>
> I joined the mailing lists this morning and I see there is a lot of activity
> so I'm here to help if everyone is ok with that of course.

Sounds good Marko. Welcome!

>
> Well, just wanted to say hi!
>
> Regards, Marko
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
Cheers,
Brad
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Errors connecting cinder-volume to ceph

2017-06-19 Thread T. Nichole Williams
That was it! Thank you so much for your help, Marko! What a silly thing for me 
to miss!

<3 Trilliams

Sent from my iPhone

> On Jun 19, 2017, at 8:12 PM, Marko Sluga  wrote:
> 
> Sorry, 
> 
> rbd_user = volumes
> 
> Not client.volumes
> 
> 
> 
>  On Mon, 19 Jun 2017 21:09:38 -0400 ma...@markocloud.com wrote 
> 
> Hi Nichole,
> 
> Yeah, your setup looks ok, so the only thing here could be an auth issue. So I 
> went through the config again and I see you have set up the client.volumes ceph 
> user with rwx permissions on the volumes pool.
> 
> In your cinder.conf the setup is:
> 
> rbd_user = cinder
> 
> Unless the cinder ceph user also exists, this is probably incorrectly set and 
> I would say you would need to change that setting to:
> 
> rbd_user = client.volumes
> 
> Regards, Marko
> 
>  On Mon, 19 Jun 2017 20:50:47 -0400 tribe...@tribecc.us wrote 
> 
> Hi Marko!
> 
> Here are my details:
> 
> OpenStack Newton deployed with PackStack [controller + network node]
> Ceph Kraken 3-node setup deployed with ceph-ansible
> 
> # cat /etc/redhat-release 
> Red Hat Enterprise Linux Server release 7.3 (Maipo)
> 
> # ceph --version
> ceph version 11.2.0 (f223e27eeb35991352ebc1f67423d4ebc252adb7)
> 
> # rpm -qa | grep librados
> libradosstriper1-11.2.0-0.el7.x86_64
> librados2-11.2.0-0.el7.x86_64
> 
> # cinder --version
> 1.9.0
> 
> # nova-manage --version
> 14.0.3
> 
> If it matters, both glance & nova connected without a hitch. It’s just cinder 
> that’s causing a headache.
> 
> T. Nichole Williams
> tribe...@tribecc.us
> 
> 
> 
> On Jun 19, 2017, at 7:34 PM, Marko Sluga  wrote:
> 
> Hi Nichole,
> 
> Since your config is ok, I'm going to need more details on the OpenStack 
> release, the hypervisor, Linux, and librados versions.
> 
> To start with, you could also test whether you can mount a volume from your OS 
> and/or hypervisor, and from the machine that runs the cinder-volume service.
> 
> Regards, Marko
> 
> 
>  On Mon, 19 Jun 2017 17:59:48 -0400 tribe...@tribecc.us wrote 
> 
> Hi Marko,
> 
> Here’s my ceph config:
> 
> [ceph]
> volume_driver = cinder.volume.drivers.rbd.RBDDriver
> volume_backend_name = ceph
> rbd_pool = volumes
> rbd_ceph_conf = /etc/ceph/ceph.conf
> rbd_flatten_volume_from_snapshot = false
> rbd_max_clone_depth = 5
> rbd_store_chunk_size = 4
> rados_connect_timeout = -1
> rbd_user = cinder
> rbd_secret_uuid = c80d6505-260c-48c1-a248-7144cd5d5aab 
> filter_function = "volume.size >= 2"
> 
> Setting logging to “debug” doesn’t seem to produce any new information. 
> Here’s a snippet of /var/log/cinder/volume.log:
> 2017-06-19 16:54:45.056 9797 INFO cinder.volume.manager 
> [req-e556d559-c484-4edf-a458-5afbafcb8e39 - - - - -] Initializing RPC 
> dependent components of volume driver RBDDriver (1.2.0)
> 2017-06-19 16:54:45.056 9797 ERROR cinder.utils 
> [req-e556d559-c484-4edf-a458-5afbafcb8e39 - - - - -] Volume driver RBDDriver 
> not initialized
> 2017-06-19 16:54:45.057 9797 ERROR cinder.volume.manager 
> [req-e556d559-c484-4edf-a458-5afbafcb8e39 - - - - -] Cannot complete RPC 
> initialization because driver isn't initialized properly.
> 2017-06-19 16:54:55.063 9797 ERROR cinder.service [-] Manager for service 
> cinder-volume controller.trilliams.info@ceph is reporting problems, not 
> sending heartbeat. Service will appear "down".
> 2017-06-19 16:56:34.065 9797 WARNING cinder.volume.manager 
> [req-1309a49a-d5c9-45dd-b277-36cb4ac09dd8 - - - - -] Update driver status 
> failed: (config name ceph) is uninitialized.
> 
> I’ve added the entirety of /etc/cinder/cinder.conf to my gist, & thank you 
> all for any help you can provide.
> 
> 
> T. Nichole Williams
> tribe...@tribecc.us
> 
> 
> 
> On Jun 19, 2017, at 4:39 PM, Marko Sluga  wrote:
> 
> Hi Nichole,
> 
> I can help; I have been working on my own OpenStack connected to Ceph. Can you 
> send over the config in your /etc/cinder/cinder.conf file, especially the 
> rbd-relevant section starting with:
> 
> volume_driver = cinder.volume.drivers.rbd.RBDDriver
> 
> Also, make sure your rbd_secret_uuid matches the client volume secret you 
> created.
> 
> Regards,
> 
> Marko Sluga
> Independent Trainer
> 
> <1487020143233.jpg>
> 
> W: http://markocloud.com
> T: +1 (647) 546-4365
> 
> L + M Consulting Inc.
> Ste 212, 2121 Lake Shore Blvd W
> M8E 4E9, Etobicoke, ON
> 
> 
>  On Mon, 19 Jun 2017 17:25:59 -0400 T. Nichole Williams 
>  wrote 
> 
> Hello,
> 
> I’m having trouble connecting Ceph to OpenStack Cinder following the guide in 
> docs.ceph.com & I can’t figure out what’s wrong. I’ve confirmed auth 
> connectivity for both root & ceph users on my openstack controller node, but 
> the RBDDriver is not initializing. I’ve dug through every related Google 
> article I can find with no results. Anyone have any tips?
> 
> Here’s output of a few sample errors, auth list from controller, & 
> cinder-manage config list. Please let me know if you need any further info 
> from my side.
> https://gist.githubusercontent.

Re: [ceph-users] Errors connecting cinder-volume to ceph

2017-06-19 Thread Marko Sluga
Not a problem at all, sometimes all we need is just a second pair of eyes! ;)

 On Mon, 19 Jun 2017 21:23:34 -0400 tribe...@tribecc.us wrote 

That was it! Thank you so much for your help, Marko! What a silly thing for me 
to miss!

<3 Trilliams

Sent from my iPhone

On Jun 19, 2017, at 8:12 PM, Marko Sluga  wrote:

Sorry, 

rbd_user = volumes

Not client.volumes
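Put differently, rbd_user takes the CephX id without the "client." prefix. A rough 
sketch of a matching setup, using the names from this thread (your keyring path 
and caps may differ), would be:

# Ceph side: the user cinder authenticates as, with access to the "volumes" pool
ceph auth get-or-create client.volumes \
    mon 'allow r' \
    osd 'allow rwx pool=volumes' \
    -o /etc/ceph/ceph.client.volumes.keyring

# cinder.conf side: the same user, referenced without the "client." prefix
#   [ceph]
#   rbd_user = volumes
#   rbd_secret_uuid = <UUID of the libvirt secret created for client.volumes>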



 On Mon, 19 Jun 2017 21:09:38 -0400 ma...@markocloud.com wrote 

Hi Nichole,

Yeah, your setup looks ok, so the only thing here could be an auth issue. So I 
went through the config again and I see you have set up the client.volumes ceph 
user with rwx permissions on the volumes pool.

In your cinder.conf the setup is:

rbd_user = cinder

Unless the cinder ceph user also exists, this is probably incorrectly set and I 
would say you would need to change that setting to:

rbd_user = client.volumes

Regards, Marko

 On Mon, 19 Jun 2017 20:50:47 -0400 tribe...@tribecc.us wrote 

Hi Marko!

Here are my details:

OpenStack Newton deployed with PackStack [controller + network node]
Ceph Kraken 3-node setup deployed with ceph-ansible

# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 7.3 (Maipo)

# ceph --version
ceph version 11.2.0 (f223e27eeb35991352ebc1f67423d4ebc252adb7)

# rpm -qa | grep librados
libradosstriper1-11.2.0-0.el7.x86_64
librados2-11.2.0-0.el7.x86_64

# cinder --version
1.9.0

# nova-manage --version
14.0.3

If it matters, both glance & nova connected without a hitch. It’s just cinder 
that’s causing a headache.

T. Nichole Williams
tribe...@tribecc.us



On Jun 19, 2017, at 7:34 PM, Marko Sluga  wrote:

Hi Nichole,

Since your config is ok, I'm going to need more details on the OpenStack release, 
the hypervisor, Linux, and librados versions.

To start with, you could also test whether you can mount a volume from your OS 
and/or hypervisor, and from the machine that runs the cinder-volume service.
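Something like this, run on the node where cinder-volume runs, would be a hedged 
first test (substitute your actual rbd_user and pool names):

rbd --id volumes -p volumes ls                            # can we read the pool?
rbd --id volumes -p volumes create conn-test --size 1024  # can we write a 1 GB image?
rbd --id volumes -p volumes rm conn-test                  # clean up the test image

If those work with the keyring in /etc/ceph, the cluster side is probably fine and 
the problem is more likely on the cinder/libvirt side.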

Regards, Marko


 On Mon, 19 Jun 2017 17:59:48 -0400 tribe...@tribecc.us wrote 

Hi Marko,

Here’s my ceph config:

[ceph]
volume_driver = cinder.volume.drivers.rbd.RBDDriver
volume_backend_name = ceph
rbd_pool = volumes
rbd_ceph_conf = /etc/ceph/ceph.conf
rbd_flatten_volume_from_snapshot = false
rbd_max_clone_depth = 5
rbd_store_chunk_size = 4
rados_connect_timeout = -1
rbd_user = cinder
rbd_secret_uuid = c80d6505-260c-48c1-a248-7144cd5d5aab 
filter_function = "volume.size >= 2"

Setting logging to “debug” doesn’t seem to produce any new information. Here’s 
a snippet of /var/log/cinder/volume.log:
2017-06-19 16:54:45.056 9797 INFO cinder.volume.manager 
[req-e556d559-c484-4edf-a458-5afbafcb8e39 - - - - -] Initializing RPC dependent 
components of volume driver RBDDriver (1.2.0)
2017-06-19 16:54:45.056 9797 ERROR cinder.utils 
[req-e556d559-c484-4edf-a458-5afbafcb8e39 - - - - -] Volume driver RBDDriver 
not initialized
2017-06-19 16:54:45.057 9797 ERROR cinder.volume.manager 
[req-e556d559-c484-4edf-a458-5afbafcb8e39 - - - - -] Cannot complete RPC 
initialization because driver isn't initialized properly.
2017-06-19 16:54:55.063 9797 ERROR cinder.service [-] Manager for service 
cinder-volume controller.trilliams.info@ceph is reporting problems, not sending 
heartbeat. Service will appear "down".
2017-06-19 16:56:34.065 9797 WARNING cinder.volume.manager 
[req-1309a49a-d5c9-45dd-b277-36cb4ac09dd8 - - - - -] Update driver status 
failed: (config name ceph) is uninitialized.

I’ve added the entirety of /etc/cinder/cinder.conf to my gist, & thank you all 
for any help you can provide.


T. Nichole Williams
tribe...@tribecc.us



On Jun 19, 2017, at 4:39 PM, Marko Sluga  wrote:

Hi Nichole,

I can help; I have been working on my own OpenStack connected to Ceph. Can you 
send over the config in your /etc/cinder/cinder.conf file, especially the 
rbd-relevant section starting with:

volume_driver = cinder.volume.drivers.rbd.RBDDriver

Also, make sure your rbd_secret_uuid matches the client volume secret you 
created.

Regards,

Marko Sluga
Independent Trainer

<1487020143233.jpg>

W: http://markocloud.com
T: +1 (647) 546-4365

L + M Consulting Inc.
Ste 212, 2121 Lake Shore Blvd W
M8E 4E9, Etobicoke, ON


 On Mon, 19 Jun 2017 17:25:59 -0400 T. Nichole Williams 
 wrote 

Hello,

I’m having trouble connecting Ceph to OpenStack Cinder following the guide in 
docs.ceph.com & I can’t figure out what’s wrong. I’ve confirmed auth 
connectivity for both root & ceph users on my openstack controller node, but 
the RBDDriver is not initializing. I’ve dug through every related Google 
article I can find with no results. Anyone have any tips?

Here’s output of a few sample errors, auth list from controller, & 
cinder-manage config list. Please let me know if you need any further info from 
my side.
https://gist.githubusercontent.com/OGtrilliams/ed7642358a113ab7d908f4240427ad2e/raw/282de9ce1756670fe8c3071d1613e3c64d6e5b2f/cinder-conf

T. Nichole Williams
tribe...@tribecc.us



_

[ceph-users] Question about upgrading ceph clusters from Hammer to Jewel

2017-06-19 Thread 许雪寒
Hi, everyone.

I intend to upgrade one of our ceph clusters from Hammer to Jewel. In what order 
should I upgrade the MON, OSD and LIBRBD? Is there any problem with having some of 
these components running the Hammer version while others run the Jewel version? Do 
I have to upgrade QEMU as well to match the Jewel version’s LIBRBD?
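From what I have read so far, the usual order seems to be monitors first, then 
OSDs, then MDS/RGW, and the client side last. A rough sketch of what I am planning 
per node (please correct me if this is wrong):

# 1. monitor nodes, one at a time
yum update ceph                    # or rpm -Uvh the Jewel packages
systemctl restart ceph-mon.target
ceph -s                            # wait for quorum before the next monitor

# 2. OSD nodes, one at a time
yum update ceph
systemctl restart ceph-osd.target
ceph -s                            # wait for active+clean before the next node

# 3. client side
yum update librbd1                 # QEMU normally links librbd dynamically, so
                                   # running VMs pick it up after a restart/migration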

Thank you:)
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Re: Question about upgrading ceph clusters from Hammer to Jewel

2017-06-19 Thread 许雪寒
By the way, I intend to install the Jewel version through the “rpm” command, and I 
already have a user “ceph” on the target machine. Is there any problem if I do 
“systemctl start ceph.target” after the installation of the Jewel version?
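My rough plan for that part, based on my reading of the Jewel release notes 
(please point out anything I am missing), is:

# Jewel daemons drop privileges to the "ceph" user, so the data directories
# need to be owned by it (this can take a long time on large OSDs) ...
chown -R ceph:ceph /var/lib/ceph

# ... or, to keep running as root for now, add to [global] in ceph.conf:
#     setuser match path = /var/lib/ceph/$type/$cluster-$id

systemctl start ceph.target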

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] on behalf of 许雪寒
Sent: 20 June 2017 10:35
To: ceph-users@lists.ceph.com
Subject: [ceph-users] Question about upgrading ceph clusters from Hammer to Jewel

Hi, everyone.

I intend to upgrade one of our ceph clusters from Hammer to Jewel. In what order 
should I upgrade the MON, OSD and LIBRBD? Is there any problem with having some of 
these components running the Hammer version while others run the Jewel version? Do 
I have to upgrade QEMU as well to match the Jewel version’s LIBRBD?

Thank you:)
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] FW: radosgw: stale/leaked bucket index entries

2017-06-19 Thread Pavan Rallabhandi
Trying one more time with ceph-users

On 19/06/17, 11:07 PM, "Pavan Rallabhandi"  wrote:

On many of our clusters running Jewel (10.2.5+), I am running into a strange 
problem of stale bucket index entries being left over for (some of the) deleted 
objects. Though it is not reproducible at will, it has been pretty consistent of 
late, and I am clueless at this point about the possible reasons for it.

The symptoms are that the actual delete operation of an object is reported 
successful in the RGW logs, but a bucket list on the container would still show 
the deleted object. An attempt to download/stat the object appropriately results 
in a failure. No failures are seen in the respective OSDs where the bucket index 
object is located. Rebuilding the bucket index by running 
‘radosgw-admin bucket check --fix’ fixes the issue.
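For anyone else hitting the same symptom, the check/repair cycle looks roughly 
like this (the bucket name is a placeholder):

radosgw-admin bucket check --bucket=<bucket>          # report index inconsistencies
radosgw-admin bucket check --bucket=<bucket> --fix    # rebuild the bucket index
radosgw-admin bucket list --bucket=<bucket>           # the stale entry should be gone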

Though I could simulate the problem by instrumenting the code so as not to invoke 
`complete_del` on the bucket index op 
https://github.com/ceph/ceph/blob/master/src/rgw/rgw_rados.cc#L8793, that call 
always seems to be made unless there is a cascading error from the actual delete 
operation of the object, which doesn’t seem to be the case here.

I wanted to know the possible reasons why the bucket index would be left in such 
limbo; any pointers would be much appreciated. FWIW, we are not sharding the 
buckets, and very recently I have seen this happen with buckets having as few as 
< 10 objects, and we are using swift for all the operations.

Thanks,
-Pavan.



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] RadosGW not working after upgrade to Hammer

2017-06-19 Thread Gerson Jamal
Hi everyone,

I upgraded Ceph from Firefly to Hammer and everything looked OK during the upgrade,
but after that RadosGW is not working: I can list all buckets, but I can't list
the objects inside the buckets, and I receive the following error:

format=json 400 Bad Request   []{"Code":"InvalidArgument"}

On Radosgw log I got the following error:

2017-06-17 01:37:25.325505 7f0108801700 10 ver=v1 first= req=
2017-06-17 01:37:25.325508 7f0108801700 10 s->object= s->bucket=
2017-06-17 01:37:25.325513 7f0108801700  2 req 21:0.49:swift:GET
/swift/v1/::getting op
2017-06-17 01:37:25.325516 7f0108801700  2 req 21:0.53:swift:GET
/swift/v1/:list_buckets:authorizing
2017-06-17 01:37:25.325529 7f0108801700 10 swift_user=sysmonitor:xx
2017-06-17 01:37:25.325541 7f0108801700 20 build_token
token=15007379736d6f6e69746f723a6c69676874686f7573652c7d7d41da54e3ba35bd45595a0df912
2017-06-17 01:37:25.325572 7f0108801700  2 req 21:0.000108:swift:GET
/swift/v1/:list_buckets:reading permissions
2017-06-17 01:37:25.325580 7f0108801700  2 req 21:0.000116:swift:GET
/swift/v1/:list_buckets:init op
2017-06-17 01:37:25.325582 7f0108801700  2 req 21:0.000119:swift:GET
/swift/v1/:list_buckets:verifying op mask
2017-06-17 01:37:25.325584 7f0108801700 20 required_mask= 1 user.op_mask=7
2017-06-17 01:37:25.325586 7f0108801700  2 req 21:0.000122:swift:GET
/swift/v1/:list_buckets:verifying op permissions
2017-06-17 01:37:25.325588 7f0108801700  2 req 21:0.000125:swift:GET
/swift/v1/:list_buckets:verifying op params
2017-06-17 01:37:25.325590 7f0108801700  2 req 21:0.000127:swift:GET
/swift/v1/:list_buckets:executing
2017-06-17 01:37:25.328258 7f0108801700 20 reading from
.rgw:.bucket.meta.CHECK_CEPH:default.4576.17572
2017-06-17 01:37:25.328284 7f0108801700 20 get_obj_state:
rctx=0x7f01087ff250 obj=.rgw:.bucket.meta.CHECK_CEPH:default.4576.17572
state=0x7f05641389c0 s->prefetch_data=0
2017-06-17 01:37:25.328294 7f0108801700 10 cache get:
name=.rgw+.bucket.meta.CHECK_CEPH:default.4576.17572 : hit
2017-06-17 01:37:25.328304 7f0108801700 20 get_obj_state: s->obj_tag was
set empty
2017-06-17 01:37:25.328308 7f0108801700 10 cache get:
name=.rgw+.bucket.meta.CHECK_CEPH:default.4576.17572 : hit
2017-06-17 01:37:25.330351 7f0108801700  0 ERROR: could not get stats for
buckets
2017-06-17 01:37:25.330378 7f0108801700 10 WARNING: failed on
rgw_get_user_buckets uid=sysmonitor
2017-06-17 01:37:25.330407 7f0108801700  2 req 21:0.004943:swift:GET
/swift/v1/:list_buckets:http status=400
2017-06-17 01:37:25.330412 7f0108801700  1 == req done
req=0x7f053023c0a0 http_status=400 ==
2017-06-17 01:37:25.330418 7f0108801700 20 process_request() returned -22
2017-06-17 01:37:28.470724 7f05837fe700  2
RGWDataChangesLog::ChangesRenewThread: start



Can anyone help me?

-- 
Regards,

Gerson Razaque Jamal
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com