Re: [openstack-dev] [puppet] [swift] Storage service startup should await ring creation

2015-08-14 Thread Mark Kirkwood

On 10/07/15 12:43, Mark Kirkwood wrote:

Hi,

I am using puppet-swift to deploy a swift multi node cluster (Icehouse),
following the setup in supplied tests/site.pp. I am running into two
issues that seem to be related to the subject above:

1/ Errors when the storage replication services try to start before the
ring files exist. e.g:

Error: Could not start Service[swift-object-replicator]: Execution of
'/sbin/start swift-object-replicator' returned 1: start: Job failed to
start
Wrapped exception:
Execution of '/sbin/start swift-object-replicator' returned 1: start:
Job failed to start
Error:
/Stage[main]/Swift::Storage::Object/Swift::Storage::Generic[object]/Service[swift-object-replicator]/ensure:
change from stopped to running failed: Could not start
Service[swift-object-replicator]: Execution of '/sbin/start
swift-object-replicator' returned 1: start: Job failed to start

Now these will be fixed the *next* time I do a puppet run (provided I've
performed a run on the appropriate proxy/ringmaster). However the
failing services make scripted testing difficult as we have to put in
logic to the effect don't worry about errors the 1st time.

2/ Container and object stats not updated without full restart of services

This one is a bit more subtle - everything works but 'swift stat' always
shows zero objects and bytes for every container. The only way to fix
this is to stop and start all services on each storage node.

Again this complicates scripted builds as there is the need to go and
stop + start all the swift storage services! Not to mention an extra
little quirk for ops to remember at zero dark 30 oclock...

I've made a patch that prevents these services starting until the ring
files exist (actually for now it just checks the object ring) - see
attached.

Now while I'm not entirely sure that this is the best way to solve the
issue (custom fact that changes the service start flag)...I *do* think
that making the storage services await the ring existence *is* a needed
change, so any thoughts on better ways to do this are appreciated.

Also note that this change *does* require one more puppet run on the
storage nodes:
- one to create the storage servers config and drives
- one to get the ring from the proxy/ringmaster
- one to start the services



I decided to work around these rather than trying to battle in my patch 
to the swift module.


For 1/ I'm trapping the return code for the 1st puppet run and handling 
errors there...run not doing anything for any subsequent run as there 
shouldn't be any errors thereafter. Seems to work ok


For 2/ I'm inserting an exec in our driving puppet code to just stop and 
start (not restart as that does not solve it...growl) *all* the services 
on a storage node. e.g (see tests/site.pp in the swift module for context):


  # collect resources for synchronizing the ring databases
  Swift::Ringsync||

  # stop and start all swift services
  # this is a workaround due to the services starting before the ring
  # is synced which stops stats (and maybe other things) working.
  # We can't just restart as this does *not* achieve the same result.
  exec { 'stop-start-services':
provider = shell,
command  = 'for x in `initctl list|grep swift|awk \'{print 
$1}\'`;do stop $x;start $x;done;exit 0',

path = '/usr/bin:/usr/sbin:/bin:/sbin',
logoutput= true,
  }

  # make sure we stop and start all services after syncing ring
  Swift::Ringsync||
  - Exec['stop-start-services']


Ok, so it is a pretty big hammer, but leaves all of the services in a 
known good state, so I'm happy.


Regards

Mark

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [puppet] [swift] Storage service startup should await ring creation

2015-07-09 Thread Mark Kirkwood

Hi,

I am using puppet-swift to deploy a swift multi node cluster (Icehouse), 
following the setup in supplied tests/site.pp. I am running into two 
issues that seem to be related to the subject above:


1/ Errors when the storage replication services try to start before the 
ring files exist. e.g:


Error: Could not start Service[swift-object-replicator]: Execution of 
'/sbin/start swift-object-replicator' returned 1: start: Job failed to start

Wrapped exception:
Execution of '/sbin/start swift-object-replicator' returned 1: start: 
Job failed to start
Error: 
/Stage[main]/Swift::Storage::Object/Swift::Storage::Generic[object]/Service[swift-object-replicator]/ensure: 
change from stopped to running failed: Could not start 
Service[swift-object-replicator]: Execution of '/sbin/start 
swift-object-replicator' returned 1: start: Job failed to start


Now these will be fixed the *next* time I do a puppet run (provided I've 
performed a run on the appropriate proxy/ringmaster). However the 
failing services make scripted testing difficult as we have to put in 
logic to the effect don't worry about errors the 1st time.


2/ Container and object stats not updated without full restart of services

This one is a bit more subtle - everything works but 'swift stat' always 
shows zero objects and bytes for every container. The only way to fix 
this is to stop and start all services on each storage node.


Again this complicates scripted builds as there is the need to go and 
stop + start all the swift storage services! Not to mention an extra 
little quirk for ops to remember at zero dark 30 oclock...


I've made a patch that prevents these services starting until the ring 
files exist (actually for now it just checks the object ring) - see 
attached.


Now while I'm not entirely sure that this is the best way to solve the 
issue (custom fact that changes the service start flag)...I *do* think 
that making the storage services await the ring existence *is* a needed 
change, so any thoughts on better ways to do this are appreciated.


Also note that this change *does* require one more puppet run on the 
storage nodes:

- one to create the storage servers config and drives
- one to get the ring from the proxy/ringmaster
- one to start the services

Regards

Mark
From 1583d68eedbeaecbacb5a29258343b9e980ce4a4 Mon Sep 17 00:00:00 2001
From: Mark Kirkwood mark.kirkw...@catalyst.net.nz
Date: Fri, 10 Jul 2015 11:18:16 +1200
Subject: [PATCH] Make storage service startup wait for the ring

This solves two problems:

1/ Errors when the replication services fail to start when the
ring files are absent.

2/ Container and object stats not updated without full restart of services

While neither of these issues prevent a working deployment, they play
havoc with fully automated deployment for ci etc (and make arcane
knowledge necessary for ops).

This change *does* require one more puppet run on the storage nodes:
- one to create the storage servers config and drives
- one to get the ring from the proxy/ringmaster
- one to start the services

(amend: make sure the custom fact goes in the correct dir)
Change-Id: I168c26aed5f6dfc337ea8bc5f863e15f6e86d4a2
---
 modules/swift/lib/facter/ringfact.rb |  5 +
 modules/swift/manifests/storage/account.pp   | 14 ++
 modules/swift/manifests/storage/container.pp | 17 -
 modules/swift/manifests/storage/object.pp| 15 +++
 4 files changed, 38 insertions(+), 13 deletions(-)
 create mode 100644 modules/swift/lib/facter/ringfact.rb

diff --git a/modules/swift/lib/facter/ringfact.rb b/modules/swift/lib/facter/ringfact.rb
new file mode 100644
index 000..4816cb8
--- /dev/null
+++ b/modules/swift/lib/facter/ringfact.rb
@@ -0,0 +1,5 @@
+Facter.add(ringexists) do
+  setcode do
+File.exists?(/etc/swift/object.ring.gz) ? true : false
+  end
+end
diff --git a/modules/swift/manifests/storage/account.pp b/modules/swift/manifests/storage/account.pp
index a4398c3..1719776 100644
--- a/modules/swift/manifests/storage/account.pp
+++ b/modules/swift/manifests/storage/account.pp
@@ -18,16 +18,22 @@ class swift::storage::account(
   $enabled= true,
   $package_ensure = 'present'
 ) {
+  if $::ringexists == false {
+$service_enabled = false
+  } else {
+$service_enabled = $enabled
+  }
+
   swift::storage::generic { 'account':
 manage_service = $manage_service,
-enabled= $enabled,
+enabled= $service_enabled,
 package_ensure = $package_ensure,
   }
 
   include swift::params
 
   if $manage_service {
-if $enabled {
+if $service_enabled {
   $service_ensure = 'running'
 } else {
   $service_ensure = 'stopped'
@@ -37,7 +43,7 @@ class swift::storage::account(
   service { 'swift-account-reaper':
 ensure= $service_ensure,
 name  = $::swift::params::account_reaper_service_name,
-enable= $enabled,
+enable= $service_enabled,
 provider  =