commit 8ee1eb8c1e9f691df2c3fa5eb0911d3e4602d46f
Merge: fa6fd0f 713f1cf
Author: Lisa Velden <[email protected]>
Date: Fri Dec 4 14:57:39 2015 +0100
Merge branch 'stable-2.16' into stable-2.17
* stable-2.16
Fix lines with more than 80 characters
Add more detach/attach sequence tests
Allow disk attachment to diskless instances
Improve tests for attaching disks
Use only string value in error message
Add entries describing new gnt-cluster params to manpage
QA: Add ssh-key-type and -bits tests
QA: Extend AssertCommand to allow not forwarding the agent
Remove default limit on diffs in cfgupgrade tests
QA: Downgrade the cluster key type in 2.16
Fix typo
Fail early for invalid key type and size combinations
Handle SSH key changes in upgrades and downgrades
Allow SSH key property changes
Use the SSH key parameters when generating keys
Do not generate the ganeti_pub_keys file with --no-ssh-init
Add querying of ssh-related config values
Add modify_ssh_setup to queryable config params
Add helper function for querying cluster properties
Show info about new params in gnt-cluster info
Add the SSH key type and length to the config, and set them
Change SSH key types to a proper Haskell sum type
Add the SSH key options
Mention disabling of '--no-node-setup' in NEWS file
Show 'modify ssh setup' in cluster info
Disable --no-node-setup
Make 'modify ssh setup' queryable
Fix RPC signature of NodeVerify
Use ssconf for SSH ports in NodeVerify
* stable-2.15
Document the decission why optimisation is turned off
Don't keep input for error messages
Use dict.copy instead of deepcopy
Use bulk-adding of keys in renew-crypto
Make NodeSshKeyAdd use its *Bulk companion
Unit test bulk-adding normal nodes
Unit test for bulk-adding pot. master candidates
Introduce bulk-adding of SSH keys
Pause watcher during performance QA
Send answers strictly
Store keys as ByteStrings
Encode UUIDs as ByteStrings
Prefer the UuidObject type class over specific functions
Assign the variables before use (bugfix for dee6adb9)
Extend QA to detect autopromotion errors
Handle SSH key distribution on auto promotion
Do not remove authorized key of node itself
Fix indentation
Support force option for deactivate disks on RAPI
* stable-2.14
Fix faulty iallocator type check
Improve cfgupgrade output in case of errors
* stable-2.13
Extend timeout for gnt-cluster renew-crypto
Reduce flakyness of GetCmdline test on slow machines
Remove duplicated words
* stable-2.12
Revert "Also consider connection time out a network error"
Clone lists before modifying
Make lockConfig call retryable
Return the correct error code in the post-upgrade script
Make openssl refrain from DH altogether
Fix upgrades of instances with missing creation time
* stable-2.11
(no changes)
* stable-2.10
Remove -X from hspace man page
Make htools tolerate missing "dtotal" and "dfree" on luxi
Conflicts:
NEWS
lib/cli_opts.py
lib/objects.py
src/Ganeti/Config.hs
src/Ganeti/DataCollectors.hs
src/Ganeti/Monitoring/Server.hs
src/Ganeti/Objects.hs
src/Ganeti/Objects/Disk.hs
src/Ganeti/Objects/Instance.hs
src/Ganeti/Query/Group.hs
src/Ganeti/Query/Server.hs
src/Ganeti/WConfd/ConfigModifications.hs
src/Ganeti/WConfd/ConfigVerify.hs
test/hs/Test/Ganeti/Objects.hs
test/py/cfgupgrade_unittest.py
Resolution:
NEWS
take both changes
lib/cli_opts.py
take both changes
lib/objects.py
take both changes
src/Ganeti/Config.hs
keep the ByteString changes, but Control.Monad from 2.17
src/Ganeti/DataCollectors.hs
take both changes
src/Ganeti/Monitoring/Server.hs
fix imports
src/Ganeti/Objects.hs
take both changes
src/Ganeti/Objects/Disk.hs
take both changes
src/Ganeti/Objects/Instance.hs
fix imports
keep 2.17 changes
src/Ganeti/Query/Group.hs
keep field definition for hv_state and disk_state, but use
uuidOf instead of groupUuid
src/Ganeti/Query/Server.hs
take both changes
src/Ganeti/WConfd/ConfigModifications.hs
fix imports
src/Ganeti/WConfd/ConfigVerify.hs
fix imports
test/hs/Test/Ganeti/Objects.hs
fix imports
take both changes
test/py/cfgupgrade_unittest.py
take both changes
diff --cc NEWS
index 8a96376,898a739..c64fedf
--- a/NEWS
+++ b/NEWS
@@@ -2,37 -2,20 +2,51 @@@ New
====
+Version 2.17.0 alpha1
+---------------------
+
+*(unreleased)*
+
+Incompatible/important changes
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+- The IAllocator protocol has been extended by a new ``hv_state`` parameter.
+ This new parameter is used to estimate the amount of memory utilized by
+ the node. It replaces ``reserved_mem`` on hypervisors other than ``xen-pvm``
+ and ``xen-hvm`` because ``reserved_mem`` was reported incorrectly on them.
+ If this ``hv_state`` parameter is not presented in an iallocator input, the
+ old ``reserved_mem`` will be used.
+
+New features
+~~~~~~~~~~~~
+
+- There is a new daemon, the :doc:`Ganeti Maintenance Daemon
<design-repaird>`,
+ that coordinates all maintenance operations on a cluster, i.e. rebalancing,
+ activate disks, ERROR_down handling and node repairs actions.
+- ``htools`` support memory over-commitment now. Look at
+ :doc:`Memory Over Commitment <design-memory-over-commitment>` for the
+ details.
+- ``hbal`` has a new option ``--avoid-disk-moves *factor*`` that allows disk
+ moves only if the gain in the cluster metrics is ``*factor*`` times higher
+ than with no disk moves.
+- ``hcheck`` reports the level of redundancy for each node group as
a new ouput
+ parameter, see :doc:`N+M Redundancy <design-n-m-redundancy>`.
+
+
+ Version 2.16.0 beta2
+ --------------------
+
+ *(unreleased)*
+
+ Incompatible/important changes
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+ - The options ``--no-node-setup`` of ``gnt-node add`` is disabled.
+ Instead, the cluster configuration parameter ``modify_ssh_setup`` is
+ used to determine whether or not to manipulate the SSH setup of a new
+ node.
+
+
Version 2.16.0 beta1
--------------------
diff --cc lib/cli_opts.py
index 4a720c0,9f4d530..73a2ca9
--- a/lib/cli_opts.py
+++ b/lib/cli_opts.py
@@@ -1633,11 -1596,17 +1635,22 @@@ LONG_SLEEP_OPT = cli_option
"--long-sleep", default=False, dest="long_sleep",
help="Allow long shutdowns when backing up instances",
action="store_true")
+INPUT_OPT = cli_option("--input", dest="input", default=None,
+ help=("input to be passed as stdin"
+ " to the repair command"),
+ type="string")
+
+ SSH_KEY_TYPE_OPT = \
+ cli_option("--ssh-key-type", default=None,
+ choices=list(constants.SSHK_ALL), dest="ssh_key_type",
+ help="Type of SSH key deployed by Ganeti for cluster actions")
+
+ SSH_KEY_BITS_OPT = \
+ cli_option("--ssh-key-bits", default=None,
+ type="int", dest="ssh_key_bits",
+ help="Length of SSH keys generated by Ganeti, in bits")
+
+
#: Options provided by all commands
COMMON_OPTS = [DEBUG_OPT, REASON_OPT]
diff --cc lib/objects.py
index a92eb59,4ea958a..edc4e23
--- a/lib/objects.py
+++ b/lib/objects.py
@@@ -1683,7 -1653,8 +1683,9 @@@ class Cluster(TaggableObject)
"compression_tools",
"enabled_user_shutdown",
"data_collectors",
+ "diagnose_data_collector_filename",
+ "ssh_key_type",
+ "ssh_key_bits",
] + _TIMESTAMPS + _UUID
def UpgradeConfig(self):
diff --cc lib/tools/cfgupgrade.py
index bc091da,14e2e20..d39c8fc
--- a/lib/tools/cfgupgrade.py
+++ b/lib/tools/cfgupgrade.py
@@@ -331,9 -340,15 +340,17 @@@ class CfgUpgrade(object)
cluster["data_collectors"].get(
name, dict(active=True,
interval=constants.MOND_TIME_INTERVAL * 1e6))
+ if "diagnose_data_collector_filename" not in cluster:
+ cluster["diagnose_data_collector_filename"] = ""
+ # These parameters are set to pre-2.16 default values, which
+ # differ from post-2.16 default values
+ if "ssh_key_type" not in cluster:
+ cluster["ssh_key_type"] = constants.SSHK_DSA
+
+ if "ssh_key_bits" not in cluster:
+ cluster["ssh_key_bits"] = 1024
+
@OrFail("Upgrading groups")
def UpgradeGroups(self):
cl_ipolicy = self.config_data["cluster"].get("ipolicy")
@@@ -710,25 -718,42 +729,57 @@@
# DOWNGRADE ------------------------------------------------------------
+ @OrFail("Removing SSH parameters")
+ def DowngradeSshKeyParams(self):
+ """Removes the SSH key type and bits parameters from the config.
+
+ Also fails if these have been changed from values appropriate in lower
+ Ganeti versions.
+
+ """
+ # pylint: disable=E1103
+ # Because config_data is a dictionary which has the get method.
+ cluster = self.config_data.get("cluster", None)
+ if cluster is None:
+ raise Error("Can't find the cluster entry in the configuration")
+
+ def _FetchAndDelete(key):
+ val = cluster.get(key, None)
+ if key in cluster:
+ del cluster[key]
+ return val
+
+ ssh_key_type = _FetchAndDelete("ssh_key_type")
+ _FetchAndDelete("ssh_key_bits")
+
+ if ssh_key_type is not None and ssh_key_type != "dsa":
+ raise Error("The current Ganeti setup is using non-DSA SSH keys, and"
+ " versions below 2.16 do not support these. To downgrade,"
+ " please perform a gnt-cluster renew-crypto using the "
+ " --new-ssh-keys and --ssh-key-type=dsa options, generating"
+ " DSA keys that older versions can also use.")
+
def DowngradeAll(self):
+ if "maintenance" in self.config_data:
+ del self.config_data["maintenance"]
+ if "cluster" in self.config_data:
+ cluster = self.config_data["cluster"]
+ if "diagnose_data_collector_filename" in cluster:
+ del cluster["diagnose_data_collector_filename"]
+ if "data_collectors" in cluster:
+ if constants.DATA_COLLECTOR_DIAGNOSE in cluster["data_collectors"]:
+ del cluster["data_collectors"][constants.DATA_COLLECTOR_DIAGNOSE]
+ if constants.DATA_COLLECTOR_KVM_R_S_S in cluster["data_collectors"]:
+ del cluster["data_collectors"][constants.DATA_COLLECTOR_KVM_R_S_S]
+ if "ipolicy" in cluster:
+ ipolicy = cluster["ipolicy"]
+ if "memory-ratio" in ipolicy:
+ del ipolicy["memory-ratio"]
self.config_data["version"] = version.BuildVersion(DOWNGRADE_MAJOR,
DOWNGRADE_MINOR, 0)
- return True
+
+ self.DowngradeSshKeyParams()
+ return not self.errors
def _ComposePaths(self):
# We need to keep filenames locally because they might be renamed between
diff --cc src/Ganeti/Config.hs
index d20e128,379df93..8d9e1b8
--- a/src/Ganeti/Config.hs
+++ b/src/Ganeti/Config.hs
@@@ -83,11 -82,12 +83,13 @@@ module Ganeti.Confi
, instNodes
) where
-import Control.Applicative
+import Prelude ()
+import Ganeti.Prelude
+
import Control.Arrow ((&&&))
-import Control.Monad
-import Control.Monad.State
+import Control.Monad (liftM)
+ import qualified Data.ByteString as BS
+ import qualified Data.ByteString.UTF8 as UTF8
import qualified Data.Foldable as F
import Data.List (foldl', nub)
import Data.Maybe (fromMaybe)
@@@ -191,7 -190,7 +193,7 @@@ getMasterNodes cfg
-- | Get the list of master candidates, /not including/ the master itself.
getMasterCandidates :: ConfigData -> [Node]
--getMasterCandidates cfg =
++getMasterCandidates cfg =
filter ((==) NRCandidate . getNodeRole cfg) . F.toList . configNodes $ cfg
-- | Get the list of master candidates, /including/ the master.
diff --cc src/Ganeti/DataCollectors.hs
index fa35c62,33ad9cb..3c1146d
--- a/src/Ganeti/DataCollectors.hs
+++ b/src/Ganeti/DataCollectors.hs
@@@ -34,13 -34,11 +34,14 @@@ SOFTWARE, EVEN IF ADVISED OF THE POSSIB
module Ganeti.DataCollectors( collectors ) where
+import Prelude ()
+import Ganeti.Prelude
+
+ import qualified Data.ByteString.UTF8 as UTF8
import Data.Map (findWithDefault)
-import Data.Monoid (mempty)
import qualified Ganeti.DataCollectors.CPUload as CPUload
+import qualified Ganeti.DataCollectors.Diagnose as Diagnose
import qualified Ganeti.DataCollectors.Diskstats as Diskstats
import qualified Ganeti.DataCollectors.Drbd as Drbd
import qualified Ganeti.DataCollectors.InstStatus as InstStatus
diff --cc src/Ganeti/JSON.hs
index 6ce0f62,823dc31..e23ce57
--- a/src/Ganeti/JSON.hs
+++ b/src/Ganeti/JSON.hs
@@@ -85,8 -85,10 +85,10 @@@ module Ganeti.JSO
import Control.Applicative
import Control.DeepSeq
-import Control.Monad.Error.Class
+import Control.Monad.Error.Class (MonadError(..))
import Control.Monad.Writer
+ import qualified Data.ByteString as BS
+ import qualified Data.ByteString.UTF8 as UTF8
import qualified Data.Foldable as F
import qualified Data.Text as T
import qualified Data.Traversable as F
diff --cc src/Ganeti/MaintD/CleanupIncidents.hs
index 1347f04,0000000..f8aaf92
mode 100644,000000..100644
--- a/src/Ganeti/MaintD/CleanupIncidents.hs
+++ b/src/Ganeti/MaintD/CleanupIncidents.hs
@@@ -1,86 -1,0 +1,87 @@@
+{-| Incident clean up in the maintenance daemon.
+
+This module implements the clean up of events that are finished,
+and acknowledged as such by the user.
+
+-}
+
+{-
+
+Copyright (C) 2015 Google Inc.
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are
+met:
+
+1. Redistributions of source code must retain the above copyright notice,
+this list of conditions and the following disclaimer.
+
+2. Redistributions in binary form must reproduce the above copyright
+notice, this list of conditions and the following disclaimer in the
+documentation and/or other materials provided with the distribution.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
+IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
+TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR
+CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+-}
+
+module Ganeti.MaintD.CleanupIncidents
+ ( cleanupIncidents
+ ) where
+
+import Control.Arrow ((&&&))
+import Control.Monad (unless)
+import Control.Monad.IO.Class (liftIO)
++import qualified Data.ByteString.UTF8 as UTF8
+import Data.IORef (IORef)
+
+import Ganeti.BasicTypes (ResultT, mkResultT)
+import qualified Ganeti.HTools.Container as Container
+import qualified Ganeti.HTools.Node as Node
+import Ganeti.Logging.Lifted
+import Ganeti.MaintD.MemoryState (MemoryState, getIncidents, rmIncident)
+import Ganeti.Objects.Maintenance (Incident(..), RepairStatus(..))
+import Ganeti.Utils (logAndBad)
+
+-- | Remove a single incident, provided the corresponding tag
+-- is no longer present.
+cleanupIncident :: IORef MemoryState
+ -> Node.List
+ -> Incident
+ -> ResultT String IO ()
+cleanupIncident memstate nl incident = do
+ let location = incidentNode incident
+ uuid = incidentUuid incident
+ tag = incidentTag incident
+ nodes = filter ((==) location . Node.name) $ Container.elems nl
+ case nodes of
+ [] -> do
+ logInfo $ "No node any more with name " ++ location
- ++ "; will forget event " ++ uuid
- liftIO $ rmIncident memstate uuid
++ ++ "; will forget event " ++ UTF8.toString uuid
++ liftIO . rmIncident memstate $ UTF8.toString uuid
+ [nd] -> unless (tag `elem` Node.nTags nd) $ do
+ logInfo $ "Tag " ++ tag ++ " removed on " ++ location
- ++ "; will forget event " ++ uuid
- liftIO $ rmIncident memstate uuid
++ ++ "; will forget event " ++ UTF8.toString uuid
++ liftIO . rmIncident memstate $ UTF8.toString uuid
+ _ -> mkResultT . logAndBad
+ $ "Found More than one node with name " ++ location
+
+-- | Remove all incidents from the record that are in a final state
+-- and additionally the node tag for that incident has been removed.
+cleanupIncidents :: IORef MemoryState -> Node.List -> ResultT String IO ()
+cleanupIncidents memstate nl = do
+ incidents <- getIncidents memstate
+ let finalized = filter ((> RSPending) . incidentRepairStatus) incidents
+ logDebug . (++) "Finalized incidents " . show
+ $ map (incidentNode &&& incidentUuid) finalized
+ mapM_ (cleanupIncident memstate nl) finalized
diff --cc src/Ganeti/MaintD/CollectIncidents.hs
index ece48bc,0000000..ba31569
mode 100644,000000..100644
--- a/src/Ganeti/MaintD/CollectIncidents.hs
+++ b/src/Ganeti/MaintD/CollectIncidents.hs
@@@ -1,129 -1,0 +1,130 @@@
+{-| Discovery of incidents by the maintenance daemon.
+
+This module implements the querying of all monitoring
+daemons for the value of the node-status data collector.
+Any new incident gets registered.
+
+-}
+
+{-
+
+Copyright (C) 2015 Google Inc.
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are
+met:
+
+1. Redistributions of source code must retain the above copyright notice,
+this list of conditions and the following disclaimer.
+
+2. Redistributions in binary form must reproduce the above copyright
+notice, this list of conditions and the following disclaimer in the
+documentation and/or other materials provided with the distribution.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
+IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
+TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR
+CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+-}
+
+module Ganeti.MaintD.CollectIncidents
+ ( collectIncidents
+ ) where
+
+import Control.Applicative (liftA2)
+import Control.Monad (unless)
+import Control.Monad.IO.Class (liftIO)
++import qualified Data.ByteString.UTF8 as UTF8
+import Data.IORef (IORef)
+import Network.Curl
+import System.Time (getClockTime)
+import qualified Text.JSON as J
+
+import Ganeti.BasicTypes (ResultT)
+import qualified Ganeti.Constants as C
+import qualified Ganeti.DataCollectors.Diagnose as D
+import Ganeti.DataCollectors.Types (getCategoryName)
+import qualified Ganeti.HTools.Container as Container
+import qualified Ganeti.HTools.Node as Node
+import Ganeti.Logging.Lifted
+import Ganeti.MaintD.MemoryState (MemoryState, getIncidents, updateIncident)
+import Ganeti.Objects.Maintenance
+import Ganeti.Utils (newUUID)
+
+-- | Query a node, unless it is offline, and return
+-- the paylod of the report, if available. For offline
+-- nodes return nothing.
+queryStatus :: Node.Node -> IO (Maybe J.JSValue)
+queryStatus node = do
+ let name = Node.name node
+ let url = name ++ ":" ++ show C.defaultMondPort
+ ++ "/1/report/" ++ maybe "default" getCategoryName D.dcCategory
+ ++ "/" ++ D.dcName
+ if Node.offline node
+ then do
+ logDebug $ "Not asking " ++ name ++ "; it is offline"
+ return Nothing
+ else do
+ (code, body) <- liftIO $ curlGetString url []
+ case code of
+ CurlOK ->
+ case J.decode body of
+ J.Ok r -> return $ Just r
+ _ -> return Nothing
+ _ -> do
+ logWarning $ "Failed to contact " ++ name
+ return Nothing
+
+-- | Update the status of one node.
+updateNode :: IORef MemoryState -> Node.Node -> ResultT String IO ()
+updateNode memstate node = do
+ let name = Node.name node
+ logDebug $ "Inspecting " ++ name
+ report <- liftIO $ queryStatus node
+ case report of
+ Just (J.JSObject obj)
+ | Just orig@(J.JSObject origobj) <- lookup "data" $ J.fromJSObject obj,
+ Just s <- lookup "status" $ J.fromJSObject origobj,
+ J.Ok state <- J.readJSON s,
+ state /= RANoop -> do
+ let origs = J.encode orig
+ logDebug $ "Relevant event on " ++ name ++ ": " ++ origs
+ incidents <- getIncidents memstate
+ unless (any (liftA2 (&&)
+ ((==) name . incidentNode)
+ ((==) orig . incidentOriginal)) incidents) $ do
+ logInfo $ "Registering new incident on " ++ name ++ ": " ++ origs
+ uuid <- liftIO newUUID
+ now <- liftIO getClockTime
+ let tag = C.maintdSuccessTagPrefix ++ uuid
+ incident = Incident { incidentOriginal = orig
+ , incidentAction = state
+ , incidentRepairStatus = RSNoted
+ , incidentJobs = []
+ , incidentNode = name
+ , incidentTag = tag
- , incidentUuid = uuid
++ , incidentUuid = UTF8.fromString uuid
+ , incidentCtime = now
+ , incidentMtime = now
+ , incidentSerial = 1
+ }
+ liftIO $ updateIncident memstate incident
+ _ -> return ()
+
+
+-- | Query all MonDs for updates on the node-status.
+collectIncidents :: IORef MemoryState -> Node.List -> ResultT String IO ()
+collectIncidents memstate nl = do
+ _ <- getIncidents memstate -- always update the memory state,
+ -- even if we do not observe anything
+ logDebug "Querying all nodes for incidents"
+ mapM_ (updateNode memstate) $ Container.elems nl
diff --cc src/Ganeti/MaintD/FailIncident.hs
index 4f9a7b8,0000000..917cb78
mode 100644,000000..100644
--- a/src/Ganeti/MaintD/FailIncident.hs
+++ b/src/Ganeti/MaintD/FailIncident.hs
@@@ -1,92 -1,0 +1,93 @@@
+{-| Incident failing in the maintenace daemon
+
+This module implements the treatment of an incident, once
+a job failed.
+
+-}
+
+{-
+
+Copyright (C) 2015 Google Inc.
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are
+met:
+
+1. Redistributions of source code must retain the above copyright notice,
+this list of conditions and the following disclaimer.
+
+2. Redistributions in binary form must reproduce the above copyright
+notice, this list of conditions and the following disclaimer in the
+documentation and/or other materials provided with the distribution.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
+IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
+TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR
+CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+-}
+
+module Ganeti.MaintD.FailIncident
+ ( failIncident
+ ) where
+
+import Control.Exception.Lifted (bracket)
+import Control.Lens.Setter (over)
+import Control.Monad (liftM, when)
+import Control.Monad.IO.Class (liftIO)
++import qualified Data.ByteString.UTF8 as UTF8
+import Data.IORef (IORef)
+import System.IO.Error (tryIOError)
+
+import Ganeti.BasicTypes (ResultT, mkResultT, GenericResult(..))
+import qualified Ganeti.Constants as C
+import Ganeti.JQueue (currentTimestamp)
+import Ganeti.Jobs (execJobsWaitOkJid)
+import Ganeti.Logging.Lifted
+import qualified Ganeti.Luxi as L
+import Ganeti.MaintD.MemoryState (MemoryState, getIncidents, updateIncident)
+import Ganeti.MaintD.Utils (annotateOpCode)
+import Ganeti.Objects.Lens (incidentJobsL)
+import Ganeti.Objects.Maintenance (Incident(..), RepairStatus(..))
+import Ganeti.OpCodes (OpCode(..))
+import qualified Ganeti.Path as Path
+import Ganeti.Types (JobId, fromJobId, TagKind(..))
+
+-- | Mark an incident as failed.
+markAsFailed :: IORef MemoryState -> Incident -> ResultT String IO ()
+markAsFailed memstate incident = do
+ let uuid = incidentUuid incident
- newtag = C.maintdFailureTagPrefix ++ uuid
- logInfo $ "Marking incident " ++ uuid ++ " as failed"
++ newtag = C.maintdFailureTagPrefix ++ UTF8.toString uuid
++ logInfo $ "Marking incident " ++ UTF8.toString uuid ++ " as failed"
+ now <- liftIO currentTimestamp
+ luxiSocket <- liftIO Path.defaultQuerySocket
+ jids <- bracket (mkResultT . liftM (either (Bad . show) Ok)
+ . tryIOError $ L.getLuxiClient luxiSocket)
+ (liftIO . L.closeClient)
+ (mkResultT . execJobsWaitOkJid
+ [[ annotateOpCode "marking incident handling as
failed" now
+ . OpTagsSet TagKindNode [ newtag ]
+ . Just $ incidentNode incident ]])
+ let incident' = over incidentJobsL (++ jids)
+ $ incident { incidentRepairStatus = RSFailed
+ , incidentTag = newtag
+ }
+ liftIO $ updateIncident memstate incident'
+
+-- | Mark the incident, if any, belonging to the given job as
+-- failed after having tagged it appropriately.
+failIncident :: IORef MemoryState -> JobId -> ResultT String IO ()
+failIncident memstate jid = do
+ incidents <- getIncidents memstate
+ let affected = filter (elem jid . incidentJobs) incidents
+ when (null affected) . logInfo
+ $ "Job " ++ show (fromJobId jid) ++ " does not belong to an incident"
+ mapM_ (markAsFailed memstate) affected
diff --cc src/Ganeti/MaintD/HandleIncidents.hs
index 600707d,0000000..c6da8fd
mode 100644,000000..100644
--- a/src/Ganeti/MaintD/HandleIncidents.hs
+++ b/src/Ganeti/MaintD/HandleIncidents.hs
@@@ -1,297 -1,0 +1,298 @@@
+{-| Incident handling in the maintenance daemon.
+
+This module implements the submission of actions for ongoing
+repair events reported by the node-status data collector.
+
+-}
+
+{-
+
+Copyright (C) 2015 Google Inc.
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are
+met:
+
+1. Redistributions of source code must retain the above copyright notice,
+this list of conditions and the following disclaimer.
+
+2. Redistributions in binary form must reproduce the above copyright
+notice, this list of conditions and the following disclaimer in the
+documentation and/or other materials provided with the distribution.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
+IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
+TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR
+CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+-}
+
+module Ganeti.MaintD.HandleIncidents
+ ( handleIncidents
+ ) where
+
+import Control.Arrow ((&&&))
+import Control.Exception.Lifted (bracket)
+import Control.Lens.Setter (over)
+import Control.Monad (foldM)
+import Control.Monad.IO.Class (liftIO)
++import qualified Data.ByteString.UTF8 as UTF8
+import Data.Function (on)
+import Data.IORef (IORef)
+import qualified Data.Map as Map
+import qualified Data.Set as Set
+import qualified Text.JSON as J
+
+import Ganeti.BasicTypes ( GenericResult(..), ResultT, mkResultT, Down(..))
+import qualified Ganeti.Constants as C
+import Ganeti.HTools.AlgorithmParams (AlgorithmOptions(..), defaultOptions)
+import Ganeti.HTools.Cluster.Evacuate (tryNodeEvac, EvacSolution(..))
+import qualified Ganeti.HTools.Container as Container
+import qualified Ganeti.HTools.Group as Group
+import qualified Ganeti.HTools.Instance as Instance
+import qualified Ganeti.HTools.Node as Node
+import Ganeti.HTools.Types (Idx)
+import Ganeti.JQueue (currentTimestamp)
+import Ganeti.Jobs (execJobsWaitOkJid, submitJobs, forceFailover)
+import Ganeti.Logging.Lifted
+import qualified Ganeti.Luxi as L
+import Ganeti.MaintD.MemoryState ( MemoryState, getIncidents, rmIncident
+ , updateIncident, appendJobs)
+import Ganeti.MaintD.Utils (annotateOpCode, getRepairCommand)
+import Ganeti.Objects.Lens (incidentJobsL)
+import Ganeti.Objects.Maintenance ( RepairStatus(..), RepairAction(..)
+ , Incident(..))
+import Ganeti.OpCodes (OpCode(..), MetaOpCode)
+import qualified Ganeti.Path as Path
+import Ganeti.Types ( cTimeOf, uuidOf, mkNonEmpty, fromJobId
+ , EvacMode(..), TagKind(..))
+import Ganeti.Utils (maxBy, logAndBad)
+
+-- | Given two incidents, choose the more severe one; for equally severe
+-- ones the older (by creation timestamp).
+moreSevereIncident :: Incident -> Incident -> Incident
+moreSevereIncident = maxBy (compare `on` incidentAction &&& (Down . cTimeOf))
+
+-- | From a given list of incidents, return, for each node,
+-- the one with the most severe action.
+rankIncidents :: [Incident] -> Map.Map String Incident
+rankIncidents = foldl (\m i -> Map.insertWith moreSevereIncident
+ (incidentNode i) i m) Map.empty
+
+-- | Generate a job to drain a given node.
+drainJob :: String -> ResultT String IO [ MetaOpCode ]
+drainJob name = do
+ name' <- mkNonEmpty name
+ now <- liftIO currentTimestamp
+ return $ map (annotateOpCode ("Draining " ++ name) now)
+ [ OpNodeSetParams { opNodeName = name'
+ , opNodeUuid = Nothing
+ , opForce = True
+ , opHvState = Nothing
+ , opDiskState = Nothing
+ , opMasterCandidate = Nothing
+ , opOffline = Nothing
+ , opDrained = Just True
+ , opAutoPromote = False
+ , opMasterCapable = Nothing
+ , opVmCapable = Nothing
+ , opSecondaryIp = Nothing
+ , opgenericNdParams = Nothing
+ , opPowered = Nothing
+ } ]
+
+-- | Submit and register the next job for a node evacuation.
+handleEvacuation :: L.Client -- ^ Luxi client to use
+ -> IORef MemoryState -- ^ memory state of the daemon
+ -> (Group.List, Node.List, Instance.List) -- ^ cluster
+ -> Idx -- ^ index of the node to evacuate
+ -> Bool -- ^ whether to try migrations
+ -> Set.Set Int -- ^ allowed nodes for evacuation
+ -> Incident -- ^ the incident
+ -> ResultT String IO (Set.Set Int) -- ^ nodes still available
+handleEvacuation client memst (gl, nl, il) ndx migrate freenodes incident = do
+ let node = Container.find ndx nl
+ name = Node.name node
+ fNdNames = map (Node.name . flip Container.find nl) $
Set.elems freenodes
+ evacOpts = defaultOptions { algEvacMode = True
+ , algIgnoreSoftErrors = True
+ , algRestrictToNodes = Just fNdNames
+ }
+ evacFun = tryNodeEvac evacOpts gl nl il
+ migrateFun = if migrate then id else forceFailover
+ annotateFun = annotateOpCode $ "Evacuating " ++ name
+ pendingIncident = incident { incidentRepairStatus = RSPending }
+ updateJobs jids_r = case jids_r of
+ Ok jids -> do
+ let incident' = over incidentJobsL (++ jids) pendingIncident
+ liftIO $ updateIncident memst incident'
+ liftIO $ appendJobs memst jids
+ logDebug $ "Jobs submitted: " ++ show (map fromJobId jids)
+ Bad e -> mkResultT . logAndBad
+ $ "Failure evacuating " ++ name ++ ": " ++ e
+ logInstName i = logInfo $ "Evacuating instance "
+ ++ Instance.name (Container.find i il)
+ ++ " from " ++ name
+ execSol sol = do
+ now <- liftIO currentTimestamp
+ let jobs = map (map (annotateFun now . migrateFun)) $ esOpCodes sol
+ jids <- liftIO $ submitJobs jobs client
+ updateJobs jids
+ let touched = esMoved sol >>= \(_, _, nidxs) -> nidxs
+ return $ freenodes Set.\\ Set.fromList touched
+ logDebug $ "Handling evacuation of " ++ name
+ case () of _ | not $ Node.offline node -> do
+ logDebug $ "Draining node " ++ name
+ job <- drainJob name
+ jids <- liftIO $ submitJobs [job] client
+ updateJobs jids
+ return freenodes
+ | i:_ <- Node.pList node -> do
+ logInstName i
+ (_, _, sol) <- mkResultT . return $ evacFun
ChangePrimary [i]
+ execSol sol
+ | i:_ <- Node.sList node -> do
+ logInstName i
+ (_, _, sol) <- mkResultT . return
+ $ evacFun ChangeSecondary [i]
+ execSol sol
+ | otherwise -> do
+ logDebug $ "Finished evacuation of " ++ name
+ now <- liftIO currentTimestamp
+ jids <- mkResultT $ execJobsWaitOkJid
+ [[ annotateFun now
+ . OpTagsSet TagKindNode [ incidentTag
incident ]
+ $ Just name]] client
+ let incident' = over incidentJobsL (++ jids)
+ $ incident { incidentRepairStatus =
+ RSCompleted }
+ liftIO $ updateIncident memst incident'
+ liftIO $ appendJobs memst jids
+ return freenodes
+
+-- | Submit the next action for a live-repair incident.
+handleLiveRepairs :: L.Client -- ^ Luxi client to use
+ -> IORef MemoryState -- ^ memory state of the daemon
+ -> Idx -- ^ the node to handle the event on
+ -> Set.Set Int -- ^ unaffected nodes
+ -> Incident -- ^ the incident
+ -> ResultT String IO (Set.Set Int) -- ^ nodes still available
+handleLiveRepairs client memst ndx freenodes incident = do
+ let maybeCmd = getRepairCommand incident
+ uuid = incidentUuid incident
+ name = incidentNode incident
+ now <- liftIO currentTimestamp
+ logDebug $ "Handling requested command " ++ show maybeCmd ++ " on " ++ name
+ case () of
+ _ | null $ incidentJobs incident,
+ Just cmd <- maybeCmd,
+ cmd /= "" -> do
+ logDebug "Submitting repair command job"
+ name' <- mkNonEmpty name
+ cmd' <- mkNonEmpty cmd
+ orig' <- mkNonEmpty . J.encode $ incidentOriginal incident
+ jids_r <- liftIO $ submitJobs
+ [[ annotateOpCode "repair command requested
by node" now
+ OpRepairCommand { opNodeName = name'
+ , opRepairCommand = cmd'
+ , opInput = Just orig'
+ } ]] client
+ case jids_r of
+ Ok jids -> do
+ let incident' = over incidentJobsL (++ jids) incident
+ liftIO $ updateIncident memst incident'
+ liftIO $ appendJobs memst jids
+ logDebug $ "Jobs submitted: " ++ show (map fromJobId jids)
+ Bad e -> mkResultT . logAndBad
+ $ "Failure requesting command " ++ cmd ++ " on " ++ name
+ ++ ": " ++ e
+ | null $ incidentJobs incident -> do
- logInfo $ "Marking incident " ++ uuid ++ " as failed;"
++ logInfo $ "Marking incident " ++ UTF8.toString uuid ++ "
as failed;"
+ ++ " command for live repair not specified"
- let newtag = C.maintdFailureTagPrefix ++ uuid
++ let newtag = C.maintdFailureTagPrefix ++ UTF8.toString uuid
+ jids <- mkResultT $ execJobsWaitOkJid
+ [[ annotateOpCode "marking incident as ill
specified" now
+ . OpTagsSet TagKindNode [ newtag ]
+ $ Just name ]] client
+ let incident' = over incidentJobsL (++ jids)
+ $ incident { incidentRepairStatus = RSFailed
+ , incidentTag = newtag
+ }
+ liftIO $ updateIncident memst incident'
+ liftIO $ appendJobs memst jids
+ | otherwise -> do
+ logDebug "Command execution has succeeded"
+ jids <- mkResultT $ execJobsWaitOkJid
+ [[ annotateOpCode "repair command requested by node" now
+ . OpTagsSet TagKindNode [ incidentTag incident ]
+ $ Just name ]] client
+ let incident' = over incidentJobsL (++ jids)
+ $ incident { incidentRepairStatus = RSCompleted }
+ liftIO $ updateIncident memst incident'
+ liftIO $ appendJobs memst jids
+ return $ Set.delete ndx freenodes
+
+
+-- | Submit the next actions for a single incident, given the
unaffected nodes;
+-- register all submitted jobs and return the new set of unaffected nodes.
+handleIncident :: L.Client
+ -> IORef MemoryState
+ -> (Group.List, Node.List, Instance.List)
+ -> Set.Set Int
+ -> (String, Incident)
+ -> ResultT String IO (Set.Set Int)
+handleIncident client memstate (gl, nl, il) freeNodes (name, incident) = do
+ ndx <- case Container.keys $ Container.filter ((==) name . Node.name) nl of
+ [ndx] -> return ndx
+ [] -> do
+ logWarning $ "Node " ++ name ++ " no longer in the cluster;"
+ ++ " clearing incident " ++ show incident
+ liftIO . rmIncident memstate $ uuidOf incident
+ fail $ "node " ++ name ++ " left the cluster"
+ ndxs -> do
+ logWarning $ "Abmigious node name " ++ name
+ ++ "; could refer to indices " ++ show ndxs
+ fail $ "ambigious name " ++ name
+ case incidentAction incident of
+ RANoop -> do
+ logDebug $ "Nothing to do for " ++ show incident
+ liftIO . rmIncident memstate $ uuidOf incident
+ return freeNodes
+ RALiveRepair ->
+ handleLiveRepairs client memstate ndx freeNodes incident
+ RAEvacuate ->
+ handleEvacuation client memstate (gl, nl, il) ndx True
freeNodes incident
+ RAEvacuateFailover ->
+ handleEvacuation client memstate (gl, nl, il) ndx False
freeNodes incident
+
+-- | Submit the jobs necessary for the next maintenance step
+-- for each pending maintenance, i.e., the most radical maintenance
+-- for each node. Return the set of node indices unaffected by these
+-- operations. Also, for each job submitted, register it directly.
+handleIncidents :: IORef MemoryState
+ -> (Group.List, Node.List, Instance.List)
+ -> ResultT String IO (Set.Set Int)
+handleIncidents memstate (gl, nl, il) = do
+ incidents <- getIncidents memstate
+ let activeIncidents = filter ((<= RSPending) .
incidentRepairStatus) incidents
+ incidentsToHandle = rankIncidents activeIncidents
+ incidentNodes = Set.fromList . Container.keys
+ $ Container.filter ((`Map.member` incidentsToHandle) . Node.name) nl
+ freeNodes = Set.fromList (Container.keys nl) Set.\\ incidentNodes
+ if null activeIncidents
+ then return freeNodes
+ else do
+ luxiSocket <- liftIO Path.defaultQuerySocket
+ bracket (liftIO $ L.getLuxiClient luxiSocket)
+ (liftIO . L.closeClient)
+ $ \ client ->
+ foldM (handleIncident client memstate (gl, nl, il)) freeNodes
+ $ Map.assocs incidentsToHandle
diff --cc src/Ganeti/Monitoring/Server.hs
index aaad4f4,da78b00..668779b
--- a/src/Ganeti/Monitoring/Server.hs
+++ b/src/Ganeti/Monitoring/Server.hs
@@@ -47,13 -44,16 +47,14 @@@ import Ganeti.Prelud
import Control.Applicative
import Control.DeepSeq (force)
import Control.Exception.Base (evaluate)
-import Control.Monad
+import Control.Monad (void, forever, liftM, foldM, foldM_, mzero)
import Control.Monad.IO.Class
-import Data.ByteString.Char8 (pack, unpack)
+import Data.ByteString.Char8 (unpack)
+ import qualified Data.ByteString.UTF8 as UTF8
import Data.Maybe (fromMaybe)
import Data.List (find)
-import Data.Monoid (mempty)
import qualified Data.Map as Map
import qualified Data.PSQueue as Queue
-import Network.BSD (getServicePortNumber)
import Snap.Core
import Snap.Http.Server
import qualified Text.JSON as J
diff --cc src/Ganeti/Objects.hs
index 52a00fb,1855334..e6078cd
--- a/src/Ganeti/Objects.hs
+++ b/src/Ganeti/Objects.hs
@@@ -103,18 -103,12 +103,19 @@@ module Ganeti.Object
, module Ganeti.PartialParams
, module Ganeti.Objects.Disk
, module Ganeti.Objects.Instance
- ) where
+ , module Ganeti.Objects.Maintenance
+ , FilledHvStateParams(..)
+ , PartialHvStateParams(..)
+ , allHvStateParamFields
+ , FilledHvState
+ , PartialHvState ) where
+
+import Prelude ()
+import Ganeti.Prelude
-import Control.Applicative
import Control.Arrow (first)
import Control.Monad.State
+ import qualified Data.ByteString.UTF8 as UTF8
import Data.List (foldl', intercalate)
import Data.Maybe
import qualified Data.Map as Map
@@@ -690,8 -679,8 +691,10 @@@ $(buildObject "Cluster" "cluster"
, simpleField "compression_tools" [t| [String]
|]
, simpleField "enabled_user_shutdown" [t| Bool
|]
, simpleField "data_collectors" [t| Container
DataCollectorConfig |]
+ , defaultField [| [] |] $ simpleField
+ "diagnose_data_collector_filename" [t| String
|]
+ , simpleField "ssh_key_type" [t| SshKeyType
|]
+ , simpleField "ssh_key_bits" [t| Int
|]
]
++ timeStampFields
++ uuidFields
diff --cc src/Ganeti/Objects/Disk.hs
index f3e08da,ca939d1..7e8b962
--- a/src/Ganeti/Objects/Disk.hs
+++ b/src/Ganeti/Objects/Disk.hs
@@@ -36,9 -36,8 +36,10 @@@ SOFTWARE, EVEN IF ADVISED OF THE POSSIB
module Ganeti.Objects.Disk where
-import Control.Applicative ((<*>), (<$>))
+import Prelude ()
+import Ganeti.Prelude
+
+ import qualified Data.ByteString.UTF8 as UTF8
import Data.Char (isAsciiLower, isAsciiUpper, isDigit)
import Data.List (isPrefixOf, isInfixOf)
import Language.Haskell.TH.Syntax
diff --cc src/Ganeti/Objects/Instance.hs
index d25e134,fb35f65..a946b4e
--- a/src/Ganeti/Objects/Instance.hs
+++ b/src/Ganeti/Objects/Instance.hs
@@@ -36,8 -39,8 +39,10 @@@ SOFTWARE, EVEN IF ADVISED OF THE POSSIB
module Ganeti.Objects.Instance where
+ import qualified Data.ByteString.UTF8 as UTF8
-import Data.Monoid
++
+import Prelude ()
+import Ganeti.Prelude
import Ganeti.JSON (emptyContainer)
import Ganeti.Objects.Nic
diff --cc src/Ganeti/Objects/Maintenance.hs
index 2f0c2f8,0000000..ea6e709
mode 100644,000000..100644
--- a/src/Ganeti/Objects/Maintenance.hs
+++ b/src/Ganeti/Objects/Maintenance.hs
@@@ -1,114 -1,0 +1,115 @@@
+{-# LANGUAGE TemplateHaskell #-}
+
+{-| Implementation of the Ganeti configuration for the maintenance daemon.
+
+-}
+
+{-
+
+Copyright (C) 2015 Google Inc.
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are
+met:
+
+1. Redistributions of source code must retain the above copyright notice,
+this list of conditions and the following disclaimer.
+
+2. Redistributions in binary form must reproduce the above copyright
+notice, this list of conditions and the following disclaimer in the
+documentation and/or other materials provided with the distribution.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
+IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
+TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR
+CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+-}
+
+module Ganeti.Objects.Maintenance
+ ( MaintenanceData(..)
+ , RepairAction(..)
+ , RepairStatus(..)
+ , Incident(..)
+ ) where
+
++import qualified Data.ByteString.UTF8 as UTF8
+import qualified Text.JSON as J
+
+import qualified Ganeti.Constants as C
+import Ganeti.THH
+import Ganeti.THH.Field
+import Ganeti.Types
+
+-- | Action to be taken for a certain repair event. Note
+-- that the order is important, as we rely on values higher
+-- in the derived order to be more intrusive actions.
+$(declareLADT ''String "RepairAction"
+ [ ("RANoop", "Ok")
+ , ("RALiveRepair", "live-repair")
+ , ("RAEvacuate", "evacuate")
+ , ("RAEvacuateFailover", "evacuate-failover")
+ ])
+$(makeJSONInstance ''RepairAction)
+
+-- | Progress made on the particular repair event. Again we rely
+-- on the order in that everything larger than `RSPending` is finalized
+-- in the sense that no further jobs will be submitted.
+$(declareLADT ''String "RepairStatus"
+ [ ("RSNoted", "noted")
+ , ("RSPending", "pending")
+ , ("RSCanceled", "canceled")
+ , ("RSFailed", "failed")
+ , ("RSCompleted", "completed")
+ ])
+$(makeJSONInstance ''RepairStatus)
+
+$(buildObject "Incident" "incident" $
+ [ simpleField "original" [t| J.JSValue |]
+ , simpleField "action" [t| RepairAction |]
+ , defaultField [| [] |] $ simpleField "jobs" [t| [ JobId ] |]
+ , simpleField "node" [t| String |]
+ , simpleField "repair-status" [t| RepairStatus |]
+ , simpleField "tag" [t| String |]
+ ]
+ ++ uuidFields
+ ++ timeStampFields
+ ++ serialFields)
+
+instance SerialNoObject Incident where
+ serialOf = incidentSerial
+
+instance TimeStampObject Incident where
+ cTimeOf = incidentCtime
+ mTimeOf = incidentMtime
+
+instance UuidObject Incident where
- uuidOf = incidentUuid
++ uuidOf = UTF8.toString . incidentUuid
+
+$(buildObject "MaintenanceData" "maint" $
+ [ defaultField [| C.maintdDefaultRoundDelay |]
+ $ simpleField "roundDelay" [t| Int |]
+ , defaultField [| [] |] $ simpleField "jobs" [t| [ JobId ] |]
+ , defaultField [| False |] $ simpleField "balance" [t| Bool |]
+ , defaultField [| 0.1 :: Double |]
+ $ simpleField "balanceThreshold" [t| Double |]
+ , defaultField [| [] |] $ simpleField "evacuated" [t| [ String ] |]
+ , defaultField [| [] |] $ simpleField "incidents" [t| [ Incident ] |]
+ ]
+ ++ timeStampFields
+ ++ serialFields)
+
+instance SerialNoObject MaintenanceData where
+ serialOf = maintSerial
+
+instance TimeStampObject MaintenanceData where
+ cTimeOf = maintCtime
+ mTimeOf = maintMtime
diff --cc src/Ganeti/OpParams.hs
index 8176ac6,0a793c0..a965df3
--- a/src/Ganeti/OpParams.hs
+++ b/src/Ganeti/OpParams.hs
@@@ -299,13 -297,11 +299,15 @@@ module Ganeti.OpParam
, pEnabledUserShutdown
, pAdminStateSource
, pEnabledDataCollectors
+ , pMaintdRoundDelay
+ , pMaintdEnableBalancing
+ , pMaintdBalancingThreshold
, pDataCollectorInterval
+ , pDiagnoseDataCollectorFilename
, pNodeSslCerts
- , pSshKeys
+ , pSshKeyBits
+ , pSshKeyType
+ , pRenewSshKeys
, pNodeSetup
, pVerifyClutter
, pLongSleep
diff --cc src/Ganeti/Query/Group.hs
index 9fe246d,45bd81a..66f100e
--- a/src/Ganeti/Query/Group.hs
+++ b/src/Ganeti/Query/Group.hs
@@@ -82,12 -82,7 +82,12 @@@ groupFields
, (FieldDefinition "pinst_list" "InstanceList" QFTOther
"List of primary instances",
FieldConfig (\cfg -> rsNormal . niceSort . mapMaybe instName . fst .
- getGroupInstances cfg . groupUuid), QffNormal)
+ getGroupInstances cfg . uuidOf), QffNormal)
+ , (FieldDefinition "hv_state" "HypervisorState" QFTOther
+ "Custom static hypervisor state",
+ FieldSimple (rsNormal . groupHvStateStatic), QffNormal)
+ , (FieldDefinition "disk_state" "DiskState" QFTOther "Disk state",
+ FieldSimple (rsNormal . groupDiskStateStatic), QffNormal)
] ++
map buildNdParamField allNDParamFields ++
timeStampFields ++
diff --cc src/Ganeti/Query/Server.hs
index 4a3143c,1b7cfa5..89c8b60
--- a/src/Ganeti/Query/Server.hs
+++ b/src/Ganeti/Query/Server.hs
@@@ -272,18 -271,10 +273,22 @@@ handleCall _ _ cdata QueryClusterInfo
, ("data_collector_interval",
showJSON . fmap dataCollectorInterval
$ clusterDataCollectors cluster)
+ , ("diagnose_data_collector_filename",
+ showJSON $ clusterDiagnoseDataCollectorFilename cluster)
+ , ("maint_round_delay",
+ showJSON . maintRoundDelay $ configMaintenance cdata)
+ , ("maint_balance",
+ showJSON . maintBalance $ configMaintenance cdata)
+ , ("maint_balance_threshold",
+ showJSON . maintBalanceThreshold $ configMaintenance cdata)
+ , ("hv_state",
+ showJSON $ clusterHvStateStatic cluster)
+ , ("disk_state",
+ showJSON $ clusterDiskStateStatic cluster)
+ , ("modify_ssh_setup",
+ showJSON $ clusterModifySshSetup cluster)
+ , ("ssh_key_type", showJSON $ clusterSshKeyType cluster)
+ , ("ssh_key_bits", showJSON $ clusterSshKeyBits cluster)
]
in case master of
diff --cc src/Ganeti/UDSServer.hs
index 36f0ac1,868c4e9..35a493f
--- a/src/Ganeti/UDSServer.hs
+++ b/src/Ganeti/UDSServer.hs
@@@ -79,11 -77,9 +79,9 @@@ import Control.Monad.Trans.Contro
import Control.Exception (catch)
import Control.Monad
import qualified Data.ByteString as B
- import qualified Data.ByteString.Lazy as BL
import qualified Data.ByteString.UTF8 as UTF8
- import qualified Data.ByteString.Lazy.UTF8 as UTF8L
import Data.IORef
-import Data.List
+import Data.List (isInfixOf)
import Data.Word (Word8)
import qualified Network.Socket as S
import System.Directory (removeFile)
diff --cc src/Ganeti/WConfd/ConfigModifications.hs
index 13a7df2,aead178..ad656f1
--- a/src/Ganeti/WConfd/ConfigModifications.hs
+++ b/src/Ganeti/WConfd/ConfigModifications.hs
@@@ -40,16 -39,14 +40,17 @@@ SOFTWARE, EVEN IF ADVISED OF THE POSSIB
module Ganeti.WConfd.ConfigModifications where
-import Control.Applicative ((<$>))
+import Prelude ()
+import Ganeti.Prelude
+
import Control.Lens (_2)
import Control.Lens.Getter ((^.))
-import Control.Lens.Setter ((.~), (%~))
+import Control.Lens.Setter (Setter, (.~), (%~), (+~), over)
+ import qualified Data.ByteString.UTF8 as UTF8
import Control.Lens.Traversal (mapMOf)
-import Control.Monad (unless, when, forM_, foldM, liftM2)
-import Control.Monad.Error (throwError, MonadError)
+import Control.Lens.Type (Simple)
+import Control.Monad (unless, when, forM_, foldM, liftM, liftM2)
+import Control.Monad.Error.Class (throwError, MonadError)
import Control.Monad.IO.Class (liftIO)
import Control.Monad.Trans.State (StateT, get, put, modify,
runStateT, execStateT)
@@@ -121,7 -117,7 +122,7 @@@ getAllIDs cs
instKeys = keysFromC . configInstances . csConfigData $ cs
nodeKeys = keysFromC . configNodes . csConfigData $ cs
--
++
instValues = map uuidOf . valuesFromC
. configInstances . csConfigData $ cs
nodeValues = map uuidOf . valuesFromC . configNodes . csConfigData $ cs
@@@ -655,77 -669,9 +674,77 @@@ updateDisk disk = d
(^. dL) (\cs -> do
dC <- toError $ replaceIn ct disk (cs ^. dL)
return ((serialOf disk + 1, ct), (dL .~ dC) cs)))
- . T.releaseDRBDMinors $ uuidOf disk
+ . T.releaseDRBDMinors . UTF8.fromString $ uuidOf disk
return . MaybeForJSON $ fmap (_2 %~ TimeAsDoubleJSON) r
+-- | Set a particular value and bump serial in the hosting
+-- structure. Arguments are a setter to focus on the part
+-- of the configuration that gets serial-bumped, and a modification
+-- of that part. The function will do the change and bump the serial
+-- in the WConfdMonad temporarily acquiring the configuration lock.
+-- Return True if that succeeded and False if the configuration lock
+-- was not available; no change is done in the latter case.
+changeAndBump :: (SerialNoObjectL a, TimeStampObjectL a)
+ => Simple Setter ConfigState a
+ -> (a -> a)
+ -> WConfdMonad Bool
+changeAndBump focus change = do
+ now <- liftIO getClockTime
+ let operation = over focus $ (serialL +~ 1) . (mTimeL .~ now) . change
+ liftM isJust $ modifyConfigWithLock
+ (\_ cs -> return . operation $ cs)
+ (return ())
+
+-- | Change and bump part of the maintenance part of the configuration.
+changeAndBumpMaint :: (MaintenanceData -> MaintenanceData) -> WConfdMonad Bool
+changeAndBumpMaint = changeAndBump $ csConfigDataL . configMaintenanceL
+
+-- | Set the maintenance intervall.
+setMaintdRoundDelay :: Int -> WConfdMonad Bool
+setMaintdRoundDelay delay = changeAndBumpMaint $ maintRoundDelayL .~ delay
+
+-- | Clear the list of current maintenance jobs.
+clearMaintdJobs :: WConfdMonad Bool
+clearMaintdJobs = changeAndBumpMaint $ maintJobsL .~ []
+
+-- | Append new jobs to the list of current maintenace jobs, if
+-- not alread present.
+appendMaintdJobs :: [JobId] -> WConfdMonad Bool
+appendMaintdJobs jobs = changeAndBumpMaint . over maintJobsL
+ $ ordNub . (++ jobs)
+
+-- | Set the autobalance flag.
+setMaintdBalance :: Bool -> WConfdMonad Bool
+setMaintdBalance value = changeAndBumpMaint $ maintBalanceL .~ value
+
+-- | Set the auto-balance threshold.
+setMaintdBalanceThreshold :: Double -> WConfdMonad Bool
+setMaintdBalanceThreshold value = changeAndBumpMaint
+ $ maintBalanceThresholdL .~ value
+
+-- | Add a name to the list of recently evacuated instances.
+addMaintdEvacuated :: [String] -> WConfdMonad Bool
+addMaintdEvacuated names = changeAndBumpMaint . over maintEvacuatedL
+ $ ordNub . (++ names)
+
+-- | Remove a name from the list of recently evacuated instances.
+rmMaintdEvacuated :: String -> WConfdMonad Bool
+rmMaintdEvacuated name = changeAndBumpMaint . over maintEvacuatedL
+ $ filter (/= name)
+
+-- | Update an incident to the list of known incidents; if the incident,
+-- as identified by the UUID, is not present, it is added.
+updateMaintdIncident :: Incident -> WConfdMonad Bool
+updateMaintdIncident incident =
+ changeAndBumpMaint . over maintIncidentsL
+ $ (incident :) . filter ((/= uuidOf incident) . uuidOf)
+
+-- | Remove an incident from the list of known incidents.
+rmMaintdIncident :: String -> WConfdMonad Bool
+rmMaintdIncident uuid =
+ changeAndBumpMaint . over maintIncidentsL
+ $ filter ((/= uuid) . uuidOf)
+
-- * The list of functions exported to RPC.
exportedFunctions :: [Name]
diff --cc src/Ganeti/WConfd/ConfigVerify.hs
index a6d537b,246b627..118d775
--- a/src/Ganeti/WConfd/ConfigVerify.hs
+++ b/src/Ganeti/WConfd/ConfigVerify.hs
@@@ -39,8 -39,8 +39,9 @@@ module Ganeti.WConfd.ConfigVerif
, verifyConfigErr
) where
-import Control.Monad.Error
+import Control.Monad (forM_)
+import Control.Monad.Error.Class (MonadError(..))
+ import qualified Data.ByteString.UTF8 as UTF8
import qualified Data.Foldable as F
import qualified Data.Map as M
import qualified Data.Set as S
diff --cc src/Ganeti/WConfd/TempRes.hs
index 5aa6343,565fae2..9c0220d
--- a/src/Ganeti/WConfd/TempRes.hs
+++ b/src/Ganeti/WConfd/TempRes.hs
@@@ -73,13 -73,13 +73,15 @@@ module Ganeti.WConfd.TempRe
, reserved
) where
-import Control.Applicative
+import Prelude ()
+import Ganeti.Prelude
+
import Control.Lens.At
-import Control.Monad.Error
+import Control.Monad.Error.Class (MonadError(..))
import Control.Monad.State
import Control.Monad.Trans.Maybe
+ import qualified Data.ByteString as BS
+ import qualified Data.ByteString.UTF8 as UTF8
import qualified Data.Foldable as F
import Data.Maybe
import Data.Map (Map)
diff --cc test/hs/Test/Ganeti/JQScheduler.hs
index 3d79877,77eb2ac..04a6287
--- a/test/hs/Test/Ganeti/JQScheduler.hs
+++ b/test/hs/Test/Ganeti/JQScheduler.hs
@@@ -37,10 -37,9 +37,11 @@@ SOFTWARE, EVEN IF ADVISED OF THE POSSIB
module Test.Ganeti.JQScheduler (testJQScheduler) where
-import Control.Applicative
+import Prelude ()
+import Ganeti.Prelude
+
import Control.Lens ((&), (.~), _2)
+ import qualified Data.ByteString.UTF8 as UTF8
import Data.List (inits)
import Data.Maybe
import qualified Data.Map as Map
diff --cc test/hs/Test/Ganeti/Objects.hs
index 6d8deef,90967ce..8e97443
--- a/test/hs/Test/Ganeti/Objects.hs
+++ b/test/hs/Test/Ganeti/Objects.hs
@@@ -55,7 -52,10 +55,9 @@@ import Ganeti.Prelud
import Test.QuickCheck
import qualified Test.HUnit as HUnit
-import Control.Applicative
-import Control.Monad
+import Control.Monad (liftM, when)
+ import qualified Data.ByteString as BS
+ import qualified Data.ByteString.UTF8 as UTF8
import Data.Char
import qualified Data.List as List
import qualified Data.Map as Map
@@@ -91,29 -91,9 +93,32 @@@ instance Arbitrary (Container DataColle
return GenericContainer {
fromContainer = Map.fromList $ zip names configs }
+-- FYI: Currently only memory node value is used
+instance Arbitrary PartialHvStateParams where
+ arbitrary = PartialHvStateParams <$> pure Nothing <*> pure Nothing
+ <*> pure Nothing <*> genMaybe (fromPositive <$> arbitrary)
+ <*> pure Nothing
+
+instance Arbitrary PartialHvState where
+ arbitrary = do
+ hv_params <- arbitrary
+ return GenericContainer {
+ fromContainer = Map.fromList [ hv_params ] }
+
+-- FYI: Currently only memory node value is used
+instance Arbitrary FilledHvStateParams where
+ arbitrary = FilledHvStateParams <$> pure 0 <*> pure 0 <*> pure 0
+ <*> (fromPositive <$> arbitrary) <*> pure 0
+
+instance Arbitrary FilledHvState where
+ arbitrary = do
+ hv_params <- arbitrary
+ return GenericContainer {
+ fromContainer = Map.fromList [ hv_params ] }
+
+ instance Arbitrary BS.ByteString where
+ arbitrary = fmap UTF8.fromString arbitrary
+
$(genArbitrary ''PartialNDParams)
instance Arbitrary Node where
@@@ -398,37 -380,15 +405,44 @@@ instance Arbitrary FilterRule wher
<*> arbitrary
<*> arbitrary
<*> arbitrary
- <*> genUUID
+ <*> fmap UTF8.fromString genUUID
+
+ instance Arbitrary SshKeyType where
+ arbitrary = oneof
+ [ pure RSA
+ , pure DSA
+ , pure ECDSA
+ ]
+instance Arbitrary RepairStatus where
+ arbitrary = elements [ RSNoted, RSPending, RSCanceled, RSFailed,
RSCompleted ]
+
+instance Arbitrary RepairAction where
+ arbitrary = elements [ RANoop, RALiveRepair, RAEvacuate,
RAEvacuateFailover ]
+
+instance Arbitrary Incident where
+ arbitrary = Incident <$> pure (J.JSObject $ J.toJSObject [])
+ <*> arbitrary
+ <*> arbitrary
+ <*> arbitrary
+ <*> arbitrary
+ <*> arbitrary
+ <*> arbitrary
+ <*> arbitrary
+ <*> arbitrary
+ <*> arbitrary
+
+instance Arbitrary MaintenanceData where
+ arbitrary = MaintenanceData <$> (fromPositive <$> arbitrary)
+ <*> arbitrary
+ <*> arbitrary
+ <*> arbitrary
+ <*> arbitrary
+ <*> arbitrary
+ <*> arbitrary
+ <*> arbitrary
+ <*> arbitrary
+
-- | Generates a network instance with minimum netmasks of /24. Generating
-- bigger networks slows down the tests, because long bit strings
are generated
-- for the reservations.
@@@ -485,8 -445,7 +499,8 @@@ genEmptyCluster ncount = d
networks = GenericContainer Map.empty
disks = GenericContainer Map.empty
filters = GenericContainer Map.empty
+ maintenance <- arbitrary
- let contgroups = GenericContainer $ Map.singleton guuid grp
+ let contgroups = GenericContainer $ Map.singleton (UTF8.fromString
guuid) grp
serial <- arbitrary
-- timestamp fields
ctime <- arbitrary
diff --cc test/py/cfgupgrade_unittest.py
index 0706f53,a6dec64..132575a
--- a/test/py/cfgupgrade_unittest.py
+++ b/test/py/cfgupgrade_unittest.py
@@@ -76,7 -74,8 +76,9 @@@ def GetMinimalConfig()
"cpu-avg-load": { "active": True, "interval": 5000000 },
"xen-cpu-avg-load": { "active": True, "interval": 5000000 },
},
+ "diagnose_data_collector_filename": "",
+ "ssh_key_type": "dsa",
+ "ssh_key_bits": 1024,
},
"instances": {},
"disks": {},
--
Lisa Velden
Software Engineer
[email protected]
Google Germany GmbH
Dienerstraße 12
80331 München
Geschäftsführer: Matthew Scott Sucherman, Paul Terence Manicle
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg