On Fri, Dec 4, 2015 at 11:04 AM, 'Lisa Velden' via ganeti-devel <[email protected]> wrote:
> commit 8ee1eb8c1e9f691df2c3fa5eb0911d3e4602d46f > Merge: fa6fd0f 713f1cf > Author: Lisa Velden <[email protected]> > Date: Fri Dec 4 14:57:39 2015 +0100 > > Merge branch 'stable-2.16' into stable-2.17 > > * stable-2.16 > Fix lines with more than 80 characters > Add more detach/attach sequence tests > Allow disk attachment to diskless instances > Improve tests for attaching disks > Use only string value in error message > Add entries describing new gnt-cluster params to manpage > QA: Add ssh-key-type and -bits tests > QA: Extend AssertCommand to allow not forwarding the agent > Remove default limit on diffs in cfgupgrade tests > QA: Downgrade the cluster key type in 2.16 > Fix typo > Fail early for invalid key type and size combinations > Handle SSH key changes in upgrades and downgrades > Allow SSH key property changes > Use the SSH key parameters when generating keys > Do not generate the ganeti_pub_keys file with --no-ssh-init > Add querying of ssh-related config values > Add modify_ssh_setup to queryable config params > Add helper function for querying cluster properties > Show info about new params in gnt-cluster info > Add the SSH key type and length to the config, and set them > Change SSH key types to a proper Haskell sum type > Add the SSH key options > Mention disabling of '--no-node-setup' in NEWS file > Show 'modify ssh setup' in cluster info > Disable --no-node-setup > Make 'modify ssh setup' queryable > Fix RPC signature of NodeVerify > Use ssconf for SSH ports in NodeVerify > * stable-2.15 > Document the decission why optimisation is turned off > Don't keep input for error messages > Use dict.copy instead of deepcopy > Use bulk-adding of keys in renew-crypto > Make NodeSshKeyAdd use its *Bulk companion > Unit test bulk-adding normal nodes > Unit test for bulk-adding pot. 
master candidates > Introduce bulk-adding of SSH keys > Pause watcher during performance QA > Send answers strictly > Store keys as ByteStrings > Encode UUIDs as ByteStrings > Prefer the UuidObject type class over specific functions > Assign the variables before use (bugfix for dee6adb9) > Extend QA to detect autopromotion errors > Handle SSH key distribution on auto promotion > Do not remove authorized key of node itself > Fix indentation > Support force option for deactivate disks on RAPI > * stable-2.14 > Fix faulty iallocator type check > Improve cfgupgrade output in case of errors > * stable-2.13 > Extend timeout for gnt-cluster renew-crypto > Reduce flakyness of GetCmdline test on slow machines > Remove duplicated words > * stable-2.12 > Revert "Also consider connection time out a network error" > Clone lists before modifying > Make lockConfig call retryable > Return the correct error code in the post-upgrade script > Make openssl refrain from DH altogether > Fix upgrades of instances with missing creation time > * stable-2.11 > (no changes) > * stable-2.10 > Remove -X from hspace man page > Make htools tolerate missing "dtotal" and "dfree" on luxi > > Conflicts: > NEWS > lib/cli_opts.py > lib/objects.py > src/Ganeti/Config.hs > src/Ganeti/DataCollectors.hs > src/Ganeti/Monitoring/Server.hs > src/Ganeti/Objects.hs > src/Ganeti/Objects/Disk.hs > src/Ganeti/Objects/Instance.hs > src/Ganeti/Query/Group.hs > src/Ganeti/Query/Server.hs > src/Ganeti/WConfd/ConfigModifications.hs > src/Ganeti/WConfd/ConfigVerify.hs > test/hs/Test/Ganeti/Objects.hs > test/py/cfgupgrade_unittest.py > > Resolution: > NEWS > take both changes > lib/cli_opts.py > take both changes > lib/objects.py > take both changes > src/Ganeti/Config.hs > keep the ByteString changes, but Control.Monad from 2.17 > src/Ganeti/DataCollectors.hs > take both changes > src/Ganeti/Monitoring/Server.hs > fix imports > src/Ganeti/Objects.hs > take both changes > src/Ganeti/Objects/Disk.hs > take both changes > 
src/Ganeti/Objects/Instance.hs > fix imports > keep 2.17 changes > src/Ganeti/Query/Group.hs > keep field definition for hv_state and disk_state, but use > uuidOf instead of groupUuid > src/Ganeti/Query/Server.hs > take both changes > src/Ganeti/WConfd/ConfigModifications.hs > fix imports > src/Ganeti/WConfd/ConfigVerify.hs > fix imports > test/hs/Test/Ganeti/Objects.hs > fix imports > take both changes > test/py/cfgupgrade_unittest.py > take both changes > > diff --cc NEWS > index 8a96376,898a739..c64fedf > --- a/NEWS > +++ b/NEWS > @@@ -2,37 -2,20 +2,51 @@@ New > ==== > > > +Version 2.17.0 alpha1 > +--------------------- > + > +*(unreleased)* > + > +Incompatible/important changes > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > + > +- The IAllocator protocol has been extended by a new ``hv_state`` parameter. > + This new parameter is used to estimate the amount of memory utilized by > + the node. It replaces ``reserved_mem`` on hypervisors other than > ``xen-pvm`` > + and ``xen-hvm`` because ``reserved_mem`` was reported incorrectly on them. > + If this ``hv_state`` parameter is not presented in an iallocator input, > the > + old ``reserved_mem`` will be used. > + > +New features > +~~~~~~~~~~~~ > + > +- There is a new daemon, the :doc:`Ganeti Maintenance Daemon > <design-repaird>`, > + that coordinates all maintenance operations on a cluster, i.e. > rebalancing, > + activate disks, ERROR_down handling and node repairs actions. > +- ``htools`` support memory over-commitment now. Look at > + :doc:`Memory Over Commitment <design-memory-over-commitment>` for the > + details. > +- ``hbal`` has a new option ``--avoid-disk-moves *factor*`` that allows disk > + moves only if the gain in the cluster metrics is ``*factor*`` times higher > + than with no disk moves. > +- ``hcheck`` reports the level of redundancy for each node group as a new > ouput > + parameter, see :doc:`N+M Redundancy <design-n-m-redundancy>`. 
> + > + > + Version 2.16.0 beta2 > + -------------------- > + > + *(unreleased)* > + > + Incompatible/important changes > + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > + > + - The options ``--no-node-setup`` of ``gnt-node add`` is disabled. > + Instead, the cluster configuration parameter ``modify_ssh_setup`` is > + used to determine whether or not to manipulate the SSH setup of a new > + node. > + > + > Version 2.16.0 beta1 > -------------------- > > diff --cc lib/cli_opts.py > index 4a720c0,9f4d530..73a2ca9 > --- a/lib/cli_opts.py > +++ b/lib/cli_opts.py > @@@ -1633,11 -1596,17 +1635,22 @@@ LONG_SLEEP_OPT = cli_option > "--long-sleep", default=False, dest="long_sleep", > help="Allow long shutdowns when backing up instances", > action="store_true") > > +INPUT_OPT = cli_option("--input", dest="input", default=None, > + help=("input to be passed as stdin" > + " to the repair command"), > + type="string") > + > + SSH_KEY_TYPE_OPT = \ > + cli_option("--ssh-key-type", default=None, > + choices=list(constants.SSHK_ALL), dest="ssh_key_type", > + help="Type of SSH key deployed by Ganeti for cluster > actions") > + > + SSH_KEY_BITS_OPT = \ > + cli_option("--ssh-key-bits", default=None, > + type="int", dest="ssh_key_bits", > + help="Length of SSH keys generated by Ganeti, in bits") > + > + > #: Options provided by all commands > COMMON_OPTS = [DEBUG_OPT, REASON_OPT] > > diff --cc lib/objects.py > index a92eb59,4ea958a..edc4e23 > --- a/lib/objects.py > +++ b/lib/objects.py > @@@ -1683,7 -1653,8 +1683,9 @@@ class Cluster(TaggableObject) > "compression_tools", > "enabled_user_shutdown", > "data_collectors", > + "diagnose_data_collector_filename", > + "ssh_key_type", > + "ssh_key_bits", > ] + _TIMESTAMPS + _UUID > > def UpgradeConfig(self): > diff --cc lib/tools/cfgupgrade.py > index bc091da,14e2e20..d39c8fc > --- a/lib/tools/cfgupgrade.py > +++ b/lib/tools/cfgupgrade.py > @@@ -331,9 -340,15 +340,17 @@@ class CfgUpgrade(object) > cluster["data_collectors"].get( > name, dict(active=True, > 
interval=constants.MOND_TIME_INTERVAL * 1e6)) > + if "diagnose_data_collector_filename" not in cluster: > + cluster["diagnose_data_collector_filename"] = "" > > + # These parameters are set to pre-2.16 default values, which > + # differ from post-2.16 default values > + if "ssh_key_type" not in cluster: > + cluster["ssh_key_type"] = constants.SSHK_DSA > + > + if "ssh_key_bits" not in cluster: > + cluster["ssh_key_bits"] = 1024 > + > @OrFail("Upgrading groups") > def UpgradeGroups(self): > cl_ipolicy = self.config_data["cluster"].get("ipolicy") > @@@ -710,25 -718,42 +729,57 @@@ > > # DOWNGRADE ------------------------------------------------------------ > > + @OrFail("Removing SSH parameters") > + def DowngradeSshKeyParams(self): > + """Removes the SSH key type and bits parameters from the config. > + > + Also fails if these have been changed from values appropriate in lower > + Ganeti versions. > + > + """ > + # pylint: disable=E1103 > + # Because config_data is a dictionary which has the get method. > + cluster = self.config_data.get("cluster", None) > + if cluster is None: > + raise Error("Can't find the cluster entry in the configuration") > + > + def _FetchAndDelete(key): > + val = cluster.get(key, None) > + if key in cluster: > + del cluster[key] > + return val > + > + ssh_key_type = _FetchAndDelete("ssh_key_type") > + _FetchAndDelete("ssh_key_bits") > + > + if ssh_key_type is not None and ssh_key_type != "dsa": > + raise Error("The current Ganeti setup is using non-DSA SSH keys, and" > + " versions below 2.16 do not support these. 
To downgrade," > + " please perform a gnt-cluster renew-crypto using the " > + " --new-ssh-keys and --ssh-key-type=dsa options, > generating" > + " DSA keys that older versions can also use.") > + > def DowngradeAll(self): > + if "maintenance" in self.config_data: > + del self.config_data["maintenance"] > + if "cluster" in self.config_data: > + cluster = self.config_data["cluster"] > + if "diagnose_data_collector_filename" in cluster: > + del cluster["diagnose_data_collector_filename"] > + if "data_collectors" in cluster: > + if constants.DATA_COLLECTOR_DIAGNOSE in cluster["data_collectors"]: > + del cluster["data_collectors"][constants.DATA_COLLECTOR_DIAGNOSE] > + if constants.DATA_COLLECTOR_KVM_R_S_S in cluster["data_collectors"]: > + del cluster["data_collectors"][constants.DATA_COLLECTOR_KVM_R_S_S] > + if "ipolicy" in cluster: > + ipolicy = cluster["ipolicy"] > + if "memory-ratio" in ipolicy: > + del ipolicy["memory-ratio"] > self.config_data["version"] = version.BuildVersion(DOWNGRADE_MAJOR, > DOWNGRADE_MINOR, 0) > - return True > + > + self.DowngradeSshKeyParams() > + return not self.errors > > def _ComposePaths(self): > # We need to keep filenames locally because they might be renamed > between > diff --cc src/Ganeti/Config.hs > index d20e128,379df93..8d9e1b8 > --- a/src/Ganeti/Config.hs > +++ b/src/Ganeti/Config.hs > @@@ -83,11 -82,12 +83,13 @@@ module Ganeti.Confi > , instNodes > ) where > > -import Control.Applicative > +import Prelude () > +import Ganeti.Prelude > + > import Control.Arrow ((&&&)) > -import Control.Monad > -import Control.Monad.State > +import Control.Monad (liftM) > + import qualified Data.ByteString as BS > + import qualified Data.ByteString.UTF8 as UTF8 > import qualified Data.Foldable as F > import Data.List (foldl', nub) > import Data.Maybe (fromMaybe) > @@@ -191,7 -190,7 +193,7 @@@ getMasterNodes cfg > > -- | Get the list of master candidates, /not including/ the master itself. 
> getMasterCandidates :: ConfigData -> [Node] > --getMasterCandidates cfg = > ++getMasterCandidates cfg = > filter ((==) NRCandidate . getNodeRole cfg) . F.toList . configNodes $ cfg > > -- | Get the list of master candidates, /including/ the master. > diff --cc src/Ganeti/DataCollectors.hs > index fa35c62,33ad9cb..3c1146d > --- a/src/Ganeti/DataCollectors.hs > +++ b/src/Ganeti/DataCollectors.hs > @@@ -34,13 -34,11 +34,14 @@@ SOFTWARE, EVEN IF ADVISED OF THE POSSIB > > module Ganeti.DataCollectors( collectors ) where > > +import Prelude () > +import Ganeti.Prelude > + > + import qualified Data.ByteString.UTF8 as UTF8 > import Data.Map (findWithDefault) > -import Data.Monoid (mempty) > > import qualified Ganeti.DataCollectors.CPUload as CPUload > +import qualified Ganeti.DataCollectors.Diagnose as Diagnose > import qualified Ganeti.DataCollectors.Diskstats as Diskstats > import qualified Ganeti.DataCollectors.Drbd as Drbd > import qualified Ganeti.DataCollectors.InstStatus as InstStatus > diff --cc src/Ganeti/JSON.hs > index 6ce0f62,823dc31..e23ce57 > --- a/src/Ganeti/JSON.hs > +++ b/src/Ganeti/JSON.hs > @@@ -85,8 -85,10 +85,10 @@@ module Ganeti.JSO > > import Control.Applicative > import Control.DeepSeq > -import Control.Monad.Error.Class > +import Control.Monad.Error.Class (MonadError(..)) > import Control.Monad.Writer > + import qualified Data.ByteString as BS > + import qualified Data.ByteString.UTF8 as UTF8 > import qualified Data.Foldable as F > import qualified Data.Text as T > import qualified Data.Traversable as F > diff --cc src/Ganeti/MaintD/CleanupIncidents.hs > index 1347f04,0000000..f8aaf92 > mode 100644,000000..100644 > --- a/src/Ganeti/MaintD/CleanupIncidents.hs > +++ b/src/Ganeti/MaintD/CleanupIncidents.hs > @@@ -1,86 -1,0 +1,87 @@@ > +{-| Incident clean up in the maintenance daemon. > + > +This module implements the clean up of events that are finished, > +and acknowledged as such by the user. 
> + > +-} > + > +{- > + > +Copyright (C) 2015 Google Inc. > +All rights reserved. > + > +Redistribution and use in source and binary forms, with or without > +modification, are permitted provided that the following conditions are > +met: > + > +1. Redistributions of source code must retain the above copyright notice, > +this list of conditions and the following disclaimer. > + > +2. Redistributions in binary form must reproduce the above copyright > +notice, this list of conditions and the following disclaimer in the > +documentation and/or other materials provided with the distribution. > + > +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS > +IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED > +TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR > +PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR > +CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, > +EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, > +PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR > +PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF > +LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING > +NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS > +SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
> + > +-} > + > +module Ganeti.MaintD.CleanupIncidents > + ( cleanupIncidents > + ) where > + > +import Control.Arrow ((&&&)) > +import Control.Monad (unless) > +import Control.Monad.IO.Class (liftIO) > ++import qualified Data.ByteString.UTF8 as UTF8 > +import Data.IORef (IORef) > + > +import Ganeti.BasicTypes (ResultT, mkResultT) > +import qualified Ganeti.HTools.Container as Container > +import qualified Ganeti.HTools.Node as Node > +import Ganeti.Logging.Lifted > +import Ganeti.MaintD.MemoryState (MemoryState, getIncidents, rmIncident) > +import Ganeti.Objects.Maintenance (Incident(..), RepairStatus(..)) > +import Ganeti.Utils (logAndBad) > + > +-- | Remove a single incident, provided the corresponding tag > +-- is no longer present. > +cleanupIncident :: IORef MemoryState > + -> Node.List > + -> Incident > + -> ResultT String IO () > +cleanupIncident memstate nl incident = do > + let location = incidentNode incident > + uuid = incidentUuid incident > + tag = incidentTag incident > + nodes = filter ((==) location . Node.name) $ Container.elems nl > + case nodes of > + [] -> do > + logInfo $ "No node any more with name " ++ location > - ++ "; will forget event " ++ uuid > - liftIO $ rmIncident memstate uuid > ++ ++ "; will forget event " ++ UTF8.toString uuid > ++ liftIO . rmIncident memstate $ UTF8.toString uuid > + [nd] -> unless (tag `elem` Node.nTags nd) $ do > + logInfo $ "Tag " ++ tag ++ " removed on " ++ location > - ++ "; will forget event " ++ uuid > - liftIO $ rmIncident memstate uuid > ++ ++ "; will forget event " ++ UTF8.toString uuid > ++ liftIO . rmIncident memstate $ UTF8.toString uuid > + _ -> mkResultT . logAndBad > + $ "Found More than one node with name " ++ location > + > +-- | Remove all incidents from the record that are in a final state > +-- and additionally the node tag for that incident has been removed. 
> +cleanupIncidents :: IORef MemoryState -> Node.List -> ResultT String IO () > +cleanupIncidents memstate nl = do > + incidents <- getIncidents memstate > + let finalized = filter ((> RSPending) . incidentRepairStatus) incidents > + logDebug . (++) "Finalized incidents " . show > + $ map (incidentNode &&& incidentUuid) finalized > + mapM_ (cleanupIncident memstate nl) finalized > diff --cc src/Ganeti/MaintD/CollectIncidents.hs > index ece48bc,0000000..ba31569 > mode 100644,000000..100644 > --- a/src/Ganeti/MaintD/CollectIncidents.hs > +++ b/src/Ganeti/MaintD/CollectIncidents.hs > @@@ -1,129 -1,0 +1,130 @@@ > +{-| Discovery of incidents by the maintenance daemon. > + > +This module implements the querying of all monitoring > +daemons for the value of the node-status data collector. > +Any new incident gets registered. > + > +-} > + > +{- > + > +Copyright (C) 2015 Google Inc. > +All rights reserved. > + > +Redistribution and use in source and binary forms, with or without > +modification, are permitted provided that the following conditions are > +met: > + > +1. Redistributions of source code must retain the above copyright notice, > +this list of conditions and the following disclaimer. > + > +2. Redistributions in binary form must reproduce the above copyright > +notice, this list of conditions and the following disclaimer in the > +documentation and/or other materials provided with the distribution. > + > +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS > +IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED > +TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR > +PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR > +CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, > +EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, > +PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR > +PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF > +LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING > +NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS > +SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. > + > +-} > + > +module Ganeti.MaintD.CollectIncidents > + ( collectIncidents > + ) where > + > +import Control.Applicative (liftA2) > +import Control.Monad (unless) > +import Control.Monad.IO.Class (liftIO) > ++import qualified Data.ByteString.UTF8 as UTF8 > +import Data.IORef (IORef) > +import Network.Curl > +import System.Time (getClockTime) > +import qualified Text.JSON as J > + > +import Ganeti.BasicTypes (ResultT) > +import qualified Ganeti.Constants as C > +import qualified Ganeti.DataCollectors.Diagnose as D > +import Ganeti.DataCollectors.Types (getCategoryName) > +import qualified Ganeti.HTools.Container as Container > +import qualified Ganeti.HTools.Node as Node > +import Ganeti.Logging.Lifted > +import Ganeti.MaintD.MemoryState (MemoryState, getIncidents, updateIncident) > +import Ganeti.Objects.Maintenance > +import Ganeti.Utils (newUUID) > + > +-- | Query a node, unless it is offline, and return > +-- the paylod of the report, if available. For offline > +-- nodes return nothing. 
> +queryStatus :: Node.Node -> IO (Maybe J.JSValue) > +queryStatus node = do > + let name = Node.name node > + let url = name ++ ":" ++ show C.defaultMondPort > + ++ "/1/report/" ++ maybe "default" getCategoryName D.dcCategory > + ++ "/" ++ D.dcName > + if Node.offline node > + then do > + logDebug $ "Not asking " ++ name ++ "; it is offline" > + return Nothing > + else do > + (code, body) <- liftIO $ curlGetString url [] > + case code of > + CurlOK -> > + case J.decode body of > + J.Ok r -> return $ Just r > + _ -> return Nothing > + _ -> do > + logWarning $ "Failed to contact " ++ name > + return Nothing > + > +-- | Update the status of one node. > +updateNode :: IORef MemoryState -> Node.Node -> ResultT String IO () > +updateNode memstate node = do > + let name = Node.name node > + logDebug $ "Inspecting " ++ name > + report <- liftIO $ queryStatus node > + case report of > + Just (J.JSObject obj) > + | Just orig@(J.JSObject origobj) <- lookup "data" $ J.fromJSObject > obj, > + Just s <- lookup "status" $ J.fromJSObject origobj, > + J.Ok state <- J.readJSON s, > + state /= RANoop -> do > + let origs = J.encode orig > + logDebug $ "Relevant event on " ++ name ++ ": " ++ origs > + incidents <- getIncidents memstate > + unless (any (liftA2 (&&) > + ((==) name . incidentNode) > + ((==) orig . incidentOriginal)) incidents) $ do > + logInfo $ "Registering new incident on " ++ name ++ ": " ++ > origs > + uuid <- liftIO newUUID > + now <- liftIO getClockTime > + let tag = C.maintdSuccessTagPrefix ++ uuid > + incident = Incident { incidentOriginal = orig > + , incidentAction = state > + , incidentRepairStatus = RSNoted > + , incidentJobs = [] > + , incidentNode = name > + , incidentTag = tag > - , incidentUuid = uuid > ++ , incidentUuid = UTF8.fromString uuid > + , incidentCtime = now > + , incidentMtime = now > + , incidentSerial = 1 > + } > + liftIO $ updateIncident memstate incident > + _ -> return () > + > + > +-- | Query all MonDs for updates on the node-status. 
> +collectIncidents :: IORef MemoryState -> Node.List -> ResultT String IO () > +collectIncidents memstate nl = do > + _ <- getIncidents memstate -- always update the memory state, > + -- even if we do not observe anything > + logDebug "Querying all nodes for incidents" > + mapM_ (updateNode memstate) $ Container.elems nl > diff --cc src/Ganeti/MaintD/FailIncident.hs > index 4f9a7b8,0000000..917cb78 > mode 100644,000000..100644 > --- a/src/Ganeti/MaintD/FailIncident.hs > +++ b/src/Ganeti/MaintD/FailIncident.hs > @@@ -1,92 -1,0 +1,93 @@@ > +{-| Incident failing in the maintenace daemon > + > +This module implements the treatment of an incident, once > +a job failed. > + > +-} > + > +{- > + > +Copyright (C) 2015 Google Inc. > +All rights reserved. > + > +Redistribution and use in source and binary forms, with or without > +modification, are permitted provided that the following conditions are > +met: > + > +1. Redistributions of source code must retain the above copyright notice, > +this list of conditions and the following disclaimer. > + > +2. Redistributions in binary form must reproduce the above copyright > +notice, this list of conditions and the following disclaimer in the > +documentation and/or other materials provided with the distribution. > + > +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS > +IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED > +TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR > +PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR > +CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, > +EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, > +PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR > +PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF > +LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING > +NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS > +SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. > + > +-} > + > +module Ganeti.MaintD.FailIncident > + ( failIncident > + ) where > + > +import Control.Exception.Lifted (bracket) > +import Control.Lens.Setter (over) > +import Control.Monad (liftM, when) > +import Control.Monad.IO.Class (liftIO) > ++import qualified Data.ByteString.UTF8 as UTF8 > +import Data.IORef (IORef) > +import System.IO.Error (tryIOError) > + > +import Ganeti.BasicTypes (ResultT, mkResultT, GenericResult(..)) > +import qualified Ganeti.Constants as C > +import Ganeti.JQueue (currentTimestamp) > +import Ganeti.Jobs (execJobsWaitOkJid) > +import Ganeti.Logging.Lifted > +import qualified Ganeti.Luxi as L > +import Ganeti.MaintD.MemoryState (MemoryState, getIncidents, updateIncident) > +import Ganeti.MaintD.Utils (annotateOpCode) > +import Ganeti.Objects.Lens (incidentJobsL) > +import Ganeti.Objects.Maintenance (Incident(..), RepairStatus(..)) > +import Ganeti.OpCodes (OpCode(..)) > +import qualified Ganeti.Path as Path > +import Ganeti.Types (JobId, fromJobId, TagKind(..)) > + > +-- | Mark an incident as failed. 
> +markAsFailed :: IORef MemoryState -> Incident -> ResultT String IO () > +markAsFailed memstate incident = do > + let uuid = incidentUuid incident > - newtag = C.maintdFailureTagPrefix ++ uuid > - logInfo $ "Marking incident " ++ uuid ++ " as failed" > ++ newtag = C.maintdFailureTagPrefix ++ UTF8.toString uuid > ++ logInfo $ "Marking incident " ++ UTF8.toString uuid ++ " as failed" > + now <- liftIO currentTimestamp > + luxiSocket <- liftIO Path.defaultQuerySocket > + jids <- bracket (mkResultT . liftM (either (Bad . show) Ok) > + . tryIOError $ L.getLuxiClient luxiSocket) > + (liftIO . L.closeClient) > + (mkResultT . execJobsWaitOkJid > + [[ annotateOpCode "marking incident handling as > failed" now > + . OpTagsSet TagKindNode [ newtag ] > + . Just $ incidentNode incident ]]) > + let incident' = over incidentJobsL (++ jids) > + $ incident { incidentRepairStatus = RSFailed > + , incidentTag = newtag > + } > + liftIO $ updateIncident memstate incident' > + > +-- | Mark the incident, if any, belonging to the given job as > +-- failed after having tagged it appropriately. > +failIncident :: IORef MemoryState -> JobId -> ResultT String IO () > +failIncident memstate jid = do > + incidents <- getIncidents memstate > + let affected = filter (elem jid . incidentJobs) incidents > + when (null affected) . logInfo > + $ "Job " ++ show (fromJobId jid) ++ " does not belong to an incident" > + mapM_ (markAsFailed memstate) affected > diff --cc src/Ganeti/MaintD/HandleIncidents.hs > index 600707d,0000000..c6da8fd > mode 100644,000000..100644 > --- a/src/Ganeti/MaintD/HandleIncidents.hs > +++ b/src/Ganeti/MaintD/HandleIncidents.hs > @@@ -1,297 -1,0 +1,298 @@@ > +{-| Incident handling in the maintenance daemon. > + > +This module implements the submission of actions for ongoing > +repair events reported by the node-status data collector. > + > +-} > + > +{- > + > +Copyright (C) 2015 Google Inc. > +All rights reserved. 
> + > +Redistribution and use in source and binary forms, with or without > +modification, are permitted provided that the following conditions are > +met: > + > +1. Redistributions of source code must retain the above copyright notice, > +this list of conditions and the following disclaimer. > + > +2. Redistributions in binary form must reproduce the above copyright > +notice, this list of conditions and the following disclaimer in the > +documentation and/or other materials provided with the distribution. > + > +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS > +IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED > +TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR > +PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR > +CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, > +EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, > +PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR > +PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF > +LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING > +NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS > +SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
> + > +-} > + > +module Ganeti.MaintD.HandleIncidents > + ( handleIncidents > + ) where > + > +import Control.Arrow ((&&&)) > +import Control.Exception.Lifted (bracket) > +import Control.Lens.Setter (over) > +import Control.Monad (foldM) > +import Control.Monad.IO.Class (liftIO) > ++import qualified Data.ByteString.UTF8 as UTF8 > +import Data.Function (on) > +import Data.IORef (IORef) > +import qualified Data.Map as Map > +import qualified Data.Set as Set > +import qualified Text.JSON as J > + > +import Ganeti.BasicTypes ( GenericResult(..), ResultT, mkResultT, Down(..)) > +import qualified Ganeti.Constants as C > +import Ganeti.HTools.AlgorithmParams (AlgorithmOptions(..), defaultOptions) > +import Ganeti.HTools.Cluster.Evacuate (tryNodeEvac, EvacSolution(..)) > +import qualified Ganeti.HTools.Container as Container > +import qualified Ganeti.HTools.Group as Group > +import qualified Ganeti.HTools.Instance as Instance > +import qualified Ganeti.HTools.Node as Node > +import Ganeti.HTools.Types (Idx) > +import Ganeti.JQueue (currentTimestamp) > +import Ganeti.Jobs (execJobsWaitOkJid, submitJobs, forceFailover) > +import Ganeti.Logging.Lifted > +import qualified Ganeti.Luxi as L > +import Ganeti.MaintD.MemoryState ( MemoryState, getIncidents, rmIncident > + , updateIncident, appendJobs) > +import Ganeti.MaintD.Utils (annotateOpCode, getRepairCommand) > +import Ganeti.Objects.Lens (incidentJobsL) > +import Ganeti.Objects.Maintenance ( RepairStatus(..), RepairAction(..) > + , Incident(..)) > +import Ganeti.OpCodes (OpCode(..), MetaOpCode) > +import qualified Ganeti.Path as Path > +import Ganeti.Types ( cTimeOf, uuidOf, mkNonEmpty, fromJobId > + , EvacMode(..), TagKind(..)) > +import Ganeti.Utils (maxBy, logAndBad) > + > +-- | Given two incidents, choose the more severe one; for equally severe > +-- ones the older (by creation timestamp). > +moreSevereIncident :: Incident -> Incident -> Incident > +moreSevereIncident = maxBy (compare `on` incidentAction &&& (Down . 
> cTimeOf)) > + > +-- | From a given list of incidents, return, for each node, > +-- the one with the most severe action. > +rankIncidents :: [Incident] -> Map.Map String Incident > +rankIncidents = foldl (\m i -> Map.insertWith moreSevereIncident > + (incidentNode i) i m) Map.empty > + > +-- | Generate a job to drain a given node. > +drainJob :: String -> ResultT String IO [ MetaOpCode ] > +drainJob name = do > + name' <- mkNonEmpty name > + now <- liftIO currentTimestamp > + return $ map (annotateOpCode ("Draining " ++ name) now) > + [ OpNodeSetParams { opNodeName = name' > + , opNodeUuid = Nothing > + , opForce = True > + , opHvState = Nothing > + , opDiskState = Nothing > + , opMasterCandidate = Nothing > + , opOffline = Nothing > + , opDrained = Just True > + , opAutoPromote = False > + , opMasterCapable = Nothing > + , opVmCapable = Nothing > + , opSecondaryIp = Nothing > + , opgenericNdParams = Nothing > + , opPowered = Nothing > + } ] > + > +-- | Submit and register the next job for a node evacuation. > +handleEvacuation :: L.Client -- ^ Luxi client to use > + -> IORef MemoryState -- ^ memory state of the daemon > + -> (Group.List, Node.List, Instance.List) -- ^ cluster > + -> Idx -- ^ index of the node to evacuate > + -> Bool -- ^ whether to try migrations > + -> Set.Set Int -- ^ allowed nodes for evacuation > + -> Incident -- ^ the incident > + -> ResultT String IO (Set.Set Int) -- ^ nodes still > available > +handleEvacuation client memst (gl, nl, il) ndx migrate freenodes incident = > do > + let node = Container.find ndx nl > + name = Node.name node > + fNdNames = map (Node.name . 
flip Container.find nl) $ Set.elems > freenodes > + evacOpts = defaultOptions { algEvacMode = True > + , algIgnoreSoftErrors = True > + , algRestrictToNodes = Just fNdNames > + } > + evacFun = tryNodeEvac evacOpts gl nl il > + migrateFun = if migrate then id else forceFailover > + annotateFun = annotateOpCode $ "Evacuating " ++ name > + pendingIncident = incident { incidentRepairStatus = RSPending } > + updateJobs jids_r = case jids_r of > + Ok jids -> do > + let incident' = over incidentJobsL (++ jids) pendingIncident > + liftIO $ updateIncident memst incident' > + liftIO $ appendJobs memst jids > + logDebug $ "Jobs submitted: " ++ show (map fromJobId jids) > + Bad e -> mkResultT . logAndBad > + $ "Failure evacuating " ++ name ++ ": " ++ e > + logInstName i = logInfo $ "Evacuating instance " > + ++ Instance.name (Container.find i il) > + ++ " from " ++ name > + execSol sol = do > + now <- liftIO currentTimestamp > + let jobs = map (map (annotateFun now . migrateFun)) $ esOpCodes sol > + jids <- liftIO $ submitJobs jobs client > + updateJobs jids > + let touched = esMoved sol >>= \(_, _, nidxs) -> nidxs > + return $ freenodes Set.\\ Set.fromList touched > + logDebug $ "Handling evacuation of " ++ name > + case () of _ | not $ Node.offline node -> do > + logDebug $ "Draining node " ++ name > + job <- drainJob name > + jids <- liftIO $ submitJobs [job] client > + updateJobs jids > + return freenodes > + | i:_ <- Node.pList node -> do > + logInstName i > + (_, _, sol) <- mkResultT . return $ evacFun > ChangePrimary [i] > + execSol sol > + | i:_ <- Node.sList node -> do > + logInstName i > + (_, _, sol) <- mkResultT . return > + $ evacFun ChangeSecondary [i] > + execSol sol > + | otherwise -> do > + logDebug $ "Finished evacuation of " ++ name > + now <- liftIO currentTimestamp > + jids <- mkResultT $ execJobsWaitOkJid > + [[ annotateFun now > + . 
OpTagsSet TagKindNode [ incidentTag > incident ] > + $ Just name]] client > + let incident' = over incidentJobsL (++ jids) > + $ incident { incidentRepairStatus = > + RSCompleted } > + liftIO $ updateIncident memst incident' > + liftIO $ appendJobs memst jids > + return freenodes > + > +-- | Submit the next action for a live-repair incident. > +handleLiveRepairs :: L.Client -- ^ Luxi client to use > + -> IORef MemoryState -- ^ memory state of the daemon > + -> Idx -- ^ the node to handle the event on > + -> Set.Set Int -- ^ unaffected nodes > + -> Incident -- ^ the incident > + -> ResultT String IO (Set.Set Int) -- ^ nodes still > available > +handleLiveRepairs client memst ndx freenodes incident = do > + let maybeCmd = getRepairCommand incident > + uuid = incidentUuid incident > + name = incidentNode incident > + now <- liftIO currentTimestamp > + logDebug $ "Handling requested command " ++ show maybeCmd ++ " on " ++ > name > + case () of > + _ | null $ incidentJobs incident, > + Just cmd <- maybeCmd, > + cmd /= "" -> do > + logDebug "Submitting repair command job" > + name' <- mkNonEmpty name > + cmd' <- mkNonEmpty cmd > + orig' <- mkNonEmpty . J.encode $ incidentOriginal incident > + jids_r <- liftIO $ submitJobs > + [[ annotateOpCode "repair command requested by > node" now > + OpRepairCommand { opNodeName = name' > + , opRepairCommand = cmd' > + , opInput = Just orig' > + } ]] client > + case jids_r of > + Ok jids -> do > + let incident' = over incidentJobsL (++ jids) incident > + liftIO $ updateIncident memst incident' > + liftIO $ appendJobs memst jids > + logDebug $ "Jobs submitted: " ++ show (map fromJobId jids) > + Bad e -> mkResultT . 
logAndBad > + $ "Failure requesting command " ++ cmd ++ " on " ++ name > + ++ ": " ++ e > + | null $ incidentJobs incident -> do > - logInfo $ "Marking incident " ++ uuid ++ " as failed;" > ++ logInfo $ "Marking incident " ++ UTF8.toString uuid ++ " as > failed;" > + ++ " command for live repair not specified" > - let newtag = C.maintdFailureTagPrefix ++ uuid > ++ let newtag = C.maintdFailureTagPrefix ++ UTF8.toString uuid > + jids <- mkResultT $ execJobsWaitOkJid > + [[ annotateOpCode "marking incident as ill specified" > now > + . OpTagsSet TagKindNode [ newtag ] > + $ Just name ]] client > + let incident' = over incidentJobsL (++ jids) > + $ incident { incidentRepairStatus = RSFailed > + , incidentTag = newtag > + } > + liftIO $ updateIncident memst incident' > + liftIO $ appendJobs memst jids > + | otherwise -> do > + logDebug "Command execution has succeeded" > + jids <- mkResultT $ execJobsWaitOkJid > + [[ annotateOpCode "repair command requested by node" > now > + . OpTagsSet TagKindNode [ incidentTag incident ] > + $ Just name ]] client > + let incident' = over incidentJobsL (++ jids) > + $ incident { incidentRepairStatus = RSCompleted > } > + liftIO $ updateIncident memst incident' > + liftIO $ appendJobs memst jids > + return $ Set.delete ndx freenodes > + > + > +-- | Submit the next actions for a single incident, given the unaffected > nodes; > +-- register all submitted jobs and return the new set of unaffected nodes. > +handleIncident :: L.Client > + -> IORef MemoryState > + -> (Group.List, Node.List, Instance.List) > + -> Set.Set Int > + -> (String, Incident) > + -> ResultT String IO (Set.Set Int) > +handleIncident client memstate (gl, nl, il) freeNodes (name, incident) = do > + ndx <- case Container.keys $ Container.filter ((==) name . Node.name) nl > of > + [ndx] -> return ndx > + [] -> do > + logWarning $ "Node " ++ name ++ " no longer in the cluster;" > + ++ " clearing incident " ++ show incident > + liftIO . 
rmIncident memstate $ uuidOf incident > + fail $ "node " ++ name ++ " left the cluster" > + ndxs -> do > + logWarning $ "Abmigious node name " ++ name > + ++ "; could refer to indices " ++ show ndxs > + fail $ "ambigious name " ++ name > + case incidentAction incident of > + RANoop -> do > + logDebug $ "Nothing to do for " ++ show incident > + liftIO . rmIncident memstate $ uuidOf incident > + return freeNodes > + RALiveRepair -> > + handleLiveRepairs client memstate ndx freeNodes incident > + RAEvacuate -> > + handleEvacuation client memstate (gl, nl, il) ndx True freeNodes > incident > + RAEvacuateFailover -> > + handleEvacuation client memstate (gl, nl, il) ndx False freeNodes > incident > + > +-- | Submit the jobs necessary for the next maintenance step > +-- for each pending maintenance, i.e., the most radical maintenance > +-- for each node. Return the set of node indices unaffected by these > +-- operations. Also, for each job submitted, register it directly. > +handleIncidents :: IORef MemoryState > + -> (Group.List, Node.List, Instance.List) > + -> ResultT String IO (Set.Set Int) > +handleIncidents memstate (gl, nl, il) = do > + incidents <- getIncidents memstate > + let activeIncidents = filter ((<= RSPending) . incidentRepairStatus) > incidents > + incidentsToHandle = rankIncidents activeIncidents > + incidentNodes = Set.fromList . Container.keys > + $ Container.filter ((`Map.member` incidentsToHandle) . Node.name) nl > + freeNodes = Set.fromList (Container.keys nl) Set.\\ incidentNodes > + if null activeIncidents > + then return freeNodes > + else do > + luxiSocket <- liftIO Path.defaultQuerySocket > + bracket (liftIO $ L.getLuxiClient luxiSocket) > + (liftIO . 
L.closeClient) > + $ \ client -> > + foldM (handleIncident client memstate (gl, nl, il)) > freeNodes > + $ Map.assocs incidentsToHandle > diff --cc src/Ganeti/Monitoring/Server.hs > index aaad4f4,da78b00..668779b > --- a/src/Ganeti/Monitoring/Server.hs > +++ b/src/Ganeti/Monitoring/Server.hs > @@@ -47,13 -44,16 +47,14 @@@ import Ganeti.Prelud > import Control.Applicative > import Control.DeepSeq (force) > import Control.Exception.Base (evaluate) > -import Control.Monad > +import Control.Monad (void, forever, liftM, foldM, foldM_, mzero) > import Control.Monad.IO.Class > -import Data.ByteString.Char8 (pack, unpack) > +import Data.ByteString.Char8 (unpack) > + import qualified Data.ByteString.UTF8 as UTF8 > import Data.Maybe (fromMaybe) > import Data.List (find) > -import Data.Monoid (mempty) > import qualified Data.Map as Map > import qualified Data.PSQueue as Queue > -import Network.BSD (getServicePortNumber) > import Snap.Core > import Snap.Http.Server > import qualified Text.JSON as J > diff --cc src/Ganeti/Objects.hs > index 52a00fb,1855334..e6078cd > --- a/src/Ganeti/Objects.hs > +++ b/src/Ganeti/Objects.hs > @@@ -103,18 -103,12 +103,19 @@@ module Ganeti.Object > , module Ganeti.PartialParams > , module Ganeti.Objects.Disk > , module Ganeti.Objects.Instance > - ) where > + , module Ganeti.Objects.Maintenance > + , FilledHvStateParams(..) > + , PartialHvStateParams(..) 
> + , allHvStateParamFields > + , FilledHvState > + , PartialHvState ) where > + > +import Prelude () > +import Ganeti.Prelude > > -import Control.Applicative > import Control.Arrow (first) > import Control.Monad.State > + import qualified Data.ByteString.UTF8 as UTF8 > import Data.List (foldl', intercalate) > import Data.Maybe > import qualified Data.Map as Map > @@@ -690,8 -679,8 +691,10 @@@ $(buildObject "Cluster" "cluster" > , simpleField "compression_tools" [t| [String] > |] > , simpleField "enabled_user_shutdown" [t| Bool > |] > , simpleField "data_collectors" [t| Container DataCollectorConfig > |] > + , defaultField [| [] |] $ simpleField > + "diagnose_data_collector_filename" [t| String > |] > + , simpleField "ssh_key_type" [t| SshKeyType > |] > + , simpleField "ssh_key_bits" [t| Int > |] > ] > ++ timeStampFields > ++ uuidFields > diff --cc src/Ganeti/Objects/Disk.hs > index f3e08da,ca939d1..7e8b962 > --- a/src/Ganeti/Objects/Disk.hs > +++ b/src/Ganeti/Objects/Disk.hs > @@@ -36,9 -36,8 +36,10 @@@ SOFTWARE, EVEN IF ADVISED OF THE POSSIB > > module Ganeti.Objects.Disk where > > -import Control.Applicative ((<*>), (<$>)) > +import Prelude () > +import Ganeti.Prelude > + > + import qualified Data.ByteString.UTF8 as UTF8 > import Data.Char (isAsciiLower, isAsciiUpper, isDigit) > import Data.List (isPrefixOf, isInfixOf) > import Language.Haskell.TH.Syntax > diff --cc src/Ganeti/Objects/Instance.hs > index d25e134,fb35f65..a946b4e > --- a/src/Ganeti/Objects/Instance.hs > +++ b/src/Ganeti/Objects/Instance.hs > @@@ -36,8 -39,8 +39,10 @@@ SOFTWARE, EVEN IF ADVISED OF THE POSSIB > > module Ganeti.Objects.Instance where > > + import qualified Data.ByteString.UTF8 as UTF8 > -import Data.Monoid > ++ > +import Prelude () > +import Ganeti.Prelude > > import Ganeti.JSON (emptyContainer) > import Ganeti.Objects.Nic > diff --cc src/Ganeti/Objects/Maintenance.hs > index 2f0c2f8,0000000..ea6e709 > mode 100644,000000..100644 > --- a/src/Ganeti/Objects/Maintenance.hs > +++ 
b/src/Ganeti/Objects/Maintenance.hs > @@@ -1,114 -1,0 +1,115 @@@ > +{-# LANGUAGE TemplateHaskell #-} > + > +{-| Implementation of the Ganeti configuration for the maintenance daemon. > + > +-} > + > +{- > + > +Copyright (C) 2015 Google Inc. > +All rights reserved. > + > +Redistribution and use in source and binary forms, with or without > +modification, are permitted provided that the following conditions are > +met: > + > +1. Redistributions of source code must retain the above copyright notice, > +this list of conditions and the following disclaimer. > + > +2. Redistributions in binary form must reproduce the above copyright > +notice, this list of conditions and the following disclaimer in the > +documentation and/or other materials provided with the distribution. > + > +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS > +IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED > +TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR > +PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR > +CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, > +EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, > +PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR > +PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF > +LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING > +NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS > +SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. > + > +-} > + > +module Ganeti.Objects.Maintenance > + ( MaintenanceData(..) > + , RepairAction(..) > + , RepairStatus(..) > + , Incident(..) > + ) where > + > ++import qualified Data.ByteString.UTF8 as UTF8 > +import qualified Text.JSON as J > + > +import qualified Ganeti.Constants as C > +import Ganeti.THH > +import Ganeti.THH.Field > +import Ganeti.Types > + > +-- | Action to be taken for a certain repair event. 
Note > +-- that the order is important, as we rely on values higher > +-- in the derived order to be more intrusive actions. > +$(declareLADT ''String "RepairAction" > + [ ("RANoop", "Ok") > + , ("RALiveRepair", "live-repair") > + , ("RAEvacuate", "evacuate") > + , ("RAEvacuateFailover", "evacuate-failover") > + ]) > +$(makeJSONInstance ''RepairAction) > + > +-- | Progress made on the particular repair event. Again we rely > +-- on the order in that everything larger than `RSPending` is finalized > +-- in the sense that no further jobs will be submitted. > +$(declareLADT ''String "RepairStatus" > + [ ("RSNoted", "noted") > + , ("RSPending", "pending") > + , ("RSCanceled", "canceled") > + , ("RSFailed", "failed") > + , ("RSCompleted", "completed") > + ]) > +$(makeJSONInstance ''RepairStatus) > + > +$(buildObject "Incident" "incident" $ > + [ simpleField "original" [t| J.JSValue |] > + , simpleField "action" [t| RepairAction |] > + , defaultField [| [] |] $ simpleField "jobs" [t| [ JobId ] |] > + , simpleField "node" [t| String |] > + , simpleField "repair-status" [t| RepairStatus |] > + , simpleField "tag" [t| String |] > + ] > + ++ uuidFields > + ++ timeStampFields > + ++ serialFields) > + > +instance SerialNoObject Incident where > + serialOf = incidentSerial > + > +instance TimeStampObject Incident where > + cTimeOf = incidentCtime > + mTimeOf = incidentMtime > + > +instance UuidObject Incident where > - uuidOf = incidentUuid > ++ uuidOf = UTF8.toString . 
incidentUuid > + > +$(buildObject "MaintenanceData" "maint" $ > + [ defaultField [| C.maintdDefaultRoundDelay |] > + $ simpleField "roundDelay" [t| Int |] > + , defaultField [| [] |] $ simpleField "jobs" [t| [ JobId ] |] > + , defaultField [| False |] $ simpleField "balance" [t| Bool |] > + , defaultField [| 0.1 :: Double |] > + $ simpleField "balanceThreshold" [t| Double |] > + , defaultField [| [] |] $ simpleField "evacuated" [t| [ String ] |] > + , defaultField [| [] |] $ simpleField "incidents" [t| [ Incident ] |] > + ] > + ++ timeStampFields > + ++ serialFields) > + > +instance SerialNoObject MaintenanceData where > + serialOf = maintSerial > + > +instance TimeStampObject MaintenanceData where > + cTimeOf = maintCtime > + mTimeOf = maintMtime > diff --cc src/Ganeti/OpParams.hs > index 8176ac6,0a793c0..a965df3 > --- a/src/Ganeti/OpParams.hs > +++ b/src/Ganeti/OpParams.hs > @@@ -299,13 -297,11 +299,15 @@@ module Ganeti.OpParam > , pEnabledUserShutdown > , pAdminStateSource > , pEnabledDataCollectors > + , pMaintdRoundDelay > + , pMaintdEnableBalancing > + , pMaintdBalancingThreshold > , pDataCollectorInterval > + , pDiagnoseDataCollectorFilename > , pNodeSslCerts > - , pSshKeys > + , pSshKeyBits > + , pSshKeyType > + , pRenewSshKeys > , pNodeSetup > , pVerifyClutter > , pLongSleep > diff --cc src/Ganeti/Query/Group.hs > index 9fe246d,45bd81a..66f100e > --- a/src/Ganeti/Query/Group.hs > +++ b/src/Ganeti/Query/Group.hs > @@@ -82,12 -82,7 +82,12 @@@ groupFields > , (FieldDefinition "pinst_list" "InstanceList" QFTOther > "List of primary instances", > FieldConfig (\cfg -> rsNormal . niceSort . mapMaybe instName . fst . > - getGroupInstances cfg . groupUuid), QffNormal) > + getGroupInstances cfg . uuidOf), QffNormal) > + , (FieldDefinition "hv_state" "HypervisorState" QFTOther > + "Custom static hypervisor state", > + FieldSimple (rsNormal . 
groupHvStateStatic), QffNormal) > + , (FieldDefinition "disk_state" "DiskState" QFTOther "Disk state", > + FieldSimple (rsNormal . groupDiskStateStatic), QffNormal) > ] ++ > map buildNdParamField allNDParamFields ++ > timeStampFields ++ > diff --cc src/Ganeti/Query/Server.hs > index 4a3143c,1b7cfa5..89c8b60 > --- a/src/Ganeti/Query/Server.hs > +++ b/src/Ganeti/Query/Server.hs > @@@ -272,18 -271,10 +273,22 @@@ handleCall _ _ cdata QueryClusterInfo > , ("data_collector_interval", > showJSON . fmap dataCollectorInterval > $ clusterDataCollectors cluster) > + , ("diagnose_data_collector_filename", > + showJSON $ clusterDiagnoseDataCollectorFilename cluster) > + , ("maint_round_delay", > + showJSON . maintRoundDelay $ configMaintenance cdata) > + , ("maint_balance", > + showJSON . maintBalance $ configMaintenance cdata) > + , ("maint_balance_threshold", > + showJSON . maintBalanceThreshold $ configMaintenance cdata) > + , ("hv_state", > + showJSON $ clusterHvStateStatic cluster) > + , ("disk_state", > + showJSON $ clusterDiskStateStatic cluster) > + , ("modify_ssh_setup", > + showJSON $ clusterModifySshSetup cluster) > + , ("ssh_key_type", showJSON $ clusterSshKeyType cluster) > + , ("ssh_key_bits", showJSON $ clusterSshKeyBits cluster) > ] > > in case master of > diff --cc src/Ganeti/UDSServer.hs > index 36f0ac1,868c4e9..35a493f > --- a/src/Ganeti/UDSServer.hs > +++ b/src/Ganeti/UDSServer.hs > @@@ -79,11 -77,9 +79,9 @@@ import Control.Monad.Trans.Contro > import Control.Exception (catch) > import Control.Monad > import qualified Data.ByteString as B > - import qualified Data.ByteString.Lazy as BL > import qualified Data.ByteString.UTF8 as UTF8 > - import qualified Data.ByteString.Lazy.UTF8 as UTF8L > import Data.IORef > -import Data.List > +import Data.List (isInfixOf) > import Data.Word (Word8) > import qualified Network.Socket as S > import System.Directory (removeFile) > diff --cc src/Ganeti/WConfd/ConfigModifications.hs > index 13a7df2,aead178..ad656f1 > --- 
a/src/Ganeti/WConfd/ConfigModifications.hs > +++ b/src/Ganeti/WConfd/ConfigModifications.hs > @@@ -40,16 -39,14 +40,17 @@@ SOFTWARE, EVEN IF ADVISED OF THE POSSIB > > module Ganeti.WConfd.ConfigModifications where > > -import Control.Applicative ((<$>)) > +import Prelude () > +import Ganeti.Prelude > + > import Control.Lens (_2) > import Control.Lens.Getter ((^.)) > -import Control.Lens.Setter ((.~), (%~)) > +import Control.Lens.Setter (Setter, (.~), (%~), (+~), over) > + import qualified Data.ByteString.UTF8 as UTF8 >

This breaks up the import order a bit: the new line lands in the middle of the Lens imports. Mind reordering it?

> import Control.Lens.Traversal (mapMOf) > -import Control.Monad (unless, when, forM_, foldM, liftM2) > -import Control.Monad.Error (throwError, MonadError) > +import Control.Lens.Type (Simple) > +import Control.Monad (unless, when, forM_, foldM, liftM, liftM2) > +import Control.Monad.Error.Class (throwError, MonadError) > import Control.Monad.IO.Class (liftIO) > import Control.Monad.Trans.State (StateT, get, put, modify, > runStateT, execStateT) > @@@ -121,7 -117,7 +122,7 @@@ getAllIDs cs > > instKeys = keysFromC . configInstances . csConfigData $ cs > nodeKeys = keysFromC . configNodes . csConfigData $ cs > -- > ++ > instValues = map uuidOf . valuesFromC > . configInstances . csConfigData $ cs > nodeValues = map uuidOf . valuesFromC . configNodes . csConfigData $ > cs > @@@ -655,77 -669,9 +674,77 @@@ updateDisk disk = d > (^. dL) (\cs -> do > dC <- toError $ replaceIn ct disk (cs ^. dL) > return ((serialOf disk + 1, ct), (dL .~ dC) cs))) > - . T.releaseDRBDMinors $ uuidOf disk > + . T.releaseDRBDMinors . UTF8.fromString $ uuidOf disk > return . MaybeForJSON $ fmap (_2 %~ TimeAsDoubleJSON) r > > +-- | Set a particular value and bump serial in the hosting > +-- structure. Arguments are a setter to focus on the part > +-- of the configuration that gets serial-bumped, and a modification > +-- of that part. 
The function will do the change and bump the serial > +-- in the WConfdMonad temporarily acquiring the configuration lock. > +-- Return True if that succeeded and False if the configuration lock > +-- was not available; no change is done in the latter case. > +changeAndBump :: (SerialNoObjectL a, TimeStampObjectL a) > + => Simple Setter ConfigState a > + -> (a -> a) > + -> WConfdMonad Bool > +changeAndBump focus change = do > + now <- liftIO getClockTime > + let operation = over focus $ (serialL +~ 1) . (mTimeL .~ now) . change > + liftM isJust $ modifyConfigWithLock > + (\_ cs -> return . operation $ cs) > + (return ()) > + > +-- | Change and bump part of the maintenance part of the configuration. > +changeAndBumpMaint :: (MaintenanceData -> MaintenanceData) -> WConfdMonad > Bool > +changeAndBumpMaint = changeAndBump $ csConfigDataL . configMaintenanceL > + > +-- | Set the maintenance intervall. > +setMaintdRoundDelay :: Int -> WConfdMonad Bool > +setMaintdRoundDelay delay = changeAndBumpMaint $ maintRoundDelayL .~ delay > + > +-- | Clear the list of current maintenance jobs. > +clearMaintdJobs :: WConfdMonad Bool > +clearMaintdJobs = changeAndBumpMaint $ maintJobsL .~ [] > + > +-- | Append new jobs to the list of current maintenace jobs, if > +-- not alread present. > +appendMaintdJobs :: [JobId] -> WConfdMonad Bool > +appendMaintdJobs jobs = changeAndBumpMaint . over maintJobsL > + $ ordNub . (++ jobs) > + > +-- | Set the autobalance flag. > +setMaintdBalance :: Bool -> WConfdMonad Bool > +setMaintdBalance value = changeAndBumpMaint $ maintBalanceL .~ value > + > +-- | Set the auto-balance threshold. > +setMaintdBalanceThreshold :: Double -> WConfdMonad Bool > +setMaintdBalanceThreshold value = changeAndBumpMaint > + $ maintBalanceThresholdL .~ value > + > +-- | Add a name to the list of recently evacuated instances. > +addMaintdEvacuated :: [String] -> WConfdMonad Bool > +addMaintdEvacuated names = changeAndBumpMaint . over maintEvacuatedL > + $ ordNub . 
(++ names) > + > +-- | Remove a name from the list of recently evacuated instances. > +rmMaintdEvacuated :: String -> WConfdMonad Bool > +rmMaintdEvacuated name = changeAndBumpMaint . over maintEvacuatedL > + $ filter (/= name) > + > +-- | Update an incident to the list of known incidents; if the incident, > +-- as identified by the UUID, is not present, it is added. > +updateMaintdIncident :: Incident -> WConfdMonad Bool > +updateMaintdIncident incident = > + changeAndBumpMaint . over maintIncidentsL > + $ (incident :) . filter ((/= uuidOf incident) . uuidOf) > + > +-- | Remove an incident from the list of known incidents. > +rmMaintdIncident :: String -> WConfdMonad Bool > +rmMaintdIncident uuid = > + changeAndBumpMaint . over maintIncidentsL > + $ filter ((/= uuid) . uuidOf) > + > -- * The list of functions exported to RPC. > > exportedFunctions :: [Name] > diff --cc src/Ganeti/WConfd/ConfigVerify.hs > index a6d537b,246b627..118d775 > --- a/src/Ganeti/WConfd/ConfigVerify.hs > +++ b/src/Ganeti/WConfd/ConfigVerify.hs > @@@ -39,8 -39,8 +39,9 @@@ module Ganeti.WConfd.ConfigVerif > , verifyConfigErr > ) where > > -import Control.Monad.Error > +import Control.Monad (forM_) > +import Control.Monad.Error.Class (MonadError(..)) > + import qualified Data.ByteString.UTF8 as UTF8 > import qualified Data.Foldable as F > import qualified Data.Map as M > import qualified Data.Set as S > diff --cc src/Ganeti/WConfd/TempRes.hs > index 5aa6343,565fae2..9c0220d > --- a/src/Ganeti/WConfd/TempRes.hs > +++ b/src/Ganeti/WConfd/TempRes.hs > @@@ -73,13 -73,13 +73,15 @@@ module Ganeti.WConfd.TempRe > , reserved > ) where > > -import Control.Applicative > +import Prelude () > +import Ganeti.Prelude > + > import Control.Lens.At > -import Control.Monad.Error > +import Control.Monad.Error.Class (MonadError(..)) > import Control.Monad.State > import Control.Monad.Trans.Maybe > + import qualified Data.ByteString as BS > + import qualified Data.ByteString.UTF8 as UTF8 > import qualified 
Data.Foldable as F > import Data.Maybe > import Data.Map (Map) > diff --cc test/hs/Test/Ganeti/JQScheduler.hs > index 3d79877,77eb2ac..04a6287 > --- a/test/hs/Test/Ganeti/JQScheduler.hs > +++ b/test/hs/Test/Ganeti/JQScheduler.hs > @@@ -37,10 -37,9 +37,11 @@@ SOFTWARE, EVEN IF ADVISED OF THE POSSIB > > module Test.Ganeti.JQScheduler (testJQScheduler) where > > -import Control.Applicative > +import Prelude () > +import Ganeti.Prelude > + > import Control.Lens ((&), (.~), _2) > + import qualified Data.ByteString.UTF8 as UTF8 > import Data.List (inits) > import Data.Maybe > import qualified Data.Map as Map > diff --cc test/hs/Test/Ganeti/Objects.hs > index 6d8deef,90967ce..8e97443 > --- a/test/hs/Test/Ganeti/Objects.hs > +++ b/test/hs/Test/Ganeti/Objects.hs > @@@ -55,7 -52,10 +55,9 @@@ import Ganeti.Prelud > import Test.QuickCheck > import qualified Test.HUnit as HUnit > > -import Control.Applicative > -import Control.Monad > +import Control.Monad (liftM, when) > + import qualified Data.ByteString as BS > + import qualified Data.ByteString.UTF8 as UTF8 > import Data.Char > import qualified Data.List as List > import qualified Data.Map as Map > @@@ -91,29 -91,9 +93,32 @@@ instance Arbitrary (Container DataColle > return GenericContainer { > fromContainer = Map.fromList $ zip names configs } > > +-- FYI: Currently only memory node value is used > +instance Arbitrary PartialHvStateParams where > + arbitrary = PartialHvStateParams <$> pure Nothing <*> pure Nothing > + <*> pure Nothing <*> genMaybe (fromPositive <$> arbitrary) > + <*> pure Nothing > + > +instance Arbitrary PartialHvState where > + arbitrary = do > + hv_params <- arbitrary > + return GenericContainer { > + fromContainer = Map.fromList [ hv_params ] } > + > +-- FYI: Currently only memory node value is used > +instance Arbitrary FilledHvStateParams where > + arbitrary = FilledHvStateParams <$> pure 0 <*> pure 0 <*> pure 0 > + <*> (fromPositive <$> arbitrary) <*> pure 0 > + > +instance Arbitrary FilledHvState 
where > + arbitrary = do > + hv_params <- arbitrary > + return GenericContainer { > + fromContainer = Map.fromList [ hv_params ] } > + > + instance Arbitrary BS.ByteString where > + arbitrary = fmap UTF8.fromString arbitrary > + > $(genArbitrary ''PartialNDParams) > > instance Arbitrary Node where > @@@ -398,37 -380,15 +405,44 @@@ instance Arbitrary FilterRule wher > <*> arbitrary > <*> arbitrary > <*> arbitrary > - <*> genUUID > + <*> fmap UTF8.fromString genUUID > + > + instance Arbitrary SshKeyType where > + arbitrary = oneof > + [ pure RSA > + , pure DSA > + , pure ECDSA > + ] > > +instance Arbitrary RepairStatus where > + arbitrary = elements [ RSNoted, RSPending, RSCanceled, RSFailed, > RSCompleted ] > + > +instance Arbitrary RepairAction where > + arbitrary = elements [ RANoop, RALiveRepair, RAEvacuate, > RAEvacuateFailover ] > + > +instance Arbitrary Incident where > + arbitrary = Incident <$> pure (J.JSObject $ J.toJSObject []) > + <*> arbitrary > + <*> arbitrary > + <*> arbitrary > + <*> arbitrary > + <*> arbitrary > + <*> arbitrary > + <*> arbitrary > + <*> arbitrary > + <*> arbitrary > + > +instance Arbitrary MaintenanceData where > + arbitrary = MaintenanceData <$> (fromPositive <$> arbitrary) > + <*> arbitrary > + <*> arbitrary > + <*> arbitrary > + <*> arbitrary > + <*> arbitrary > + <*> arbitrary > + <*> arbitrary > + <*> arbitrary > + > -- | Generates a network instance with minimum netmasks of /24. Generating > -- bigger networks slows down the tests, because long bit strings are > generated > -- for the reservations. 
> @@@ -485,8 -445,7 +499,8 @@@ genEmptyCluster ncount = d > networks = GenericContainer Map.empty > disks = GenericContainer Map.empty > filters = GenericContainer Map.empty > + maintenance <- arbitrary > - let contgroups = GenericContainer $ Map.singleton guuid grp > + let contgroups = GenericContainer $ Map.singleton (UTF8.fromString guuid) > grp > serial <- arbitrary > -- timestamp fields > ctime <- arbitrary > diff --cc test/py/cfgupgrade_unittest.py > index 0706f53,a6dec64..132575a > --- a/test/py/cfgupgrade_unittest.py > +++ b/test/py/cfgupgrade_unittest.py > @@@ -76,7 -74,8 +76,9 @@@ def GetMinimalConfig() > "cpu-avg-load": { "active": True, "interval": 5000000 }, > "xen-cpu-avg-load": { "active": True, "interval": 5000000 }, > }, > + "diagnose_data_collector_filename": "", > + "ssh_key_type": "dsa", > + "ssh_key_bits": 1024, > }, > "instances": {}, > "disks": {}, > > -- > Lisa Velden > Software Engineer > [email protected] > > Google Germany GmbH > Dienerstraße 12 > 80331 München > > Geschäftsführer: Matthew Scott Sucherman, Paul Terence Manicle > Registergericht und -nummer: Hamburg, HRB 86891 > Sitz der Gesellschaft: Hamburg >

Hrvoje Ribicic
Ganeti Engineering
