Uploaded a new patch to 19917. Changed the verbosity to debug, and addressed Daan’s comment on providing more distinctive text messages.
Sorry that I haven’t split them into smaller patches. Note in a few cases the original code was like: try { pstmt = txn.prepareAutoCloseStatement(sql); String gmtCutTime = DateUtil.getDateDisplayString(TimeZone.getTimeZone("GMT"), cutTime); pstmt.setString(1, gmtCutTime); pstmt.setString(2, gmtCutTime); ResultSet rs = pstmt.executeQuery(); while (rs.next()) { RunningHostCountInfo info = new RunningHostCountInfo(); info.setDcId(rs.getLong(1)); info.setHostType(rs.getString(2)); info.setCount(rs.getInt(3)); l.add(info); } } catch (SQLException e) { } catch (Throwable e) { } The try block only throws SQLException as checked exception, and this code would also swallow any unchecked exceptions. I removed the catch (Throwable) in these cases to avoid potentially swallowing any unexpected runtime exceptions. Please let me know if this is not desirable so I can further update. Thanks, Ding On Apr 2, 2014, at 5:17 PM, Ding Yuan <y...@eecg.toronto.edu> wrote: > Thanks all for the quick comments! > If i understand the discussion correctly, I will just change all the added > log printing statements to debug verbosity. I will upload a new patch for > that shortly. > > Now a bit back story: the reason we are doing this is that we recently did an > analysis on many bugs collected from JIRA to understand why today’s cloud > system fails. And we found that almost all of the cluster-wide failures are > caused by buggy exception handling, which would often turn a component > failure into a cluster-wide one. One of the common bug pattern is ignoring > some exceptions — allowing them to propagate and finally turn into disaster. > Therefore we built a simple static checking tool just to check some simple > rules for exception handling, such as if an exception is ignored. Admittedly, > it would be much harder to reason about the potential consequences caused for > ignoring a certain exception, that’s why without much more domain knowledge I > can only recommend to 1) avoid over-catching an exception, especially when > the handling logic will swallow it, and 2) log them, as what this patch does. > > Nevertheless, it seems the four cases I mentioned in JIRA-6242 are > particularly suspicious. It might be worthwhile to double check their > correctness if you have time. I am reposting them below. > > Thanks! > Ding > > ========================= > Case 1: > Line: 375, File: "com/cloud/network/ovs/OvsTunnelManagerImpl.java" > > 326: try { > 327: String myIp = getGreEndpointIP(dest.getHost(), nw); > .. .. .. > 373: } catch (Exception e) { > 374: // I really thing we should do a better handling of these exceptions > 375: s_logger.warn("Ovs Tunnel network created tunnel failed", e); > 376: } > > Comment seems to suggest for a better handling. > ========================= > ========================= > Case 2: > Line: 2295, File: "com/cloud/resource/ResourceManagerImpl.java" > > 2284: try { > 2285: /* > 2286: * FIXME: this is a buggy logic, check with alex. Shouldn't > 2287: * return if propagation return non null > 2288: */ > 2289: Boolean result = _clusterMgr.propagateResourceEvent(h.getId(), > ResourceState.Event.UpdatePassword); > 2290: if (result != null) { > 2291: return result; > 2292: } > 2293: > 2294: doUpdateHostPassword(h.getId()); > 2295: } catch (AgentUnavailableException e) { > 2296: } > > Seem from the comment the logic should be fixed. > A similar code snipeet is at: > Line: 2276, File: "com/cloud/resource/ResourceManagerImpl.java" > ========================= > > ========================= > Case 3: > > Line: 184, File: > "org/apache/cloudstack/api/command/user/autoscale/CreateAutoScaleVmGroupCmd.java" > > > 174: try > 175: { > 176: success = _autoScaleService.configureAutoScaleVmGroup(this); > 177: if (success) { > 178: vmGroup = _entityMgr.findById(AutoScaleVmGroup.class, getEntityId()); > 179: AutoScaleVmGroupResponse responseObject = > _responseGenerator.createAutoScaleVmGroupResponse(vmGroup); > 180: setResponseObject(responseObject); > 181: responseObject.setResponseName(getCommandName()); > 182: } > 183: } catch (Exception ex) { > 184: // TODO what will happen if Resource Layer fails in a step inbetween > 185: s_logger.warn("Failed to create autoscale vm group", ex); > 186: } > > The comment, "TODO", seems to suggest for better handling. > ========================= > ========================= > Case 4: > > Line: 222, File: "com/cloud/api/ApiDispatcher.java" > This snippet is moved to ParamProcessWorker.java in the trunk. > > 203: try { > 204: setFieldValue(field, cmd, paramObj, parameterAnnotation); > .. .. > 220: } catch (CloudRuntimeException cloudEx) { > 221: s_logger.error("CloudRuntimeException", cloudEx); > 222: // FIXME: Better error message? This only happens if the API command is > not executable, which typically > 223: //means > 224: // there was > 225: // and IllegalAccessException setting one of the parameters. > 226: throw new ServerApiException(ApiErrorCode.INTERNAL_ERROR, "Internal > error executing API command " + cmd.getCommandName().substring(0, > cmd.getCommandName().length() - 8)); > 227: } > > The "FIXME" comment seems to suggest for getter error message. > ========================= > On Apr 2, 2014, at 4:37 PM, Daan Hoogland <daan.hoogl...@gmail.com> wrote: > >> Ding Yuan, >> >> Any objections to that? >> >> On Wed, Apr 2, 2014 at 10:32 PM, Alena Prokharchyk >> <alena.prokharc...@citrix.com> wrote: >>> >>> >>> On 4/2/14, 1:27 PM, "Daan Hoogland" <daan.hoogl...@gmail.com> wrote: >>> >>>> I think we agree indeed. Doesn't mean we should start this discuss >>>> thread or write a arch guideline on the wiki somewhere. Maybe Ding >>>> Yuan wants to do a preliminary version of it? >>> >>> Wiki guide would be useful indeed. >>> >>> >>>> >>>> In the meantime I don't think that it hurts for the present patch to >>>> do everything in debug and decide about higher levels needed later. >>> >>> Agree. >>> >>> >>>> >>>> regards, >>>> >>>> On Wed, Apr 2, 2014 at 10:11 PM, Alena Prokharchyk >>>> <alena.prokharc...@citrix.com> wrote: >>>>> Daan, >>>>> >>>>> Correct me if I¹m wrong, but all of the logging added by Ding, fall >>>>> under >>>>> "to go with it or to indicate passing a certain code path². I¹ve just >>>>> noticed that some of them were added with DEBUG, and some with WARN >>>>> level, >>>>> and wanted to correct that. >>>>> >>>>> So we should: >>>>> >>>>> 1) For sure: never print them out in WARN as there is nothing admin >>>>> should >>>>> do in this case, because the code just handles them by ignoring. >>>>> 2) Figure out what would be the correct level to log them with: INFO or >>>>> DEBUG >>>>> >>>>> From ³Logging best practices² articles, I can see that people use INFO >>>>> as >>>>> a ³storyline² of normal application behavior, and DEBUG for sort of >>>>> information that helps you to track down the failure cases scenarios. >>>>> To >>>>> me, stuff added by Ding, falls under second category. But I might be >>>>> wrong >>>>> as I don¹t recall on the spot any discussions happening on the debug >>>>> topic, from the mailing list. >>>>> >>>>> -Alena. >>>>> >>>>> >>>>> On 4/2/14, 12:57 PM, "Daan Hoogland" <daan.hoogl...@gmail.com> wrote: >>>>> >>>>>> Alena, >>>>>> >>>>>> What I read in your comment is a description of INFO vs WARN. Debug >>>>>> would be only for outputting stacktraces to go with it or to indicate >>>>>> passing a certain code path. >>>>>> >>>>>> Agree? >>>>>> >>>>>> On Wed, Apr 2, 2014 at 8:31 PM, Alena Prokharchyk >>>>>> <alena.prokharc...@citrix.com> wrote: >>>>>>> >>>>>>> ----------------------------------------------------------- >>>>>>> This is an automatically generated e-mail. To reply, visit: >>>>>>> https://reviews.apache.org/r/19917/#review39324 >>>>>>> ----------------------------------------------------------- >>>>>>> >>>>>>> >>>>>>> Is there a reason why logs for some exceptions are being logged in >>>>>>> DEBUG mode, and some in WARN? From my point of view, if the code only >>>>>>> catches it and doesn't error out, it should be logged in DEBUG. Lots of >>>>>>> Admins are seeking for WARN statements in the log, and they might be >>>>>>> confused seeing WARN w/o further failure or retry. >>>>>>> >>>>>>> - Alena Prokharchyk >>>>>>> >>>>>>> >>>>>>> On April 2, 2014, 1:55 p.m., Ding Yuan wrote: >>>>>>>> >>>>>>>> ----------------------------------------------------------- >>>>>>>> This is an automatically generated e-mail. To reply, visit: >>>>>>>> https://reviews.apache.org/r/19917/ >>>>>>>> ----------------------------------------------------------- >>>>>>>> >>>>>>>> (Updated April 2, 2014, 1:55 p.m.) >>>>>>>> >>>>>>>> >>>>>>>> Review request for cloudstack. >>>>>>>> >>>>>>>> >>>>>>>> Repository: cloudstack-git >>>>>>>> >>>>>>>> >>>>>>>> Description >>>>>>>> ------- >>>>>>>> >>>>>>>> This is the patch for JIRA-6242. See >>>>>>>> https://issues.apache.org/jira/browse/CLOUDSTACK-6242 for more >>>>>>>> details. >>>>>>>> Thanks! >>>>>>>> >>>>>>>> >>>>>>>> Diffs >>>>>>>> ----- >>>>>>>> >>>>>>>> >>>>>>>> engine/orchestration/src/com/cloud/agent/manager/AgentManagerImpl.java >>>>>>>> 0d41bc1 >>>>>>>> >>>>>>>> engine/orchestration/src/com/cloud/agent/manager/ClusteredAgentManager >>>>>>>> Im >>>>>>>> pl.java 01508a4 >>>>>>>> >>>>>>>> engine/orchestration/src/com/cloud/vm/VirtualMachineManagerImpl.java >>>>>>>> 3e088db >>>>>>>> >>>>>>>> engine/orchestration/src/org/apache/cloudstack/engine/datacenter/entit >>>>>>>> y/ >>>>>>>> api/db/dao/EngineDataCenterDaoImpl.java 4b6818e >>>>>>>> engine/schema/src/com/cloud/dc/dao/DataCenterDaoImpl.java ea5039f >>>>>>>> engine/schema/src/com/cloud/host/dao/HostDaoImpl.java 426c90d >>>>>>>> engine/schema/src/com/cloud/storage/dao/StoragePoolHostDaoImpl.java >>>>>>>> e42eaf4 >>>>>>>> engine/schema/src/com/cloud/storage/dao/VMTemplateDaoImpl.java >>>>>>>> 34fdca5 >>>>>>>> engine/schema/src/com/cloud/upgrade/dao/Upgrade2214to30.java >>>>>>>> 58dd916 >>>>>>>> engine/schema/src/com/cloud/vm/dao/ConsoleProxyDaoImpl.java 5e9c2f0 >>>>>>>> engine/schema/src/com/cloud/vm/dao/SecondaryStorageVmDaoImpl.java >>>>>>>> 1f382d6 >>>>>>>> >>>>>>>> engine/storage/src/org/apache/cloudstack/storage/datastore/DataObjectM >>>>>>>> an >>>>>>>> agerImpl.java 6ed1274 >>>>>>>> >>>>>>>> framework/ipc/src/org/apache/cloudstack/framework/serializer/OnwireCla >>>>>>>> ss >>>>>>>> Registry.java 83c8a42 >>>>>>>> >>>>>>>> plugins/hypervisors/xen/src/com/cloud/hypervisor/xen/discoverer/XcpSer >>>>>>>> ve >>>>>>>> rDiscoverer.java 0ad6dc4 >>>>>>>> >>>>>>>> plugins/hypervisors/xen/src/com/cloud/hypervisor/xen/resource/XenServe >>>>>>>> rC >>>>>>>> onnectionPool.java b779085 >>>>>>>> >>>>>>>> plugins/hypervisors/xen/src/com/cloud/hypervisor/xen/resource/XenServe >>>>>>>> rS >>>>>>>> torageProcessor.java e512046 >>>>>>>> >>>>>>>> plugins/storage/volume/solidfire/src/org/apache/cloudstack/storage/dat >>>>>>>> as >>>>>>>> tore/lifecycle/SolidFirePrimaryDataStoreLifeCycle.java af6a77a >>>>>>>> server/src/com/cloud/resource/ResourceManagerImpl.java f9a59ba >>>>>>>> server/src/com/cloud/server/ConfigurationServerImpl.java b8da4c8 >>>>>>>> >>>>>>>> services/console-proxy/server/src/com/cloud/consoleproxy/ConsoleProxyT >>>>>>>> hu >>>>>>>> mbnailHandler.java 06f21d3 >>>>>>>> utils/src/com/cloud/utils/net/NetUtils.java 6350986 >>>>>>>> >>>>>>>> Diff: https://reviews.apache.org/r/19917/diff/ >>>>>>>> >>>>>>>> >>>>>>>> Testing >>>>>>>> ------- >>>>>>>> >>>>>>>> >>>>>>>> Thanks, >>>>>>>> >>>>>>>> Ding Yuan >>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Daan >>>>> >>>> >>>> >>>> >>>> -- >>>> Daan >>> >> >> >> >> -- >> Daan >> >