Uploaded a new patch to 19917. Changed the verbosity to debug, and addressed
Daan’s comment on providing more distinctive text messages.
Sorry that I haven’t split them into smaller patches.
Note in a few cases the original code was like:
try {
pstmt = txn.prepareAutoCloseStatement(sql);
String gmtCutTime =
DateUtil.getDateDisplayString(TimeZone.getTimeZone("GMT"), cutTime);
pstmt.setString(1, gmtCutTime);
pstmt.setString(2, gmtCutTime);
ResultSet rs = pstmt.executeQuery();
while (rs.next()) {
RunningHostCountInfo info = new RunningHostCountInfo();
info.setDcId(rs.getLong(1));
info.setHostType(rs.getString(2));
info.setCount(rs.getInt(3));
l.add(info);
}
} catch (SQLException e) {
} catch (Throwable e) {
}
The try block only throws SQLException as checked exception, and this code
would also swallow any unchecked exceptions. I removed the catch (Throwable) in
these cases to avoid potentially swallowing any unexpected runtime exceptions.
Please let me know if this is not desirable so I can further update.
Thanks,
Ding
On Apr 2, 2014, at 5:17 PM, Ding Yuan <[email protected]> wrote:
> Thanks all for the quick comments!
> If i understand the discussion correctly, I will just change all the added
> log printing statements to debug verbosity. I will upload a new patch for
> that shortly.
>
> Now a bit back story: the reason we are doing this is that we recently did an
> analysis on many bugs collected from JIRA to understand why today’s cloud
> system fails. And we found that almost all of the cluster-wide failures are
> caused by buggy exception handling, which would often turn a component
> failure into a cluster-wide one. One of the common bug pattern is ignoring
> some exceptions — allowing them to propagate and finally turn into disaster.
> Therefore we built a simple static checking tool just to check some simple
> rules for exception handling, such as if an exception is ignored. Admittedly,
> it would be much harder to reason about the potential consequences caused for
> ignoring a certain exception, that’s why without much more domain knowledge I
> can only recommend to 1) avoid over-catching an exception, especially when
> the handling logic will swallow it, and 2) log them, as what this patch does.
>
> Nevertheless, it seems the four cases I mentioned in JIRA-6242 are
> particularly suspicious. It might be worthwhile to double check their
> correctness if you have time. I am reposting them below.
>
> Thanks!
> Ding
>
> =========================
> Case 1:
> Line: 375, File: "com/cloud/network/ovs/OvsTunnelManagerImpl.java"
>
> 326: try {
> 327: String myIp = getGreEndpointIP(dest.getHost(), nw);
> .. .. ..
> 373: } catch (Exception e) {
> 374: // I really thing we should do a better handling of these exceptions
> 375: s_logger.warn("Ovs Tunnel network created tunnel failed", e);
> 376: }
>
> Comment seems to suggest for a better handling.
> =========================
> =========================
> Case 2:
> Line: 2295, File: "com/cloud/resource/ResourceManagerImpl.java"
>
> 2284: try {
> 2285: /*
> 2286: * FIXME: this is a buggy logic, check with alex. Shouldn't
> 2287: * return if propagation return non null
> 2288: */
> 2289: Boolean result = _clusterMgr.propagateResourceEvent(h.getId(),
> ResourceState.Event.UpdatePassword);
> 2290: if (result != null) {
> 2291: return result;
> 2292: }
> 2293:
> 2294: doUpdateHostPassword(h.getId());
> 2295: } catch (AgentUnavailableException e) {
> 2296: }
>
> Seem from the comment the logic should be fixed.
> A similar code snipeet is at:
> Line: 2276, File: "com/cloud/resource/ResourceManagerImpl.java"
> =========================
>
> =========================
> Case 3:
>
> Line: 184, File:
> "org/apache/cloudstack/api/command/user/autoscale/CreateAutoScaleVmGroupCmd.java"
>
>
> 174: try
> 175: {
> 176: success = _autoScaleService.configureAutoScaleVmGroup(this);
> 177: if (success) {
> 178: vmGroup = _entityMgr.findById(AutoScaleVmGroup.class, getEntityId());
> 179: AutoScaleVmGroupResponse responseObject =
> _responseGenerator.createAutoScaleVmGroupResponse(vmGroup);
> 180: setResponseObject(responseObject);
> 181: responseObject.setResponseName(getCommandName());
> 182: }
> 183: } catch (Exception ex) {
> 184: // TODO what will happen if Resource Layer fails in a step inbetween
> 185: s_logger.warn("Failed to create autoscale vm group", ex);
> 186: }
>
> The comment, "TODO", seems to suggest for better handling.
> =========================
> =========================
> Case 4:
>
> Line: 222, File: "com/cloud/api/ApiDispatcher.java"
> This snippet is moved to ParamProcessWorker.java in the trunk.
>
> 203: try {
> 204: setFieldValue(field, cmd, paramObj, parameterAnnotation);
> .. ..
> 220: } catch (CloudRuntimeException cloudEx) {
> 221: s_logger.error("CloudRuntimeException", cloudEx);
> 222: // FIXME: Better error message? This only happens if the API command is
> not executable, which typically
> 223: //means
> 224: // there was
> 225: // and IllegalAccessException setting one of the parameters.
> 226: throw new ServerApiException(ApiErrorCode.INTERNAL_ERROR, "Internal
> error executing API command " + cmd.getCommandName().substring(0,
> cmd.getCommandName().length() - 8));
> 227: }
>
> The "FIXME" comment seems to suggest for getter error message.
> =========================
> On Apr 2, 2014, at 4:37 PM, Daan Hoogland <[email protected]> wrote:
>
>> Ding Yuan,
>>
>> Any objections to that?
>>
>> On Wed, Apr 2, 2014 at 10:32 PM, Alena Prokharchyk
>> <[email protected]> wrote:
>>>
>>>
>>> On 4/2/14, 1:27 PM, "Daan Hoogland" <[email protected]> wrote:
>>>
>>>> I think we agree indeed. Doesn't mean we should start this discuss
>>>> thread or write a arch guideline on the wiki somewhere. Maybe Ding
>>>> Yuan wants to do a preliminary version of it?
>>>
>>> Wiki guide would be useful indeed.
>>>
>>>
>>>>
>>>> In the meantime I don't think that it hurts for the present patch to
>>>> do everything in debug and decide about higher levels needed later.
>>>
>>> Agree.
>>>
>>>
>>>>
>>>> regards,
>>>>
>>>> On Wed, Apr 2, 2014 at 10:11 PM, Alena Prokharchyk
>>>> <[email protected]> wrote:
>>>>> Daan,
>>>>>
>>>>> Correct me if I¹m wrong, but all of the logging added by Ding, fall
>>>>> under
>>>>> "to go with it or to indicate passing a certain code path². I¹ve just
>>>>> noticed that some of them were added with DEBUG, and some with WARN
>>>>> level,
>>>>> and wanted to correct that.
>>>>>
>>>>> So we should:
>>>>>
>>>>> 1) For sure: never print them out in WARN as there is nothing admin
>>>>> should
>>>>> do in this case, because the code just handles them by ignoring.
>>>>> 2) Figure out what would be the correct level to log them with: INFO or
>>>>> DEBUG
>>>>>
>>>>> From ³Logging best practices² articles, I can see that people use INFO
>>>>> as
>>>>> a ³storyline² of normal application behavior, and DEBUG for sort of
>>>>> information that helps you to track down the failure cases scenarios.
>>>>> To
>>>>> me, stuff added by Ding, falls under second category. But I might be
>>>>> wrong
>>>>> as I don¹t recall on the spot any discussions happening on the debug
>>>>> topic, from the mailing list.
>>>>>
>>>>> -Alena.
>>>>>
>>>>>
>>>>> On 4/2/14, 12:57 PM, "Daan Hoogland" <[email protected]> wrote:
>>>>>
>>>>>> Alena,
>>>>>>
>>>>>> What I read in your comment is a description of INFO vs WARN. Debug
>>>>>> would be only for outputting stacktraces to go with it or to indicate
>>>>>> passing a certain code path.
>>>>>>
>>>>>> Agree?
>>>>>>
>>>>>> On Wed, Apr 2, 2014 at 8:31 PM, Alena Prokharchyk
>>>>>> <[email protected]> wrote:
>>>>>>>
>>>>>>> -----------------------------------------------------------
>>>>>>> This is an automatically generated e-mail. To reply, visit:
>>>>>>> https://reviews.apache.org/r/19917/#review39324
>>>>>>> -----------------------------------------------------------
>>>>>>>
>>>>>>>
>>>>>>> Is there a reason why logs for some exceptions are being logged in
>>>>>>> DEBUG mode, and some in WARN? From my point of view, if the code only
>>>>>>> catches it and doesn't error out, it should be logged in DEBUG. Lots of
>>>>>>> Admins are seeking for WARN statements in the log, and they might be
>>>>>>> confused seeing WARN w/o further failure or retry.
>>>>>>>
>>>>>>> - Alena Prokharchyk
>>>>>>>
>>>>>>>
>>>>>>> On April 2, 2014, 1:55 p.m., Ding Yuan wrote:
>>>>>>>>
>>>>>>>> -----------------------------------------------------------
>>>>>>>> This is an automatically generated e-mail. To reply, visit:
>>>>>>>> https://reviews.apache.org/r/19917/
>>>>>>>> -----------------------------------------------------------
>>>>>>>>
>>>>>>>> (Updated April 2, 2014, 1:55 p.m.)
>>>>>>>>
>>>>>>>>
>>>>>>>> Review request for cloudstack.
>>>>>>>>
>>>>>>>>
>>>>>>>> Repository: cloudstack-git
>>>>>>>>
>>>>>>>>
>>>>>>>> Description
>>>>>>>> -------
>>>>>>>>
>>>>>>>> This is the patch for JIRA-6242. See
>>>>>>>> https://issues.apache.org/jira/browse/CLOUDSTACK-6242 for more
>>>>>>>> details.
>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>>
>>>>>>>> Diffs
>>>>>>>> -----
>>>>>>>>
>>>>>>>>
>>>>>>>> engine/orchestration/src/com/cloud/agent/manager/AgentManagerImpl.java
>>>>>>>> 0d41bc1
>>>>>>>>
>>>>>>>> engine/orchestration/src/com/cloud/agent/manager/ClusteredAgentManager
>>>>>>>> Im
>>>>>>>> pl.java 01508a4
>>>>>>>>
>>>>>>>> engine/orchestration/src/com/cloud/vm/VirtualMachineManagerImpl.java
>>>>>>>> 3e088db
>>>>>>>>
>>>>>>>> engine/orchestration/src/org/apache/cloudstack/engine/datacenter/entit
>>>>>>>> y/
>>>>>>>> api/db/dao/EngineDataCenterDaoImpl.java 4b6818e
>>>>>>>> engine/schema/src/com/cloud/dc/dao/DataCenterDaoImpl.java ea5039f
>>>>>>>> engine/schema/src/com/cloud/host/dao/HostDaoImpl.java 426c90d
>>>>>>>> engine/schema/src/com/cloud/storage/dao/StoragePoolHostDaoImpl.java
>>>>>>>> e42eaf4
>>>>>>>> engine/schema/src/com/cloud/storage/dao/VMTemplateDaoImpl.java
>>>>>>>> 34fdca5
>>>>>>>> engine/schema/src/com/cloud/upgrade/dao/Upgrade2214to30.java
>>>>>>>> 58dd916
>>>>>>>> engine/schema/src/com/cloud/vm/dao/ConsoleProxyDaoImpl.java 5e9c2f0
>>>>>>>> engine/schema/src/com/cloud/vm/dao/SecondaryStorageVmDaoImpl.java
>>>>>>>> 1f382d6
>>>>>>>>
>>>>>>>> engine/storage/src/org/apache/cloudstack/storage/datastore/DataObjectM
>>>>>>>> an
>>>>>>>> agerImpl.java 6ed1274
>>>>>>>>
>>>>>>>> framework/ipc/src/org/apache/cloudstack/framework/serializer/OnwireCla
>>>>>>>> ss
>>>>>>>> Registry.java 83c8a42
>>>>>>>>
>>>>>>>> plugins/hypervisors/xen/src/com/cloud/hypervisor/xen/discoverer/XcpSer
>>>>>>>> ve
>>>>>>>> rDiscoverer.java 0ad6dc4
>>>>>>>>
>>>>>>>> plugins/hypervisors/xen/src/com/cloud/hypervisor/xen/resource/XenServe
>>>>>>>> rC
>>>>>>>> onnectionPool.java b779085
>>>>>>>>
>>>>>>>> plugins/hypervisors/xen/src/com/cloud/hypervisor/xen/resource/XenServe
>>>>>>>> rS
>>>>>>>> torageProcessor.java e512046
>>>>>>>>
>>>>>>>> plugins/storage/volume/solidfire/src/org/apache/cloudstack/storage/dat
>>>>>>>> as
>>>>>>>> tore/lifecycle/SolidFirePrimaryDataStoreLifeCycle.java af6a77a
>>>>>>>> server/src/com/cloud/resource/ResourceManagerImpl.java f9a59ba
>>>>>>>> server/src/com/cloud/server/ConfigurationServerImpl.java b8da4c8
>>>>>>>>
>>>>>>>> services/console-proxy/server/src/com/cloud/consoleproxy/ConsoleProxyT
>>>>>>>> hu
>>>>>>>> mbnailHandler.java 06f21d3
>>>>>>>> utils/src/com/cloud/utils/net/NetUtils.java 6350986
>>>>>>>>
>>>>>>>> Diff: https://reviews.apache.org/r/19917/diff/
>>>>>>>>
>>>>>>>>
>>>>>>>> Testing
>>>>>>>> -------
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Ding Yuan
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Daan
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Daan
>>>
>>
>>
>>
>> --
>> Daan
>>
>