Uploaded a new patch to 19917. Changed the verbosity to debug, and addressed 
Daan’s comment on providing more distinctive text messages. 

Sorry that I haven’t split them into smaller patches. 

Note in a few cases the original code was like:
         try {
            pstmt = txn.prepareAutoCloseStatement(sql);
            String gmtCutTime = 
DateUtil.getDateDisplayString(TimeZone.getTimeZone("GMT"), cutTime);
            pstmt.setString(1, gmtCutTime);
            pstmt.setString(2, gmtCutTime);

            ResultSet rs = pstmt.executeQuery();
            while (rs.next()) {
                RunningHostCountInfo info = new RunningHostCountInfo();
                info.setDcId(rs.getLong(1));
                info.setHostType(rs.getString(2));
                info.setCount(rs.getInt(3));

                l.add(info);
                       }
                 } catch (SQLException e) {
                 } catch (Throwable e) {
                 }

The try block only throws SQLException as checked exception, and this code 
would also swallow any unchecked exceptions. I removed the catch (Throwable) in 
these cases to avoid potentially swallowing any unexpected runtime exceptions. 
Please let me know if this is not desirable so I can further update. 

Thanks,
Ding
 
On Apr 2, 2014, at 5:17 PM, Ding Yuan <y...@eecg.toronto.edu> wrote:

> Thanks all for the quick comments!
> If i understand the discussion correctly, I will just change all the added 
> log printing statements to debug verbosity. I will upload a new patch for 
> that shortly. 
> 
> Now a bit back story: the reason we are doing this is that we recently did an 
> analysis on many bugs collected from JIRA to understand why today’s cloud 
> system fails. And we found that almost all of the cluster-wide failures are 
> caused by buggy exception handling, which would often turn a component 
> failure into a cluster-wide one. One of the common bug pattern is ignoring 
> some exceptions — allowing them to propagate and finally turn into disaster. 
> Therefore we built a simple static checking tool just to check some simple 
> rules for exception handling, such as if an exception is ignored. Admittedly, 
> it would be much harder to reason about the potential consequences caused for 
> ignoring a certain exception, that’s why without much more domain knowledge I 
> can only recommend to 1) avoid over-catching an exception, especially when 
> the handling logic will swallow it, and 2) log them, as what this patch does.
> 
> Nevertheless, it seems the four cases I mentioned in JIRA-6242 are 
> particularly suspicious. It might be worthwhile to double check their 
> correctness if you have time. I am reposting them below.
> 
> Thanks!
> Ding
> 
> ========================= 
> Case 1: 
> Line: 375, File: "com/cloud/network/ovs/OvsTunnelManagerImpl.java" 
> 
> 326: try { 
> 327: String myIp = getGreEndpointIP(dest.getHost(), nw); 
>   .. .. .. 
> 373: } catch (Exception e) { 
> 374: // I really thing we should do a better handling of these exceptions 
> 375: s_logger.warn("Ovs Tunnel network created tunnel failed", e); 
> 376: } 
> 
> Comment seems to suggest for a better handling. 
> ========================= 
> ========================= 
> Case 2: 
> Line: 2295, File: "com/cloud/resource/ResourceManagerImpl.java" 
> 
> 2284: try { 
> 2285: /* 
> 2286: * FIXME: this is a buggy logic, check with alex. Shouldn't 
> 2287: * return if propagation return non null 
> 2288: */ 
> 2289: Boolean result = _clusterMgr.propagateResourceEvent(h.getId(), 
> ResourceState.Event.UpdatePassword); 
> 2290: if (result != null) { 
> 2291: return result; 
> 2292: } 
> 2293: 
> 2294: doUpdateHostPassword(h.getId()); 
> 2295: } catch (AgentUnavailableException e) { 
> 2296: } 
> 
> Seem from the comment the logic should be fixed. 
> A similar code snipeet is at: 
>   Line: 2276, File: "com/cloud/resource/ResourceManagerImpl.java" 
> ========================= 
> 
> ========================= 
> Case 3: 
> 
>   Line: 184, File: 
> "org/apache/cloudstack/api/command/user/autoscale/CreateAutoScaleVmGroupCmd.java"
>  
> 
> 174: try 
> 175: { 
> 176: success = _autoScaleService.configureAutoScaleVmGroup(this); 
> 177: if (success) { 
> 178: vmGroup = _entityMgr.findById(AutoScaleVmGroup.class, getEntityId()); 
> 179: AutoScaleVmGroupResponse responseObject = 
> _responseGenerator.createAutoScaleVmGroupResponse(vmGroup); 
> 180: setResponseObject(responseObject); 
> 181: responseObject.setResponseName(getCommandName()); 
> 182: } 
> 183: } catch (Exception ex) { 
> 184: // TODO what will happen if Resource Layer fails in a step inbetween 
> 185: s_logger.warn("Failed to create autoscale vm group", ex); 
> 186: } 
> 
> The comment, "TODO", seems to suggest for better handling. 
> ========================= 
> ========================= 
> Case 4: 
> 
>   Line: 222, File: "com/cloud/api/ApiDispatcher.java" 
> This snippet is moved to ParamProcessWorker.java in the trunk. 
> 
> 203: try { 
> 204: setFieldValue(field, cmd, paramObj, parameterAnnotation); 
>      .. .. 
> 220: } catch (CloudRuntimeException cloudEx) { 
> 221: s_logger.error("CloudRuntimeException", cloudEx); 
> 222: // FIXME: Better error message? This only happens if the API command is 
> not executable, which typically 
> 223: //means 
> 224: // there was 
> 225: // and IllegalAccessException setting one of the parameters. 
> 226: throw new ServerApiException(ApiErrorCode.INTERNAL_ERROR, "Internal 
> error executing API command " + cmd.getCommandName().substring(0, 
> cmd.getCommandName().length() - 8)); 
> 227: } 
> 
> The "FIXME" comment seems to suggest for getter error message. 
> =========================
> On Apr 2, 2014, at 4:37 PM, Daan Hoogland <daan.hoogl...@gmail.com> wrote:
> 
>> Ding Yuan,
>> 
>> Any objections to that?
>> 
>> On Wed, Apr 2, 2014 at 10:32 PM, Alena Prokharchyk
>> <alena.prokharc...@citrix.com> wrote:
>>> 
>>> 
>>> On 4/2/14, 1:27 PM, "Daan Hoogland" <daan.hoogl...@gmail.com> wrote:
>>> 
>>>> I think we agree indeed. Doesn't mean we should start this discuss
>>>> thread or write a arch guideline on the wiki somewhere. Maybe Ding
>>>> Yuan wants to do a preliminary version of it?
>>> 
>>> Wiki guide would be useful indeed.
>>> 
>>> 
>>>> 
>>>> In the meantime I don't think that it hurts for the present patch to
>>>> do everything in debug and decide about higher levels needed later.
>>> 
>>> Agree.
>>> 
>>> 
>>>> 
>>>> regards,
>>>> 
>>>> On Wed, Apr 2, 2014 at 10:11 PM, Alena Prokharchyk
>>>> <alena.prokharc...@citrix.com> wrote:
>>>>> Daan,
>>>>> 
>>>>> Correct me if I¹m wrong, but all of the logging added by Ding, fall
>>>>> under
>>>>> "to go with it or to indicate passing a certain code path². I¹ve just
>>>>> noticed that some of them were added with DEBUG, and some with WARN
>>>>> level,
>>>>> and wanted to correct that.
>>>>> 
>>>>> So we should:
>>>>> 
>>>>> 1) For sure: never print them out in WARN as there is nothing admin
>>>>> should
>>>>> do in this case, because the code just handles them by ignoring.
>>>>> 2) Figure out what would be the correct level to log them with: INFO or
>>>>> DEBUG
>>>>> 
>>>>> From ³Logging best practices² articles, I can see that people use INFO
>>>>> as
>>>>> a ³storyline² of normal application behavior, and DEBUG for sort of
>>>>> information that helps you to track down the failure cases scenarios.
>>>>> To
>>>>> me, stuff added by Ding, falls under second category. But I might be
>>>>> wrong
>>>>> as I don¹t recall on the spot any discussions happening on the debug
>>>>> topic, from the mailing list.
>>>>> 
>>>>> -Alena.
>>>>> 
>>>>> 
>>>>> On 4/2/14, 12:57 PM, "Daan Hoogland" <daan.hoogl...@gmail.com> wrote:
>>>>> 
>>>>>> Alena,
>>>>>> 
>>>>>> What I read in your comment is a description of INFO vs WARN. Debug
>>>>>> would be only for outputting stacktraces to go with it or to indicate
>>>>>> passing a certain code path.
>>>>>> 
>>>>>> Agree?
>>>>>> 
>>>>>> On Wed, Apr 2, 2014 at 8:31 PM, Alena Prokharchyk
>>>>>> <alena.prokharc...@citrix.com> wrote:
>>>>>>> 
>>>>>>> -----------------------------------------------------------
>>>>>>> This is an automatically generated e-mail. To reply, visit:
>>>>>>> https://reviews.apache.org/r/19917/#review39324
>>>>>>> -----------------------------------------------------------
>>>>>>> 
>>>>>>> 
>>>>>>> Is there a reason why logs for some exceptions are being logged in
>>>>>>> DEBUG mode, and some in WARN? From my point of view, if the code only
>>>>>>> catches it and doesn't error out, it should be logged in DEBUG. Lots of
>>>>>>> Admins are seeking for WARN statements in the log, and they might be
>>>>>>> confused seeing WARN w/o further failure or retry.
>>>>>>> 
>>>>>>> - Alena Prokharchyk
>>>>>>> 
>>>>>>> 
>>>>>>> On April 2, 2014, 1:55 p.m., Ding Yuan wrote:
>>>>>>>> 
>>>>>>>> -----------------------------------------------------------
>>>>>>>> This is an automatically generated e-mail. To reply, visit:
>>>>>>>> https://reviews.apache.org/r/19917/
>>>>>>>> -----------------------------------------------------------
>>>>>>>> 
>>>>>>>> (Updated April 2, 2014, 1:55 p.m.)
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Review request for cloudstack.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Repository: cloudstack-git
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Description
>>>>>>>> -------
>>>>>>>> 
>>>>>>>> This is the patch for JIRA-6242. See
>>>>>>>> https://issues.apache.org/jira/browse/CLOUDSTACK-6242 for more
>>>>>>>> details.
>>>>>>>> Thanks!
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Diffs
>>>>>>>> -----
>>>>>>>> 
>>>>>>>> 
>>>>>>>> engine/orchestration/src/com/cloud/agent/manager/AgentManagerImpl.java
>>>>>>>> 0d41bc1
>>>>>>>> 
>>>>>>>> engine/orchestration/src/com/cloud/agent/manager/ClusteredAgentManager
>>>>>>>> Im
>>>>>>>> pl.java 01508a4
>>>>>>>> 
>>>>>>>> engine/orchestration/src/com/cloud/vm/VirtualMachineManagerImpl.java
>>>>>>>> 3e088db
>>>>>>>> 
>>>>>>>> engine/orchestration/src/org/apache/cloudstack/engine/datacenter/entit
>>>>>>>> y/
>>>>>>>> api/db/dao/EngineDataCenterDaoImpl.java 4b6818e
>>>>>>>>  engine/schema/src/com/cloud/dc/dao/DataCenterDaoImpl.java ea5039f
>>>>>>>>  engine/schema/src/com/cloud/host/dao/HostDaoImpl.java 426c90d
>>>>>>>>  engine/schema/src/com/cloud/storage/dao/StoragePoolHostDaoImpl.java
>>>>>>>> e42eaf4
>>>>>>>>  engine/schema/src/com/cloud/storage/dao/VMTemplateDaoImpl.java
>>>>>>>> 34fdca5
>>>>>>>>  engine/schema/src/com/cloud/upgrade/dao/Upgrade2214to30.java
>>>>>>>> 58dd916
>>>>>>>>  engine/schema/src/com/cloud/vm/dao/ConsoleProxyDaoImpl.java 5e9c2f0
>>>>>>>>  engine/schema/src/com/cloud/vm/dao/SecondaryStorageVmDaoImpl.java
>>>>>>>> 1f382d6
>>>>>>>> 
>>>>>>>> engine/storage/src/org/apache/cloudstack/storage/datastore/DataObjectM
>>>>>>>> an
>>>>>>>> agerImpl.java 6ed1274
>>>>>>>> 
>>>>>>>> framework/ipc/src/org/apache/cloudstack/framework/serializer/OnwireCla
>>>>>>>> ss
>>>>>>>> Registry.java 83c8a42
>>>>>>>> 
>>>>>>>> plugins/hypervisors/xen/src/com/cloud/hypervisor/xen/discoverer/XcpSer
>>>>>>>> ve
>>>>>>>> rDiscoverer.java 0ad6dc4
>>>>>>>> 
>>>>>>>> plugins/hypervisors/xen/src/com/cloud/hypervisor/xen/resource/XenServe
>>>>>>>> rC
>>>>>>>> onnectionPool.java b779085
>>>>>>>> 
>>>>>>>> plugins/hypervisors/xen/src/com/cloud/hypervisor/xen/resource/XenServe
>>>>>>>> rS
>>>>>>>> torageProcessor.java e512046
>>>>>>>> 
>>>>>>>> plugins/storage/volume/solidfire/src/org/apache/cloudstack/storage/dat
>>>>>>>> as
>>>>>>>> tore/lifecycle/SolidFirePrimaryDataStoreLifeCycle.java af6a77a
>>>>>>>>  server/src/com/cloud/resource/ResourceManagerImpl.java f9a59ba
>>>>>>>>  server/src/com/cloud/server/ConfigurationServerImpl.java b8da4c8
>>>>>>>> 
>>>>>>>> services/console-proxy/server/src/com/cloud/consoleproxy/ConsoleProxyT
>>>>>>>> hu
>>>>>>>> mbnailHandler.java 06f21d3
>>>>>>>>  utils/src/com/cloud/utils/net/NetUtils.java 6350986
>>>>>>>> 
>>>>>>>> Diff: https://reviews.apache.org/r/19917/diff/
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Testing
>>>>>>>> -------
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> 
>>>>>>>> Ding Yuan
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> Daan
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> Daan
>>> 
>> 
>> 
>> 
>> -- 
>> Daan
>> 
> 

Reply via email to