Re: Service Engine Logging

Jacques Le Roux Sun, 07 Apr 2013 04:02:01 -0700

From: "Adrian Crum" <adrian.c...@sandglass-software.com>
> On 4/7/2013 10:30 AM, Jacques Le Roux wrote:
>> From: "Adrian Crum" <adrian.c...@sandglass-software.com>
>>>  From my perspective, you are the one complicating things. If you don't
>>> want to slow down the system with logging I/O, then turn logging off.
>> When you are continously developping a system, removing all log info/timing 
>> from production can be problematic.
>> It's sometimes the only way you have to understand why and how a random 
>> issue appears.
> 
> 
> That is helpful. So, when you are troubleshooting a production 
> installation, you enable TIMING and INFO logging to see what's going on. 
> Understood. Now, what is the purpose of using a threshold?


To separate slow services and, for the 2nd  threshold, possibly very slow ones.
Sometimes, it depends not only on external system but also on a whole 
"ecosystem" (net connections, firewalls, etc.)
We know things always changes, it's like a chaos sometimes.

At this stage you are yet sure not what for looking for.
So you need a 1st rough filter to better understand what's going on
Putting thresholds on services durations can then help to separate cases.
 
> You mentioned marking slow services, but a single threshold value for 
> all services will not do that - because every service is different and 
> each one will have a unique typical elapsed time. That was the reason 
> why I changed the text of the message - because services that were 
> completed in timely manner were being logged as "slow" - which was not true.

There were slow and very slow, that why I used 2 properties, and set the 1st to 
0, to have the behaviour you introduced and I agree with for OOTB setting (show 
all services duration)
 
> Have you considered using the metrics feature?

Ha forgot to ask about it, now I see and remember this thread and I will digg 
in http://markmail.org/message/x4lzvda66ju6gdg5

Of course, I'd not be against a brief briefing, or a link to explain.

Thanks for the reminder

Jacques

> 
> 
>>   
>>> "We've always done it that way" is not a valid argument.
>> I was just saying that personally I did not have problems whith how it was 
>> before your change. I was easy to adapt to my need. I have also no problems 
>> with your change, it's sill easy to adapt.
>>
>>> You still haven't given us any requirements that demonstrate the need to
>>> put complicated decision making around a single log entry.
>> My solution is not complicated at all 2 properties for 2 thresholds. Why do 
>> you feel it as complicated?
>> Actually I'm not stuck with this solution. If you want, comment it out or 
>> even revert and the chapter is closed.
>>
>> Jacques
>>
>>> -Adrian
>>>
>>> On 4/7/2013 9:18 AM, Jacques Le Roux wrote:
>>>> This seems over complicated to me. Don't arbitrarily slow OFBiz by storing 
>>>> (IOs) services durations at each service call. Let's keep it simple and 
>>>> easy tunable.
>>>>
>>>> The way it was before ( >50 => slow; > 200 => very slow) was there for 
>>>> years. I did not code it, and after a decade we want to change it, OK.
>>>>
>>>> Most of the time you don't care about that. When you do, you simply want 
>>>> to separate slow services from others.
>>>>
>>>> I believe OOTB 2 properties are enough. I set them to allow to see all 
>>>> services durations and to specifically mark slow services. I picked 1 sec, 
>>>> why? It would be terribly slow services internally, but externals ones can 
>>>> vary much more.
>>>> If you need more thresholds/separations it's easy from what is coded now 
>>>> to add suiting your needs.
>>>>
>>>>   From my experience, when looking at logs on a system you quickly know 
>>>> which are these values, when you are interested by them
>>>>
>>>> The most important point here is not slow the system and to follow 
>>>> KISS/YAGNI ways.
>>>>
>>>> Please don't complicate things, thanks!
>>>>
>>>> Jacques
>>>>
>>>>
>>>> From: "Atul Vani" <atul.v...@hotwaxmedia.com>
>>>>> Here's what I think, it's all raw though :)
>>>>> * As suggested by Jacopo, we maintain stats in some kind of entity. Let's
>>>>> say average running time.
>>>>> * We use this average running time to decide if a timing log should be
>>>>> printed. The thing is, not all services are same, some are complex and are
>>>>> supposed to take longer than others, so a static number would never
>>>>> suffice. We do not have any means to calculate the complexity of a service
>>>>> (by looking at the mini-lang code) to decide this number, so either a user
>>>>> will tell it or we can compute it on the basis of past run times.
>>>>> * There can come cases when user would want to tell the number (all past
>>>>> experiences were bad ones, or he just optimized), so we can add some
>>>>> attribute to the service definition. There is something like metric
>>>>> already present, not sure how that works.
>>>>> * A single number for all services does not make sense to me, because (as
>>>>> mentioned above) not all services are same.
>>>>>
>>>>> On Sun, 07 Apr 2013 04:14:58 +0530, Adrian Crum
>>>>> <adrian.c...@sandglass-software.com> wrote:
>>>>>
>>>>>> No, a commit is NOT a reply to an email.
>>>>>>
>>>>>> Please, let's discuss this. You seem to be forcing your perception of
>>>>>> how logging should be done on the rest of the community. I would prefer
>>>>>> that we all participate in a discussion and come up with a design that
>>>>>> works for everyone.
>>>>>>
>>>>>> -Adrian
>>>>>>
>>>>>> On 4/6/2013 5:32 PM, Jacques Le Roux wrote:
>>>>>>> I thought the commit comment answered your question
>>>>>>>
>>>>>>> OOTB, there is only 1 change from what you committed: now services
>>>>>>> longer than 1 sec will also show as slow in log
>>>>>>>
>>>>>>> Jacques
>>>>>>>
>>>>>>> From: "Adrian Crum" <adrian.c...@sandglass-software.com>
>>>>>>>> Please don't change the timing logging - there should not be any
>>>>>>>> conditions placed on it.
>>>>>>>>
>>>>>>>> You didn't answer my question. I was hoping we could avoid a commit war
>>>>>>>> by discussing your requirements and designing a solution that makes
>>>>>>>> everyone happy.
>>>>>>>>
>>>>>>>> -Adrian
>>>>>>>>
>>>>>>>> On 4/6/2013 12:00 PM, Jacques Le Roux wrote:
>>>>>>>>> Hi Adrian,
>>>>>>>>>
>>>>>>>>> Thanks for asking, I committed and commented at revision: 1465223
>>>>>>>>>
>>>>>>>>> Atul,
>>>>>>>>>
>>>>>>>>> It was not easy to read your patch in the email (cut at 80 chars).
>>>>>>>>> Please open a Jira if you want to improve my commit.
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>> Jacques
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> From: "Adrian Crum" <adrian.c...@sandglass-software.com>
>>>>>>>>>> Jacques,
>>>>>>>>>>
>>>>>>>>>> What are your requirements? What are you looking for in the logs?
>>>>>>>>>>
>>>>>>>>>> -Adrian
>>>>>>>>>>
>>>>>>>>>> On 4/4/2013 10:54 PM, Jacques Le Roux wrote:
>>>>>>>>>>> These numbers are from experience of hours and hours staring at
>>>>>>>>>>> clusters logs, but yes it's arbitrary and depend on context (as I
>>>>>>>>>>> guess were picked the initial numbers which are there for years
>>>>>>>>>>> Then why not the obvious solution Jacopo proposed of properties,
>>>>>>>>>>> easy to change even dynamically...
>>>>>>>>>>> I can't see anything more flexible, at least at 23:43 after days of
>>>>>>>>>>> works.
>>>>>>>>>>> AS you said, once you spot one such line in log it's not a biggie
>>>>>>>>>>> to get there (you have the class line in log) and adapt suiting
>>>>>>>>>>> your needs.
>>>>>>>>>>> So maybe "like you proposed" we could indeed put a very low value
>>>>>>>>>>> (I mean 0) as property.
>>>>>>>>>>>
>>>>>>>>>>> For Atul's proposition, sorry not the courage to check tonight
>>>>>>>>>>> (maybe a patch in a Jira would help to read)
>>>>>>>>>>>
>>>>>>>>>>> Jacques
>>>>>>>>>>> PS: I guessed "fuss around", knew tinker, not fidget :D
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Adrian Crum wrote:
>>>>>>>>>>>> Why not 20 or 30 or 40?
>>>>>>>>>>>>
>>>>>>>>>>>> That's the problem with arbitrary values - they don't mean
>>>>>>>>>>>> anything.
>>>>>>>>>>>>
>>>>>>>>>>>>      From my perspective, if anyone has timing enabled, then they
>>>>>>>>>>>> want to
>>>>>>>>>>>> see what's going on in the system.
>>>>>>>>>>>>
>>>>>>>>>>>> Feel free to change it.
>>>>>>>>>>>>
>>>>>>>>>>>> -Adrian
>>>>>>>>>>>>
>>>>>>>>>>>> On 4/3/2013 9:22 AM, Jacques Le Roux wrote:
>>>>>>>>>>>>> Hi Adrian, All,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Should we really show the timing for all services?
>>>>>>>>>>>>> Maybe increasing from 50 to 75 or even 100 for the 1st case would
>>>>>>>>>>>>> be enough?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Jacques
>>>>>>>>>>>>>
>>>>>>>>>>>>> From: <adri...@apache.org>
>>>>>>>>>>>>>> Author: adrianc
>>>>>>>>>>>>>> Date: Wed Apr  3 07:57:24 2013
>>>>>>>>>>>>>> New Revision: 1463863
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> URL: http://svn.apache.org/r1463863
>>>>>>>>>>>>>> Log:
>>>>>>>>>>>>>> Log message cleanup in ServiceDispatcher.java. Removed confusing
>>>>>>>>>>>>>> text about services taking too long.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Modified:
>>>>>>>>>>>>>>          
>>>>>>>>>>>>>> ofbiz/trunk/framework/service/src/org/ofbiz/service/ServiceDispatcher.java
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Modified:
>>>>>>>>>>>>>> ofbiz/trunk/framework/service/src/org/ofbiz/service/ServiceDispatcher.java
>>>>>>>>>>>>>> URL:
>>>>>>>>>>>>>> http://svn.apache.org/viewvc/ofbiz/trunk/framework/service/src/org/ofbiz/service/ServiceDispatcher.java?rev=1463863&r1=1463862&r2=1463863&view=diff
>>>>>>>>>>>>>> ==============================================================================
>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>> ofbiz/trunk/framework/service/src/org/ofbiz/service/ServiceDispatcher.java
>>>>>>>>>>>>>> (original) +++
>>>>>>>>>>>>>> ofbiz/trunk/framework/service/src/org/ofbiz/service/ServiceDispatcher.java
>>>>>>>>>>>>>> Wed Apr  3 07:57:24 2013 @@ -571,10 +571,8 @@ public
>>>>>>>>>>>>>>               class ServiceDispatcher { rs.setEndStamp();
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>               long timeToRun = System.currentTimeMillis() -
>>>>>>>>>>>>>> serviceStartTime;
>>>>>>>>>>>>>> -        if (Debug.timingOn() && timeToRun > 50) {
>>>>>>>>>>>>>> -            Debug.logTiming("Slow sync service execution
>>>>>>>>>>>>>> detected: service [" + localName + "/" + modelService.name + "]
>>>>>>>>>>>>>> finished in [" + timeToRun + "] milliseconds", module);
>>>>>>>>>>>>>> -        } else if (Debug.infoOn() && timeToRun > 200) {
>>>>>>>>>>>>>> -            Debug.logInfo("Very slow sync service execution
>>>>>>>>>>>>>> detected: service [" + localName + "/" + modelService.name + "]
>>>>>>>>>>>>>> finished in [" + timeToRun + "] milliseconds", module); +
>>>>>>>>>>>>>> if (Debug.timingOn()) {
>>>>>>>>>>>>>> +            Debug.logTiming("Sync service [" + localName + "/"
>>>>>>>>>>>>>> + modelService.name + "] finished in [" + timeToRun + "]
>>>>>>>>>>>>>>               milliseconds", module); }
>>>>>>>>>>>>>>               if ((Debug.verboseOn() || modelService.debug) &&
>>>>>>>>>>>>>> timeToRun > 50 && !modelService.hideResultInLog) {
>>>>>>>>>>>>>>                   // Sanity check - some service results can be
>>>>>>>>>>>>>> multiple MB in size. Limit message size to 10K.
>>>>> -- 
>>>>> Using Opera's revolutionary email client: http://www.opera.com/mail/
>

Re: Service Engine Logging

Reply via email to