[389-devel] Re: Logging future direction and ideas.

Rich Megginson Fri, 10 May 2019 09:24:51 -0700

On 5/9/19 9:13 PM, William Brown wrote:

Hi all,


So I think it's time for me to write some logging code to improve the 
situation. Relevant links before we start:

https://pagure.io/389-ds-base/issue/49415
http://www.port389.org/docs/389ds/design/logging-performance-improvement.html
https://pagure.io/389-ds-base/issue/50350
https://pagure.io/389-ds-base/issue/50361


All of these links touch on issues around logging, and I think they all combine 
to create three important points:

* The performance of logging should be improved
* The amount of details (fine grain) and information in logs should improve
* The structure of the log content should be improved to aid interaction 
(possibly even machine parsable)

I will turn this into a design document, but there are some questions I would 
like some input to help answer as part of this process to help set the 
direction and tasks to achieve.

-- Should our logs as they exist today, continue to exist?

I think that my view on this is "no". I think if we make something better, we 
have little need to continue to support our legacy interfaces. Of course, this would be a 
large change and it may not sit comfortably with people.

A large part of this thinking is that the "new" log interface I want to add is 
focused on *operations* rather than auditing accesses or changes, or over looking at 
errors. The information of both the current access/audit/error would largely be melded 
into a single operation log, and then with tools like logconv, we
could parse and extract information that would behave the same way as 
access/error/audit.

At the same time - I can see how people *may* want a "realtime" audit of operations as 
they occur (IE access log), but this still today is already limited by having to "wait" 
for operations to complete.

In a crash scenario, we would be able to still view the logs that are queued, 
so I think there are not so many concerns about losing information in these 
cases (in fact we'd probably have more).

-- What should the operation log look like?

I think it should be structured, and should be whole-units of information, 
related to a single operation. IE only at the conclusion of the operation is it 
logged (thus the async!). It should support arbitrary, nested timers, and would 
*not* support log levels - it's a detailed log of the process each query goes 
through.

An example could be something like:

[timestamp] - [conn=id op=id] - start operation
[timestamp] - [conn=id op=id] - start time = time ...
[timestamp] - [conn=id op=id] - started internal search '(some=filter)'
[timestamp] - [conn=id op=id parentop=id] - start nested operation
[timestamp] - [conn=id op=id parentop=id] - start time = time ...
...
[timestamp] - [conn=id op=id parentop=id] - end time = time...
[timestamp] - [conn=id op=id parentop=id] - duration = diff end - start
[timestamp] - [conn=id op=id parentop=id] - end nested operation - result -> ...
[timestamp] - [conn=id op=id] - ended internal search '(some=filter)'
...
[timestamp] - [conn=id op=id] - end time = time
[timestamp] - [conn=id op=id] - duration = diff end - start


Due to the structured - blocked nature, there would be no interleaving of 
operation messages. therefor the log would appear as:

[timestamp] - [conn=00 op=00] - start operation
[timestamp] - [conn=00 op=00] - start time = time ...
[timestamp] - [conn=00 op=00] - started internal search '(some=filter)'
[timestamp] - [conn=00 op=00 parentop=01] - start nested operation
[timestamp] - [conn=00 op=00 parentop=01] - start time = time ...
...
[timestamp] - [conn=00 op=00 parentop=01] - end time = time...
[timestamp] - [conn=00 op=00 parentop=01] - duration = diff end - start
[timestamp] - [conn=00 op=00 parentop=01] - end nested operation - result -> ...
[timestamp] - [conn=00 op=00] - ended internal search '(some=filter)'
...
[timestamp] - [conn=00 op=00] - end time = time
[timestamp] - [conn=00 op=00] - duration = diff end - start
[timestamp] - [conn=22 op=00] - start operation
[timestamp] - [conn=22 op=00] - start time = time ...
[timestamp] - [conn=22 op=00] - started internal search '(some=filter)'
[timestamp] - [conn=22 op=00 parentop=01] - start nested operation
[timestamp] - [conn=22 op=00 parentop=01] - start time = time ...
...
[timestamp] - [conn=22 op=00 parentop=01] - end time = time...
[timestamp] - [conn=22 op=00 parentop=01] - duration = diff end - start
[timestamp] - [conn=22 op=00 parentop=01] - end nested operation - result -> ...
[timestamp] - [conn=22 op=00] - ended internal search '(some=filter)'
...
[timestamp] - [conn=22 op=00] - end time = time
[timestamp] - [conn=22 op=00] - duration = diff end - start

An alternate method for structuring could be a machine readable format like 
json:

{
     'timestamp': 'time',
     'duration': ....,
     'bind': 'dn of who initiated operation',
     'events': [
         'debug': 'msg',
         'internal_search': {
              'timestamp': 'time',
              'duration': ....,
         }
     ],
}

This would make writing tools like logconv much easier, as we wouldn't need 
regex or others to parse and understand these logs.


+1

I would just like developers to keep in mind a couple of points

- traceability - the ability to trace operations through all of the various parts of the server, and eventually even system calls/kernel, for use in debugging and performance improvements(identify bottlenecks, etc.) - how to make it easy to trace operations using the log files


- json - json also makes it easy to consume these logs - just about all 
log/event collectors use a single-line json format



-- Should our log interface from slapi_log_err etc change?

This is a super tricky question ... I think the answer is yes - we are using 
log_err for messages that should be associated to an operation - and 
additionally, we often disable or lower the level of error messages due to the 
low performance. It would be hard to shift behaviour and expectations around 
the behaviour of this log, so a better idea would be to fix things oourselves 
by adjusting how we log in plugins. We'll probably always need to support 
log_err, but then those developers would not gain the benefit of our 
improvements.

We could translate log_err to emit simple text messages into the events section 
of the log, but for full benefit, using the operation log interfaces would be 
best.

-- Should we still improve the performance of logs as they exist today?

Probably yes - but I think there is a good reason to approach this differently.

I think we should be log_async.c and log_operation.c in parallel to existing 
logging, and have a similer feature gating flag in use like we have done for 
other features. We can then build this in parallel and gain confidence and 
experience with it. Once complete, we could then change the log_access, 
log_audit and log_err macros/functions to call related log_op calls, or rewrite 
those logs to have log_async.c execution backends.



SUMMARY:

* Make slapi_log_op and an async log system in parallel to existing log infra
* Determine a deprecation or improvement plan for other log types
* Port log types to async, or remove them.



--
Sincerely,

William
_______________________________________________
389-devel mailing list -- 389-devel@lists.fedoraproject.org
To unsubscribe send an email to 389-devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: https://getfedora.org/code-of-conduct.html
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/389-devel@lists.fedoraproject.org


_______________________________________________
389-devel mailing list -- 389-devel@lists.fedoraproject.org
To unsubscribe send an email to 389-devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: https://getfedora.org/code-of-conduct.html
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/389-devel@lists.fedoraproject.org

[389-devel] Re: Logging future direction and ideas.

Reply via email to