Re: [Twisted-Python] logging

2013-06-15 Thread Hynek Schlawack

Am 15.06.2013 um 08:18 schrieb Christopher Armstrong ra...@twistedmatrix.com:

 As far as actual *proposals* go, I have these ones, that are all independent:
 
 1. include all keyword arguments in log output without requiring
 specifying the formatting in the string
 2. make it really easy to specify the system
 3. stop affecting the system of application code based on the
 framework code that's running the application code (i.e., don't use
 callWithContext to specify the system any more)
 
 Of these, I think #2 and #3 have the most benefit; I can do #1 with my
 own logging statements just fine, and while IMO it'd be nice if the
 whole world adopted a nice identifier-based system, the lion's share
 of the benefit comes from my use of it consistently in the
 application's codebase.
 
 This conversation has gotten pretty sprawling; time to reel it in with
 some code.
 
 What do you think of this for an API that meets in the middle?
 
 https://gist.github.com/radeex/5787124
 
 This example implementation only concerns itself with the points under
 debate right now; obviously it's completely unusable in general. But
 anyway:
 
 1. it supports English logs
 2. it doesn't require you to specify a formatting if you want to just
 log a bunch of data
 3. it makes it easy to specify a system (both manually and based on
 the module your code is in)

I’ve held back from this discussion so far because it seemed to me that I 
always missed some part of the discussion to fully understand what you’re all 
talking about. I would like to comment on this concrete proposal though before 
I hold my peace forever. (NB I’m not replying just to Christopher but try to 
address everything I saw on the thread so far – I like most of his proposal.)

I find that there’s some kind of false dichotomy brought up in this discussion 
and API and output are somewhat muddled together a bit (maybe I’m just 
misunderstanding though – that’s why I didn’t comment until now).

I personally like my logs 100% structured (except for Exceptions) and still be 
able to “comment” on events in plain English if I need to.

And I don’t see why comments/events should be special case on output (square 
brackets in this example). If you have an event called user_error, you can 
always add a key called error for another “symbol” or just an error_msg if you 
insist on English. When looking for a certain type of user_error, you simply 
write an AND clause in your logging software (Kibana or whatever). It’s pretty 
easy to keep *that* consistent across applications.

For example, *my* log would look like this:

event=user_error peer=127.0.0.1 error=pebkac

If the programmer in question hadn’t enough logging pain in their life to see 
that’s reasonable, they can always do:

event=user_error peer=127.0.0.1 error_msg='Problem exists between keyboard and 
chair.'

Still perfectly parseable, perfectly readable. And with {!r} easy to achieve. A 
nice API I would like to have be:

log('user_error', peer=self.transport.getPeer().host, error_msg='Problem exists 
between keyboard and chair.') – and log figures out itself if it needs to 
quote. I could also live with them all quoted, i.e.:

event='user_error' peer='127.0.0.1' error='pebkac'

to have less special cases.


I hope that makes some sense, the point I’m trying to make that events don’t 
need to be distinct by themselves. If you enforce that, you’re forcing 
structure on them which you could spread over multiple fields that are *much* 
more pleasant to parse.

Regards,
Hynek

P.S. For convenience, I usually write a log method in Twisted protocols that 
prepends the messages with state data, peer IP etc, but that JFTR.
___
Twisted-Python mailing list
Twisted-Python@twistedmatrix.com
http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python


Re: [Twisted-Python] logging

2013-06-15 Thread Glyph

On Jun 14, 2013, at 10:43 PM, Christopher Armstrong ra...@twistedmatrix.com 
wrote:

 I don't think it's worth coming up with some kind of GUID-based
 system, because I don't think anyone's going to go to the trouble to
 use it, and I think it basically offers no practical benefit over
 simple event names.

Here are the benefits over simple event names:

It's opaque, so as code moves around, it doesn't look dated or wrong.  Your 
'system' argument looks a whole lot like a FQPN, but as you keep saying, code 
moves around a lot.  Having a *second* dotted identifier that looks a lot like 
the module name is going to look a lot like cruft to someone when it stops 
matching a module name, they're going to be motivated to clean it up, and 
then the cleaned up version is going to break everyone's logging 
instrumentation.
Since UUIDs can be derived from other information, you can start off without 
one and add one later.  If you didn't feel like specifying one in the first 
place with your log messages, then you're stuck until the monitoring and 
application code that comes to some agreement.

My view of our present disagreement seems to come down to this: I believe that 
nobody is going to bother to do logging right in the first place (particularly, 
they are not going to take the time to specify or enumerate the types of events 
that are interesting that their system emits), and will only realize they need 
to extract interesting stuff from their logs later.  I think we need to provide 
the best possible support for the developer later (the opaque UUID they can 
associate with an API symbol) and the monitoring folks now (the derived UUID 
that they can use in place of a gross and broken-in-the-future regex).

This doesn't mean that the UUID magically makes monitoring always work in the 
future; developers can still change their format strings willy-nilly.  But, it 
at least provides a mechanism that they *could* use to avoid breakage if 
everyone believes it should be avoided.

I understand your point to be that you think that people should, and that they 
therefore will, go to the trouble to categorize every thing that gets logged as 
they're writing the logging stuff.

In support of my argument, I offer as evidence the unfortunate mess that is 
Twisted's current ad-hoc usage of logging, plus the existence of Splunk, a 
publicly-traded company whose whole reason for existing is that they can run 
regexes over poorly-structured logs to extract useful information and 
statistics from them :-).

To be fair, one element in support of your argument is that you managed to 
write a system that used this idiom and it worked well in practice.  I'm sure 
that it did, and I think it's a good thing for people to adopt.

My concern is that lots of useful log-worthy events are coming out of an 
open-source library that won't be exposed to the discipline that a particular 
team adopts for logging.

Finally, it's worth noting that GUID-based identifiers and textual, 
hierarchical identifiers are not mutually exclusive.  You can have 
log.info(log_guid=98185C94-57D7-439A-8206-1D4A2ACBD983, 
log_system=spaceships.combat.explosion) in a single event.  We could provide 
both tools to the developer, and they're not obliged to use either.

(I see you've written some other messages, so more forthcoming...)

-glyph

___
Twisted-Python mailing list
Twisted-Python@twistedmatrix.com
http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python


Re: [Twisted-Python] logging

2013-06-15 Thread Glyph

On Jun 14, 2013, at 10:48 PM, Christopher Armstrong ra...@twistedmatrix.com 
wrote:

 There's... a lot here. I'm overwhelmed by how complex this system
 would be, and I think that end users would be as well. I don't really
 want to put UUIDs into my source code, whether I type them or paste
 them, and I don't want to have to invoke command line tools to figure
 out what those UUIDs in order to be a good developer.

The system isn't really that complex at all.  It's calling uuid.uuid5 once, and 
it's even doing that internally in a utility function, so you don't need to 
understand what it's doing.

 I think there's basically no practical benefit over the hierarchical
 system + event-identifier system, where:

I tried to lay out the benefits in my other recent reply.

 1. it's trivial to specify a hierarchical, easy-to-read system key that has 
 a small-ish scope

The fact that you have to say trivial here suggests it's not actually trivial 
:).  I think that the amount of energy people initially put into logging is so 
low that they will often forget to do something like this.  But, even if they 
remember in the future, we have to deal with the plethora of messages already 
within Twisted, and every Twisted library, that *don't* do this.

 2. you think of a unique event name in that small-ish scope and type it out.

How do you know that your event name is unique though, other than running 
'grep'?

 I'm beginning to think #1 is the most important of all, but I think
 I'll continue to use event-names to describe all my log statements.

This might be a good practice anyway.

 As I said in my response to Wilfredo, I think it's really important to
 stop specifying the system for application code in Twisted. The
 application's code should not be considered a part of the HTTP
 protocol's system. There are also other problems with the way we use
 system, like putting extra random data like request ID in there.


The system that Twisted currently specifies is supposed to be the reactor 
event which caused a particular chunk of code to run.  This is not related to 
the subsystem within your application where the functionality that is logging 
is implemented (which, really, still sounds to me like module name).  I think 
this could be quite useful, especially in systems where the same functionality 
is exposed over multiple protocols and disparate events could cause the same 
subsystem to eventually get called.

However, there are a ton of problems with the way that's implemented; I can't 
really argue with you here about the current implementation, because it's so 
poorly done that I think it's next to useless.  With Failure's extended 
exception caught here stacks, it doesn't provide any information other than 
the traceback.

However, I'd like to bring it back at some point, especially if we can figure 
out how to have the causality of an event.  Wouldn't it be fantastic to see a 
trace like this when something went wrong?

SomeProtocol at 0x.dataReceived
  called callLater(x)
  called callInThread(y)
  which called callFromThread(z)
  and that's why your code is running now


___
Twisted-Python mailing list
Twisted-Python@twistedmatrix.com
http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python


Re: [Twisted-Python] 13.1.0 and #6499

2013-06-15 Thread Laurens Van Houtven
This was resolved :)

cheers
lvh
___
Twisted-Python mailing list
Twisted-Python@twistedmatrix.com
http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python


Re: [Twisted-Python] logging

2013-06-15 Thread Phil Mayers

On 06/15/2013 06:48 AM, Christopher Armstrong wrote:


There's... a lot here. I'm overwhelmed by how complex this system
would be, and I think that end users would be as well. I don't really
want to put UUIDs into my source code, whether I type them or paste
them, and I don't want to have to invoke command line tools to figure
out what those UUIDs in order to be a good developer.


+1. I like the general idea of better logging via keyword args and an 
id, but I strongly dislike the idea of using UUIDs as log message 
identifiers.


In fact, I'll go so far as to say I defintely wouldn't use such a 
system, and for Twisted to migrate to it would actually reduce the 
utility of the in-built logging, because I'd have to build and maintain 
something to translate Twisted log messages I cared about into whatever 
system I built to do my own logging :o(



I think there's basically no practical benefit over the hierarchical
system + event-identifier system, where:


+1. Hierarchial IDs also permit prefix-based wildcarding:

log.observe(twisted.factory.*)
log.exclude(twisted.factory.http.*)
log.observe(myapp.thing)
log.observe(txSomeProject.event.*)

Obviously this is just an example; I haven't put any real thought into it.



1. it's trivial to specify a hierarchical, easy-to-read system key
that has a small-ish scope
2. you think of a unique event name in that small-ish scope and type it out.

I'm beginning to think #1 is the most important of all, but I think
I'll continue to use event-names to describe all my log statements.


I pretty much agree with all of this.

___
Twisted-Python mailing list
Twisted-Python@twistedmatrix.com
http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python


Re: [Twisted-Python] logging

2013-06-15 Thread Glyph

On Jun 15, 2013, at 3:55 AM, Phil Mayers p.may...@imperial.ac.uk wrote:

 On 06/15/2013 06:48 AM, Christopher Armstrong wrote:
 
 There's... a lot here. I'm overwhelmed by how complex this system
 would be, and I think that end users would be as well. I don't really
 want to put UUIDs into my source code, whether I type them or paste
 them, and I don't want to have to invoke command line tools to figure
 out what those UUIDs in order to be a good developer.
 
 +1. I like the general idea of better logging via keyword args and an id, 
 but I strongly dislike the idea of using UUIDs as log message identifiers.
 
 In fact, I'll go so far as to say I defintely wouldn't use such a system, and 
 for Twisted to migrate to it would actually reduce the utility of the 
 in-built logging, because I'd have to build and maintain something to 
 translate Twisted log messages I cared about into whatever system I built to 
 do my own logging :o(

I am really, really puzzled by this reaction.  I am wondering if you read my 
message carefully, or if I didn't express myself well.

If I were to implement the system that I have proposed, you could completely 
ignore it.  You could never deal with a UUID ever, and process logs based on 
whatever other attributes you like.  There's nothing to translate.  Unless when 
you say translate you mean translating the UUID of a currently ad-hoc message 
(one which doesn't specify or document its own unique attributes) which is 
something that would just be impossible without this feature.

The only overhead that you'd ever incur would be a single call per log event - 
probably per-process-lifetime, since the result could be cached - to 
uuid.uuid5, which is not exactly the most expensive function.  It would be 
called internally though; you wouldn't have to call it yourself or be aware 
that it was being called.

If you had the use-case that I believe many people do - which is to evolve a 
stable identifier for a previously ad-hoc log message - then you could do that. 
 Otherwise this wouldn't affect you.

What I've proposed with UUIDs is to identify *specific semantic events* that 
you might want to do monitoring/alerting on, which may move between systems.

Does this explanation make you feel less worried about the inclusion of such a 
feature?

 I think there's basically no practical benefit over the hierarchical
 system + event-identifier system, where:
 
 +1. Hierarchial IDs also permit prefix-based wildcarding:
 
 log.observe(twisted.factory.*)
 log.exclude(twisted.factory.http.*)
 log.observe(myapp.thing)
 log.observe(txSomeProject.event.*)

This is already implemented in 
http://trac.calendarserver.org/browser/CalendarServer/trunk/twext/python/log.py;
 see for example setLogLevelForNamespace() 
http://trac.calendarserver.org/browser/CalendarServer/trunk/twext/python/log.py#L175.

Namespaces are just python module names by default, but nothing requires that; 
the only requirement is that they're strings separated with dots.  You can 
instantiate a Logger object with whatever 'namespace' argument you want.

So: again, you can get what you want exactly and ignore the other stuff.

 1. it's trivial to specify a hierarchical, easy-to-read system key
 that has a small-ish scope
 2. you think of a unique event name in that small-ish scope and type it out.
 
 I'm beginning to think #1 is the most important of all, but I think
 I'll continue to use event-names to describe all my log statements.
 
 I pretty much agree with all of this.

I am still not seeing the utility of one of these identifiers beyond 
class/method name, but you can always just pass the namespace explicitly.

-glyph___
Twisted-Python mailing list
Twisted-Python@twistedmatrix.com
http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python


Re: [Twisted-Python] logging

2013-06-15 Thread Glyph

On Jun 14, 2013, at 11:18 PM, Christopher Armstrong ra...@twistedmatrix.com 
wrote:

 This conversation has gotten pretty sprawling; time to reel it in with
 some code.
 
 What do you think of this for an API that meets in the middle?
 
 https://gist.github.com/radeex/5787124
 
 This example implementation only concerns itself with the points under
 debate right now; obviously it's completely unusable in general. But
 anyway:
 
 1. it supports English logs
 2. it doesn't require you to specify a formatting if you want to just
 log a bunch of data
 3. it makes it easy to specify a system (both manually and based on
 the module your code is in)


First off, thanks again for taking the time to correspond about this, I'm very 
happy for this conversation to be attracting attention and making progress :).

Moving right along...

This code is relying on the semantics of the existing log.msg() call (which are 
bad, for the reasons I enumerated in the start of this thread), right down to 
the dependence on %-formatting.  The main thing I'm interested in at the moment 
is actually fixing those semantics to be more useful; the specific thing we're 
talking about here, persistent log identifiers, is interesting to me at the 
moment mostly as it relates to that.

I realize you might just be relying on it for familiarity, you might not have 
had time to read the whole new thing, so I'm not trying to tear down your idea 
based on that alone :).

But Wilfredo already implemented a thing which addresses the 
namespacing-of-log-messaages issue, and it does about 2/3 of what your proposal 
already does (except it has, like, tests and stuff).  Right down to the 
_getframe() hack (although  
http://trac.calendarserver.org/browser/CalendarServer/trunk/twext/python/log.py#L312
 uses inspect.currentframe() which has a public-sounding name at least :)).  As 
I mentioned in my immediately-previous reply, the logger's 'namespace' does not 
need to be the same as your module name, so it can be a more persistent 
identifier if you have one.

More significantly, you're actually going to the trouble of constructing a 
format string.  This produces output in an ad-hoc format: '='-separated str()s 
of your log keys (although only the ones you pass directly in your message; 
those annotating or augmenting the message at subsequent points by other 
publishers / observers are ignored), which you are presumably relying upon 
being smooshed into a plain-text log file, right along side many other messages 
which do not fit this format.  Presumably this could lead to issues like 
traceback lines with '=' assignments in them showing up in whatever kind of 
ad-hoc log-analysis you want to do later, and ambiguous unparseable results if 
any fields in the log message have a '=' and a space in their repr.

Let's not make it the responsibility of the emitter, or the format string, to 
get those values into a text log file that can be scanned with a regex (albeit 
in the case of your proposal, a much-simplified regex).  Let's push that 
responsibility down into the log observer, and actually save it as structured 
data and *parse* it later.

As all the hip young kids will tell you, this type of data should be persisted 
with JSON, but we should probably have code to support multiple formats.  
Another advantage of serializing and parsing the logs as structured data would 
allow tools written to work on a live log observer to also analyze persistent 
logs in at least a similar way, if not entirely the same way.

(An object that has survived the grim ordeal of translation to and from JSON 
will surely be made less by the process, and can't be treated quite the same 
way by a log observer; but, as I'm sure you'll agree, this is probably better 
than the alternative of Pickle.  An unpickled object will be brought fully back 
to life, but, in the words of the recent NIN release, it will have came back 
haunted.)

Maybe we could even do a SQLite observer.  Honestly I'm not sure which format 
is best.

Since it's also important to be able to *filter* logs, and avoid the work of 
formatting or serializing them at all, it would be best to push this work as 
far down the line as possible.  The construction of the format string is itself 
a whole ton of string-copying, the sort of thing I'd really like to avoid.  
(Last I measured, dictionary operations that don't cause a reallocation - and 
most don't, unless you're inserting in a long loop - are cheaper than even 
apparently small string copies.)

In fact, it strikes me that perhaps *this* is the feature we should be talking 
about implementing: writing a structured log observer and parser, firming up 
the contract between emitters and observers so it's clear what to do with data 
that can't be serialized - more than exactly which fields we use to identify 
messages that are interesting to us.

Relatedly, we could make log analysis a first-class task; instead of assuming 
someone will grep the log file 

Re: [Twisted-Python] logging

2013-06-15 Thread Christopher Armstrong
On Sat, Jun 15, 2013 at 2:18 AM, Hynek Schlawack h...@ox.cx wrote:

 Am 15.06.2013 um 08:18 schrieb Christopher Armstrong 
 ra...@twistedmatrix.com:
 What do you think of this for an API that meets in the middle?

 https://gist.github.com/radeex/5787124

 This example implementation only concerns itself with the points under
 debate right now; obviously it's completely unusable in general. But
 anyway:

 1. it supports English logs
 2. it doesn't require you to specify a formatting if you want to just
 log a bunch of data
 3. it makes it easy to specify a system (both manually and based on
 the module your code is in)

 I’ve held back from this discussion so far because it seemed to me that I 
 always missed some part of the discussion to fully understand what you’re all 
 talking about. I would like to comment on this concrete proposal though 
 before I hold my peace forever. (NB I’m not replying just to Christopher but 
 try to address everything I saw on the thread so far – I like most of his 
 proposal.)

 I find that there’s some kind of false dichotomy brought up in this 
 discussion and API and output are somewhat muddled together a bit (maybe I’m 
 just misunderstanding though – that’s why I didn’t comment until now).

You're right. I didn't make a strong enough point about the fact that
the output formatting isn't important, but I assumed everyone already
knew that. I should have been more clear.


 I personally like my logs 100% structured (except for Exceptions) and still 
 be able to “comment” on events in plain English if I need to.

 And I don’t see why comments/events should be special case on output (square 
 brackets in this example). If you have an event called user_error, you can 
 always add a key called error for another “symbol” or just an error_msg if 
 you insist on English. When looking for a certain type of user_error, you 
 simply write an AND clause in your logging software (Kibana or whatever). 
 It’s pretty easy to keep *that* consistent across applications.

 For example, *my* log would look like this:

 event=user_error peer=127.0.0.1 error=pebkac

The special formatting in the example I gave was only intended for the
dumb file-based format. I thought it was just a nice touch on the spur
of the moment. In practice, I would use this for local logging, but
then set up a log observer that passed raw key/value data to my real
log aggregation/storage/filtering system. I'm making a distinction
between what we show our users (in a twistd.log file) and what we give
to our automated systems (sent over network protocols to logging
systems).

 If the programmer in question hadn’t enough logging pain in their life to see 
 that’s reasonable, they can always do:

 event=user_error peer=127.0.0.1 error_msg='Problem exists between keyboard 
 and chair.'

 Still perfectly parseable, perfectly readable. And with {!r} easy to achieve. 
 A nice API I would like to have be:

 log('user_error', peer=self.transport.getPeer().host, error_msg='Problem 
 exists between keyboard and chair.') – and log figures out itself if it needs 
 to quote. I could also live with them all quoted, i.e.:

 event='user_error' peer='127.0.0.1' error='pebkac'

 to have less special cases.

It sounds like you're arguing that the human-readable *.log format
should be closer to the simple key/value representation that we use
underneath.

Would you also argue that instead of having a log line that looks like:

2013-06-15 10:24:21-0500 [-] Server Shut Down.

We should actually format them (in twistd.log) like this?

time=1371331461.0 system='-' msg='Server Shut Down.'

 I hope that makes some sense, the point I’m trying to make that events don’t 
 need to be distinct by themselves. If you enforce that, you’re forcing 
 structure on them which you could spread over multiple fields that are *much* 
 more pleasant to parse.

So, if I had my own way, event would be a required argument to the
log() function in my example, so that the only thing I'm forcing on
the structure of a log message is that you *have* an event argument.

But I'm pretty sure that's not going to fly. :-) So the only reason I
wanted to support event in a special way in my example is to
emphasize and encourage its use.

 Hynek


--
Christopher Armstrong
http://radix.twistedmatrix.com/
http://planet-if.com/

___
Twisted-Python mailing list
Twisted-Python@twistedmatrix.com
http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python


Re: [Twisted-Python] logging

2013-06-15 Thread Christopher Armstrong
On Sat, Jun 15, 2013 at 6:37 AM, Glyph gl...@twistedmatrix.com wrote:

 On Jun 14, 2013, at 11:18 PM, Christopher Armstrong
 ra...@twistedmatrix.com wrote:

 This conversation has gotten pretty sprawling; time to reel it in with
 some code.

 What do you think of this for an API that meets in the middle?

 https://gist.github.com/radeex/5787124

 This example implementation only concerns itself with the points under
 debate right now; obviously it's completely unusable in general. But
 anyway:

 1. it supports English logs
 2. it doesn't require you to specify a formatting if you want to just
 log a bunch of data
 3. it makes it easy to specify a system (both manually and based on
 the module your code is in)


 First off, thanks again for taking the time to correspond about this, I'm
 very happy for this conversation to be attracting attention and making
 progress :).

 Moving right along...

 This code is relying on the semantics of the existing log.msg() call (which
 are bad, for the reasons I enumerated in the start of this thread), right
 down to the dependence on %-formatting.  The main thing I'm interested in at
 the moment is actually fixing those semantics to be more useful; the
 specific thing we're talking about here, persistent log identifiers, is
 interesting to me at the moment mostly as it relates to that.

 I realize you might just be relying on it for familiarity, you might not
 have had time to read the whole new thing, so I'm not trying to tear down
 your idea based on that alone :).

Yeah... I feel like a lot of the things you say in this email aren't
related to the point I was trying to get across, which was the
interface and effect, not the implementation of it.



 But Wilfredo already implemented a thing which addresses the
 namespacing-of-log-messaages issue, and it does about 2/3 of what your
 proposal already does (except it has, like, tests and stuff).  Right down to
 the _getframe() hack (although
 http://trac.calendarserver.org/browser/CalendarServer/trunk/twext/python/log.py#L312
 uses inspect.currentframe() which has a public-sounding name at least :)).
 As I mentioned in my immediately-previous reply, the logger's 'namespace'
 does not need to be the same as your module name, so it can be a more
 persistent identifier if you have one.

 More significantly, you're actually going to the trouble of constructing a
 format string.  This produces output in an ad-hoc format: '='-separated
 str()s of your log keys (although only the ones you pass directly in your
 message; those annotating or augmenting the message at subsequent points by
 other publishers / observers are ignored), which you are presumably relying
 upon being smooshed into a plain-text log file, right along side many other
 messages which do not fit this format.  Presumably this could lead to issues
 like traceback lines with '=' assignments in them showing up in whatever
 kind of ad-hoc log-analysis you want to do later, and ambiguous unparseable
 results if any fields in the log message have a '=' and a space in their
 repr.

 Let's not make it the responsibility of the emitter, or the format string,
 to get those values into a text log file that can be scanned with a regex
 (albeit in the case of your proposal, a much-simplified regex).  Let's push
 that responsibility down into the log observer, and actually save it as
 structured data and *parse* it later.

 As all the hip young kids will tell you, this type of data should be
 persisted with JSON, but we should probably have code to support multiple
 formats.  Another advantage of serializing and parsing the logs as
 structured data would allow tools written to work on a live log observer
 to also analyze persistent logs in at least a similar way, if not entirely
 the same way.

 (An object that has survived the grim ordeal of translation to and from JSON
 will surely be made less by the process, and can't be treated quite the same
 way by a log observer; but, as I'm sure you'll agree, this is probably
 better than the alternative of Pickle.  An unpickled object will be brought
 fully back to life, but, in the words of the recent NIN release, it will
 have came back haunted.)

 Maybe we could even do a SQLite observer.  Honestly I'm not sure which
 format is best.

So, I responded to Hynek about this confusion already, so I'll let you
read that. I have always assumed that twistd.log would continue to be
vaguely human-readable, with special formatting for several of the
keys. I also assume that developers will be able to hook up additional
observers to save out .concatenated-json files, or stream to some
network protocol, or whatever. The little bit of formatting my example
does is just a minor convenience for the people still logging at
twistd.log files. The arguments you're making now sounds ilke you want
to et rid of that ad-hoc formatting; is that true? So, instead of


2013-06-15 10:24:21-0500 [-] Server Shut Down.

we should have this?

time=1371331461.0 

Re: [Twisted-Python] logging

2013-06-15 Thread Christopher Armstrong
On Thu, Jun 13, 2013 at 5:14 PM, Glyph gl...@twistedmatrix.com wrote:
 Hello everybody. Today I'd like to talk about everyone's favorite subsystem
 within Twisted: logging.

 There are a number of problems with Twisted's logging system.

 The general theme of these problems is that Twisted log system was designed
 with some intended use-cases in mind, but its implementation was stopped
 halfway through and none of those use-cases were really well satisfied.

 At various conferences, and in person at various times, we've gotten
 together to express frustration at the logging system, but I don't think
 that the problems with it have ever been really thoroughly written down.

 Wilfredo has been working on logging here:
 http://trac.calendarserver.org/browser/CalendarServer/trunk/twext/python/log.py.
 This began as an attempt to just address some issues with our own logging
 wrappers, but has been growing into a design that is more fully capable of
 replacing logging within Twisted wholesale. I hope one of us can contribute
 it to Twisted soon, but in the meanwhile, please feel free to provide
 feedback - hopefully if some people can provide a bit of informal feedback
 now, the formal review process can be shorter and more pleasant :).

I like it. Let's merge it.


--
Christopher Armstrong
http://radix.twistedmatrix.com/
http://planet-if.com/

___
Twisted-Python mailing list
Twisted-Python@twistedmatrix.com
http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python


Re: [Twisted-Python] logging

2013-06-15 Thread Glyph

On Jun 15, 2013, at 9:17 AM, Christopher Armstrong ra...@twistedmatrix.com 
wrote:

 I think I would just want one more feature:
 
 Logger().emit(dudelog.LogLevel.warn, a=hi)
 
 Right now, this doesn't spit out anything into my log file, with
 default observers. I want it to emit some simple representation of the
 keys and values.
 
 On the other hand, the same is true of log.msg(a=hi), but maybe this
 should be implemented (somehow) in Logger() codepath to avoid breaking
 compatibility?

I keep going back and forth on this.

On the one hand, the main thing I want to do is to just say if you want your 
message to appear in the log file, let's just implement a structured log file 
and you can read it from there.  If you have no human-readable format string to 
explain your message, clearly you didn't intend for a human to read it.

On the other hand, we already implement log filtering, so filtering the message 
out of the log based on the absence of the format string is a little 
overloaded.  And we should really be encouraging people to populate messages 
with useful structured data.

(I was going to say that without a format string I couldn't have my stand-in 
UUID feature, but then I realized that namespace + set of keys is probably good 
enough to generate that too, so never mind.  Also it seems like I'm the only 
one who likes that feature so maybe it doesn't matter!)

How would you feel about a special token that you have to pass explicitly?

Logger().warn(ALL_KEYS, a=hi)

-glyph

___
Twisted-Python mailing list
Twisted-Python@twistedmatrix.com
http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python


Re: [Twisted-Python] logging

2013-06-15 Thread Christopher Armstrong
On Sat, Jun 15, 2013 at 9:38 PM, Glyph gl...@twistedmatrix.com wrote:


 On Jun 15, 2013, at 9:17 AM, Christopher Armstrong 
 ra...@twistedmatrix.com wrote:

 I think I would just want one more feature:

 Logger().emit(dudelog.LogLevel.warn, a=hi)


 Right now, this doesn't spit out anything into my log file, with
 default observers. I want it to emit some simple representation of the
 keys and values.

 On the other hand, the same is true of log.msg(a=hi), but maybe this
 should be implemented (somehow) in Logger() codepath to avoid breaking
 compatibility?


 I keep going back and forth on this.

 On the one hand, the main thing I want to do is to just say if you want
 your message to appear in the log file, let's just implement a structured
 log file and you can read it from there.  If you have no human-readable
 format string to explain your message, clearly you didn't intend for a
 human to read it.

 On the other hand, we already implement log filtering, so filtering the
 message out of the log based on the absence of the format string is a
 little overloaded.  And we should really be encouraging people to populate
 messages with useful structured data.

 (I was going to say that without a format string I couldn't have my
 stand-in UUID feature, but then I realized that namespace + set of keys is
 probably good enough to generate that too, so never mind.  Also it seems
 like I'm the only one who likes that feature so maybe it doesn't matter!)

 How would you feel about a special token that you have to pass explicitly?

 Logger().warn(ALL_KEYS, a=hi)


I'm not wild about the need for an extra constant I'd have to import, but
I'd like to understand why you think the arbitrary keys shouldn't be
included in the message. Can you elaborate on why?


-- 
Christopher Armstrong
http://radix.twistedmatrix.com/
http://planet-if.com/
___
Twisted-Python mailing list
Twisted-Python@twistedmatrix.com
http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python