Re: [systemd-devel] RFC: filter and search journalctl

2015-08-18 Thread Anne Mulhern




- Original Message -
 From: Anne Mulhern amulh...@redhat.com
 To: systemd-devel@lists.freedesktop.org
 Sent: Monday, August 17, 2015 11:34:10 AM
 Subject: Re: [systemd-devel] RFC: filter and search journalctl
 
 
 
 
 
 - Original Message -
  From: Zbigniew Jędrzejewski-Szmek zbys...@in.waw.pl
  To: Anne Mulhern amulh...@redhat.com
  Cc: systemd-devel@lists.freedesktop.org, Sebastian Schindler
  sebastian.schind...@travelping.com
  Sent: Monday, August 17, 2015 10:45:11 AM
  Subject: Re: [systemd-devel] RFC: filter and search journalctl
  
  On Mon, Aug 17, 2015 at 10:24:22AM -0400, Anne Mulhern wrote:
   
   
   
   
   - Original Message -
From: Zbigniew Jędrzejewski-Szmek zbys...@in.waw.pl
To: Sebastian Schindler sebastian.schind...@travelping.com
Cc: systemd-devel@lists.freedesktop.org
Sent: Saturday, August 8, 2015 3:48:30 PM
Subject: Re: [systemd-devel] RFC: filter and search journalctl

On Fri, Aug 07, 2015 at 11:53:13AM +0200, Sebastian Schindler wrote:
 Grep-ing seems to be the only solution to find log entries if you
 don't
 fully
 know what you're looking for. For example: You want to see all
 entries
 containing a certain MESSAGE that gets enriched with additional
 information
 during the logging process:
 
 MESSAGE=host HOST has closed connection CONNECTION_ID
This is a bit contentious, but at least I would like to see some
grep functionality implemented directly in journalctl.

   
   I am late to the party, but I think it is obvious that the right way
   for
   this
   to be achieved, in a perfect world, is that this log entry be accompanied
   by a MESSAGE_ID, and HOST and CONNECTION_ID keys, and a catalog entry
   that
   combined
   with the keys, generates the above message so that grepping is entirely
   unnecessary.
   
   It is true that this perfect world is not just around the corner, or
   anything like that,
   but it is technically possible.
   
   I agree that grepping would be handy for me, right now, for just the
   reasons stated
   in the original message.
   
   I wonder if it would be reasonable for journalctl to supply the
   (additional) fields that are
   guaranteed to be associated with a MESSAGE_ID
  And what would happen when the entry is malformed, i.e. missing some
  fields?
  Would journald reject the message? I don't think this would be useful to
  anyone at all. Instead the readers of the message should gracefully adapt
  to missing fields.
  
 
 I think it would be wrong for journald to reject a message that does not
 supply
 all the declared fields. It would also be wrong for journalctl to crash when
 given the
 --catalog flag if the fields are missing. I don't know what it does right
 now, because it is not that easy a situation to engineer, AFAICT. I guess the
 best thing would be to supply a special catalog message indicating that an
 error had occurred when trying to construct a catalog message. Something
 that indicated the fields that were missing that caused the error would be
 nice.
 Just so long as that didn't turn into an infinite loop, somehow. If somebody
 knows what journalctl does do in this situation, please pass that information
 along.
 
Re-reading the docs, I realize that the information is right there in plain 
sight.
If the value is not defined, the variable name, minus the @ signs, is displayed.
Nice and simple.

But now, I wonder what happens for fields like _UDEV_DEVLINK, which can be set
0, 1, or many times. If unset, the variable name minus the @ signs is displayed.
If set once, the value is substituted. If set twice?

-- SNIP --

 
 ___
 systemd-devel mailing list
 systemd-devel@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/systemd-devel
 

- mulhern


Re: [systemd-devel] RFC: filter and search journalctl

2015-08-18 Thread Zbigniew Jędrzejewski-Szmek
On Tue, Aug 18, 2015 at 09:00:27AM -0400, Anne Mulhern wrote:
 Re-reading the docs, I realize that the information is right there in plain 
 sight.
 If the value is not defined, the variable name, minus the @ signs, is 
 displayed.
 Nice and simple.
 
 But now, I wonder what happens for fields like _UDEV_DEVLINK, which can be set
 0, 1, or many times. If unset, the variable name minus the @ signs is 
 displayed.
 If set once, the value is substituted. If set twice?
One value is used. Which one is pseudorandom.
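
For readers following along, the three cases (unset, set once, set several times) can be modelled in a few lines of Python. This is only a sketch under stated assumptions, not journalctl's code: `render_catalog`, the placeholder regex, and the choice of "first value wins" are made up for illustration. The real choice among several values should be treated as arbitrary, as said above.

```python
import re

# Assumption for this sketch: placeholders look like @FIELD_NAME@, with the
# name made of uppercase letters, digits and underscores.
_PLACEHOLDER = re.compile(r"@([A-Z0-9_]+)@")


def render_catalog(template, fields):
    """Expand @FIELD@ placeholders in a catalog-style template.

    `fields` maps a field name to the list of values set for it, because a
    journal field may be set zero, one, or many times in a single entry.
    """
    def substitute(match):
        name = match.group(1)
        values = fields.get(name, [])
        if not values:
            # Unset: show the variable name, minus the @ signs.
            return name
        # Set once or several times: exactly one value is used. Picking the
        # first is this sketch's choice; do not rely on any particular one.
        return values[0]

    return _PLACEHOLDER.sub(substitute, template)
```

With the template "host @HOST@ has closed connection @CONNECTION_ID@" and only HOST set, the result keeps the bare name CONNECTION_ID in place of the missing value.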

Zbyszek


Re: [systemd-devel] RFC: filter and search journalctl

2015-08-17 Thread Zbigniew Jędrzejewski-Szmek
On Mon, Aug 17, 2015 at 10:24:22AM -0400, Anne Mulhern wrote:
 
 
 
 
 - Original Message -
  From: Zbigniew Jędrzejewski-Szmek zbys...@in.waw.pl
  To: Sebastian Schindler sebastian.schind...@travelping.com
  Cc: systemd-devel@lists.freedesktop.org
  Sent: Saturday, August 8, 2015 3:48:30 PM
  Subject: Re: [systemd-devel] RFC: filter and search journalctl
  
  On Fri, Aug 07, 2015 at 11:53:13AM +0200, Sebastian Schindler wrote:
   Grep-ing seems to be the only solution to find log entries if you don't
   fully
   know what you're looking for. For example: You want to see all entries
   containing a certain MESSAGE that gets enriched with additional 
   information
   during the logging process:
   
   MESSAGE=host HOST has closed connection CONNECTION_ID
  This is a bit contentious, but at least I would like to see some
  grep functionality implemented directly in journalctl.
  
 
 I am late to the party, but I think it is obvious that the right way for 
 this
 to be achieved, in a perfect world, is that this log entry be accompanied
 by a MESSAGE_ID, and HOST and CONNECTION_ID keys, and a catalog entry that 
 combined
 with the keys, generates the above message so that grepping is entirely
 unnecessary.
 
 It is true that this perfect world is not just around the corner, or anything 
 like that,
 but it is technically possible.
 
 I agree that grepping would be handy for me, right now, for just the reasons 
 stated
 in the original message.
 
 I wonder if it would be reasonable for journalctl to supply the (additional) 
 fields that are
 guaranteed to be associated with a MESSAGE_ID
And what would happen when the entry is malformed, i.e. missing some fields?
Would journald reject the message? I don't think this would be useful to
anyone at all. Instead the readers of the message should gracefully adapt
to missing fields.

...
 Is it reasonable to preface any MESSAGE_ID
 specific keys with the MESSAGE_ID, e.g.,
 9bb33380-fbfa-4d5b-88b5-6e6bb8a39124:KEY? Or perhaps a double underscore, 
 e.g.,
 __KEY would do the trick?
MESSAGE_ID is a contract between the writers of the message and the readers of
the message. The writers say: messages with this ID mean ... and have the
fields ... . There is no need to mark the fields in any other way,
except by documentation or custom.

Zbyszek


Re: [systemd-devel] RFC: filter and search journalctl

2015-08-17 Thread Anne Mulhern




- Original Message -
 From: Zbigniew Jędrzejewski-Szmek zbys...@in.waw.pl
 To: Sebastian Schindler sebastian.schind...@travelping.com
 Cc: systemd-devel@lists.freedesktop.org
 Sent: Saturday, August 8, 2015 3:48:30 PM
 Subject: Re: [systemd-devel] RFC: filter and search journalctl
 
 On Fri, Aug 07, 2015 at 11:53:13AM +0200, Sebastian Schindler wrote:
  Grep-ing seems to be the only solution to find log entries if you don't
  fully
  know what you're looking for. For example: You want to see all entries
  containing a certain MESSAGE that gets enriched with additional information
  during the logging process:
  
  MESSAGE=host HOST has closed connection CONNECTION_ID
 This is a bit contentious, but at least I would like to see some
 grep functionality implemented directly in journalctl.
 

I am late to the party, but I think it is obvious that the right way for this
to be achieved, in a perfect world, is that this log entry be accompanied
by a MESSAGE_ID, and HOST and CONNECTION_ID keys, and a catalog entry that 
combined
with the keys, generates the above message so that grepping is entirely
unnecessary.

It is true that this perfect world is not just around the corner, or anything 
like that,
but it is technically possible.

I agree that grepping would be handy for me, right now, for just the reasons 
stated
in the original message.

I wonder if it would be reasonable for journalctl to supply the (additional) 
fields that are
guaranteed to be associated with a MESSAGE_ID, and how this information might
be registered. One way is to essentially derive this from an associated catalog
entry, if any. Any fields that the catalog entry uses really ought to be 
supplied
along with the MESSAGE_ID. This mapping is available to any human being, of 
course, by
inspecting journal entries.

But it also seems likely that there might be fields that
should be guaranteed to accompany a MESSAGE_ID that should not be incorporated 
into
a catalog message. I would be interested in the idea of, e.g., extending the 
format of
the catalog file that an application distributes to allow an extra line that 
specifies
guaranteed fields, or alternatively, to allow an additional file, dedicated to 
specifying
this interface. This is a bit analogous to the interface file that is specified 
for
foreign language bindings for a library.

I'm also curious about a mechanism to distinguish those entries that are 
supplied
specifically for a particular MESSAGE_ID from those that are, e.g., 
auto-generated
by systemd or derived from some other sources. systemd has already taken the 
underscore
for the unfakeable entries it provides. Is it reasonable to preface any 
MESSAGE_ID
specific keys with the MESSAGE_ID, e.g.,
9bb33380-fbfa-4d5b-88b5-6e6bb8a39124:KEY? Or perhaps a double underscore, 
e.g.,
__KEY would do the trick?

-- SNIP --

 
 Zbyszek
 

- mulhern


Re: [systemd-devel] RFC: filter and search journalctl

2015-08-17 Thread Anne Mulhern




- Original Message -
 From: Zbigniew Jędrzejewski-Szmek zbys...@in.waw.pl
 To: Anne Mulhern amulh...@redhat.com
 Cc: systemd-devel@lists.freedesktop.org, Sebastian Schindler 
 sebastian.schind...@travelping.com
 Sent: Monday, August 17, 2015 10:45:11 AM
 Subject: Re: [systemd-devel] RFC: filter and search journalctl
 
 On Mon, Aug 17, 2015 at 10:24:22AM -0400, Anne Mulhern wrote:
  
  
  
  
  - Original Message -
   From: Zbigniew Jędrzejewski-Szmek zbys...@in.waw.pl
   To: Sebastian Schindler sebastian.schind...@travelping.com
   Cc: systemd-devel@lists.freedesktop.org
   Sent: Saturday, August 8, 2015 3:48:30 PM
   Subject: Re: [systemd-devel] RFC: filter and search journalctl
   
   On Fri, Aug 07, 2015 at 11:53:13AM +0200, Sebastian Schindler wrote:
Grep-ing seems to be the only solution to find log entries if you don't
fully
know what you're looking for. For example: You want to see all entries
containing a certain MESSAGE that gets enriched with additional
information
during the logging process:

MESSAGE=host HOST has closed connection CONNECTION_ID
   This is a bit contentious, but at least I would like to see some
   grep functionality implemented directly in journalctl.
   
  
  I am late to the party, but I think it is obvious that the right way for
  this
  to be achieved, in a perfect world, is that this log entry be accompanied
  by a MESSAGE_ID, and HOST and CONNECTION_ID keys, and a catalog entry that
  combined
  with the keys, generates the above message so that grepping is entirely
  unnecessary.
  
  It is true that this perfect world is not just around the corner, or
  anything like that,
  but it is technically possible.
  
  I agree that grepping would be handy for me, right now, for just the
  reasons stated
  in the original message.
  
  I wonder if it would be reasonable for journalctl to supply the
  (additional) fields that are
  guaranteed to be associated with a MESSAGE_ID
 And what would happen when the entry is malformed, i.e. missing some fields?
 Would journald reject the message? I don't think this would be useful to
 anyone at all. Instead the readers of the message should gracefully adapt
 to missing fields.
 

I think it would be wrong for journald to reject a message that does not supply
all the declared fields. It would also be wrong for journalctl to crash when 
given the
--catalog flag if the fields are missing. I don't know what it does right
now, because it is not that easy a situation to engineer, AFAICT. I guess the
best thing would be to supply a special catalog message indicating that an
error had occurred when trying to construct a catalog message. Something
that indicated the fields that were missing that caused the error would be nice.
Just so long as that didn't turn into an infinite loop, somehow. If somebody
knows what journalctl does do in this situation, please pass that information 
along.

Other consumers of log entries should behave in whatever manner seems best to 
them
if a declared field is missing.

What I'm looking for here is the best way for an application which wants to use 
the
journaling facilities provided to publish useful information about its log 
entry API. The
advantage of publishing it in the manner I've suggested is that journalctl could
be very helpful about telling consumers of the journal what keys they should 
expect
to see. Something like:
journalctl --list-keys MESSAGE_ID
and maybe even a journal API for programmatic access to this information would 
be
very nice.
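
Until something like that exists, the mapping can at least be derived by inspecting entries. The following Python sketch takes lines in the format produced by `journalctl -o json` (one JSON object per line) and reports the keys that appear in every entry carrying a given MESSAGE_ID. The function name `keys_for_message_id` and the `skip_trusted` switch are hypothetical, and the result only reflects the entries that happen to be in the journal, not what the application actually guarantees.

```python
import json


def keys_for_message_id(json_lines, message_id, skip_trusted=True):
    """Return the sorted keys present in every entry with this MESSAGE_ID.

    `json_lines` is any iterable of strings, one JSON object per line.
    With `skip_trusted`, keys starting with an underscore (fields added by
    the journal itself) are left out, so only application keys remain.
    """
    common = None
    for line in json_lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        if entry.get("MESSAGE_ID") != message_id:
            continue
        keys = {k for k in entry if not (skip_trusted and k.startswith("_"))}
        # Intersect: a key survives only if every matching entry has it.
        common = keys if common is None else common & keys
    return sorted(common or [])
```

A key that is missing from even one matching entry drops out of the result, which is the conservative reading of "guaranteed".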

Of course, there are other ways for an application to publish its log entry API.
But, it does seem odd for it to do this outside the structures that systemd has
already set up, when it is an API for journal entries.

Since this really is an API, with all the usual issues about versioning and so 
forth,
it really is essential that the information be published somewhere, not 
laboriously extracted
from a scan of the code by potential log entry consumers. 

 ...
  Is it reasonable to preface any MESSAGE_ID
  specific keys with the MESSAGE_ID, e.g.,
  9bb33380-fbfa-4d5b-88b5-6e6bb8a39124:KEY? Or perhaps a double underscore,
  e.g.,
  __KEY would do the trick?
 MESSAGE_ID is a contract between the writers of the message and the readers
 of
 the message. The writers say: messages with this ID mean ... and have the
 fields ... . There is no need to mark the fields in any other way,
 except by documentation or custom.
 
 Zbyszek
 

The reason this seems important to me is the problem of a shared namespace.
These MESSAGE_ID UUIDs are globally registered, since there is a high enough 
probability
that every UUID is different that they are, to all intents and purposes, unique.
But the keys do not have this advantage. In this shared namespace, it would
be easy enough for journald to steal a key that was already in use by another
application. This would generate all the obvious and usual problems, most 
probably
forcing

Re: [systemd-devel] RFC: filter and search journalctl

2015-08-08 Thread Goffredo Baroncelli
On 2015-08-07 11:53, Sebastian Schindler wrote:
 Hi all.
 
 The journal format offers powerful filter capabilities. Unfortunately this 
 power
 is lost, if you have to use grep to find certain information.
 Example given (unscientific benchmark), count the number of entries for a
 (known) executable:
 
 
 $ journalctl --disk-usage
 Archived and active journals take up 344.1M on disk.
 
 $ time (journalctl _EXE=/usr/sbin/dhclient -o verbose | \
  grep -F _EXE=/usr/sbin/dhclient | wc -l)
 1233
 
 real    0m0.111s
 user    0m0.007s
 sys     0m0.091s
 
 
 $ time (journalctl -o verbose | grep -F _EXE=/usr/sbin/dhclient | wc -l)
 1233
 
 real    0m7.515s
 user    0m5.088s
 sys     0m6.896s
 
 
 This shows that piping everything through grep is almost two orders of
 magnitude slower than letting journalctl filter on the field.

This is because the journal file is structured like a database: all fields are 
fully indexed, so journalctl is fast for a query of the form KEY=VALUE.

For other kinds of search (by regexp, or by value only), journalctl cannot use 
the indexes, so it is much slower because it has to process the whole journal.
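
To make the difference concrete, here is a toy model in Python. It is a sketch of the general idea only and says nothing about the actual on-disk journal format: `TinyJournal` and its methods are invented names. An exact KEY=VALUE match is a single dictionary lookup, while a search by value alone has to visit every entry.

```python
from collections import defaultdict


class TinyJournal:
    """Toy in-memory log with an index on (key, value) pairs."""

    def __init__(self):
        self.entries = []
        self.index = defaultdict(list)  # (key, value) -> list of entry ids

    def append(self, fields):
        """Store one entry (a dict of str -> str) and index each field."""
        entry_id = len(self.entries)
        self.entries.append(fields)
        for key, value in fields.items():
            self.index[(key, value)].append(entry_id)

    def match_exact(self, key, value):
        """KEY=VALUE query: one index lookup, no scan of other entries."""
        return [self.entries[i] for i in self.index.get((key, value), [])]

    def match_substring(self, needle):
        """Search by value only: the index cannot help, so scan everything."""
        return [e for e in self.entries
                if any(needle in v for v in e.values())]
```

The cost of `match_exact` depends on the number of matching entries, whereas `match_substring` always grows with the size of the whole log, which is the same shape as the timings quoted above.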

I am curious: if you run

$ time ( grep -F _EXE=/usr/sbin/dhclient /var/log/journal/*/*| wc -l)

how long does it take?

 
 Grep-ing seems to be the only solution to find log entries if you don't fully
 know what you're looking for. For example: You want to see all entries
 containing a certain MESSAGE that gets enriched with additional information
 during the logging process:
 
 MESSAGE=host HOST has closed connection CONNECTION_ID
 
 At the moment you have no option to look for this kind of information unless
 someone has set something like  MESSAGE_ID you can filter for. There are 
 several
 use cases using this pattern of thinking:
 
 * there's no option to show all set FIELD keys in the current journal, 
 although
   this information is encoded in the header of each journal file
 * there's no support for negated filtering, you can't easily hide output of a
   certain unit which is creating too much noise
 * there's no support for regular expressions (except for the --unit option),
   this is especially problematic when you're looking for certain MESSAGEs
 * there's no option to show all entries containing a certain field
 * logical expressions are somewhat hard to read/write because parenthesis 
 can't
   be used to enforce certain logical expressions
 
 What do you think about a query language for journalctl that allows more
 powerful search options? This could be introduced without ignoring the
 capabilities the journal file format has to offer. Are there maybe already 
 plans
 to introduce something alike into journalctl? Do some people here have
 experience with query languages for such a use case? Things come to mind like
 PCAP filter, SPARQL, Lucene or the SPHINX Query Language.
 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli kreijackATinwind.it
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5



Re: [systemd-devel] RFC: filter and search journalctl

2015-08-08 Thread Zbigniew Jędrzejewski-Szmek
On Fri, Aug 07, 2015 at 11:53:13AM +0200, Sebastian Schindler wrote:
 Grep-ing seems to be the only solution to find log entries if you don't fully
 know what you're looking for. For example: You want to see all entries
 containing a certain MESSAGE that gets enriched with additional information
 during the logging process:
 
 MESSAGE=host HOST has closed connection CONNECTION_ID
This is a bit contentious, but at least I would like to see some
grep functionality implemented directly in journalctl.

 At the moment you have no option to look for this kind of information unless
 someone has set something like  MESSAGE_ID you can filter for. There are 
 several
 use cases using this pattern of thinking:
 
 * there's no option to show all set FIELD keys in the current journal, 
 although
   this information is encoded in the header of each journal file
This should be easy enough to add.

 * there's no support for negated filtering, you can't easily hide output of a
   certain unit which is creating too much noise
This has been on the todo list for a long time.

 * there's no support for regular expressions (except for the --unit option),
   this is especially problematic when you're looking for certain MESSAGEs
 * there's no option to show all entries containing a certain field
 * logical expressions are somewhat hard to read/write because parenthesis 
 can't
   be used to enforce certain logical expressions
journalctl is supposed to be simple. Arbitrarily complex queries
are not something that is ever going to be well supported. Like
David said, there's ELK and other stacks for that.

 What do you think about a query language for journalctl that allows more
 powerful search options? This could be introduced without ignoring the
 capabilities the journal file format has to offer. Are there maybe already 
 plans
 to introduce something alike into journalctl? Do some people here have
 experience with query languages for such a use case? Things come to mind like
 PCAP filter, SPARQL, Lucene or the SPHINX Query Language.
It really depends. Anything which directly queries information from the
journal headers: yes. Regexps: should be discussed. Anything more
complicated: no.

Zbyszek


[systemd-devel] RFC: filter and search journalctl

2015-08-07 Thread Sebastian Schindler
Hi all.

The journal format offers powerful filter capabilities. Unfortunately, this
power is lost if you have to use grep to find certain information.
For example (an unscientific benchmark), count the number of entries for a
(known) executable:


$ journalctl --disk-usage
Archived and active journals take up 344.1M on disk.

$ time (journalctl _EXE=/usr/sbin/dhclient -o verbose | \
 grep -F _EXE=/usr/sbin/dhclient | wc -l)
1233

real    0m0.111s
user    0m0.007s
sys     0m0.091s


$ time (journalctl -o verbose | grep -F _EXE=/usr/sbin/dhclient | wc -l)
1233

real    0m7.515s
user    0m5.088s
sys     0m6.896s


This shows that piping everything through grep is almost two orders of
magnitude slower than letting journalctl filter on the field.

Grep-ing seems to be the only solution to find log entries if you don't fully
know what you're looking for. For example: You want to see all entries
containing a certain MESSAGE that gets enriched with additional information
during the logging process:

MESSAGE=host HOST has closed connection CONNECTION_ID

At the moment you have no option to look for this kind of information unless
someone has set something like a MESSAGE_ID you can filter for. Several
use cases run into the same kind of limitation:

* there's no option to show all set FIELD keys in the current journal, although
  this information is encoded in the header of each journal file
* there's no support for negated filtering, you can't easily hide output of a
  certain unit which is creating too much noise
* there's no support for regular expressions (except for the --unit option),
  this is especially problematic when you're looking for certain MESSAGEs
* there's no option to show all entries containing a certain field
* logical expressions are somewhat hard to read/write because parentheses can't
  be used to group subexpressions
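
As a stopgap, several of these can be approximated outside journalctl by post-processing its JSON output (`journalctl -o json` prints one JSON object per line). The Python sketch below is only an illustration of that workaround: `filter_entries` and its parameters are made-up names, it handles only string-valued MESSAGE fields, and like any client-side filter it scans every entry instead of using the journal's indexes.

```python
import json
import re


def filter_entries(json_lines, exclude_unit=None, message_regex=None,
                   has_field=None):
    """Yield journal entries (dicts) that pass all requested filters.

    exclude_unit:  drop entries whose _SYSTEMD_UNIT equals this name
    message_regex: keep only entries whose MESSAGE matches this regex
    has_field:     keep only entries that contain this field at all
    """
    pattern = re.compile(message_regex) if message_regex else None
    for line in json_lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        if exclude_unit is not None and entry.get("_SYSTEMD_UNIT") == exclude_unit:
            continue  # negated filter: hide a noisy unit
        if has_field is not None and has_field not in entry:
            continue  # field-existence filter
        if pattern is not None:
            message = entry.get("MESSAGE")
            # Non-string MESSAGE values (e.g. binary data) never match.
            if not isinstance(message, str) or not pattern.search(message):
                continue
        yield entry
```

It could be fed from a pipe, for example by passing `sys.stdin` as `json_lines`, but it pays the full-scan cost measured in the benchmark above.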

What do you think about a query language for journalctl that allows more
powerful search options? This could be introduced without ignoring the
capabilities the journal file format has to offer. Are there maybe already plans
to introduce something alike into journalctl? Do some people here have
experience with query languages for such a use case? Things come to mind like
PCAP filter, SPARQL, Lucene or the SPHINX Query Language.


Re: [systemd-devel] RFC: filter and search journalctl

2015-08-07 Thread David Timothy Strauss
I think you should look into forwarding your logs to a more sophisticated
aggregator, like the ELK stack.