Re: [systemd-devel] RFC: filter and search journalctl
- Original Message - From: Anne Mulhern amulh...@redhat.com To: systemd-devel@lists.freedesktop.org Sent: Monday, August 17, 2015 11:34:10 AM Subject: Re: [systemd-devel] RFC: filter and search journalctl - Original Message - From: Zbigniew Jędrzejewski-Szmek zbys...@in.waw.pl To: Anne Mulhern amulh...@redhat.com Cc: systemd-devel@lists.freedesktop.org, Sebastian Schindler sebastian.schind...@travelping.com Sent: Monday, August 17, 2015 10:45:11 AM Subject: Re: [systemd-devel] RFC: filter and search journalctl On Mon, Aug 17, 2015 at 10:24:22AM -0400, Anne Mulhern wrote: - Original Message - From: Zbigniew Jędrzejewski-Szmek zbys...@in.waw.pl To: Sebastian Schindler sebastian.schind...@travelping.com Cc: systemd-devel@lists.freedesktop.org Sent: Saturday, August 8, 2015 3:48:30 PM Subject: Re: [systemd-devel] RFC: filter and search journalctl On Fri, Aug 07, 2015 at 11:53:13AM +0200, Sebastian Schindler wrote: Grep-ing seems to be the only solution to find log entries if you don't fully know what you're looking for. For example: You want to see all entries containing a certain MESSAGE that gets enriched with additional information during the logging process: MESSAGE=host HOST has closed connection CONNECTION_ID This is a bit contentious, but at least I would like to see some grep functionality implemented directly in journalctl. I am late to the party, but I think it is obvious that the right way for this to be achieved, in a perfect world, is that this log entry be accompanied by a MESSAGE_ID, and HOST and CONNECTION_ID keys, and a catalog entry that combined with the keys, generates the above message so that grepping is entirely unnecessary. It is true that this perfect world is not just around the corner, or anything like that, but it is technically possible. I agree that grepping would be handy for me, right now, for just the reasons stated in the original message. 
I wonder if it would be reasonable for journalctl to supply the (additional) fields that are guaranteed to be associated with a MESSAGE_ID And what would happen when the entry is malformed, i.e. missing some fields? Would journald reject the message? I don't think this would be useful to anyone at all. Instead the readers of the message should gracefully adapt to missing fields. I think it would be wrong for journald to reject a message that does not supply all the declared fields. It would also be wrong for journalctl to crash when given the --catalog flag if the fields are missing. I don't know what it does right now, because it is not that easy a situation to engineer, AFAICT. I guess the best thing would be to supply a special catalog message indicating that an error had occurred when trying to construct a catalog message. Something that indicated the fields that were missing that caused the error would be nice. Just so long as that didn't turn into an infinite loop, somehow. If somebody knows what journalctl does do in this situation, please pass that information along. Re-reading the docs, I realize that the information is right there in plain sight. If the value is not defined, the variable name, minus the @ signs, is displayed. Nice and simple. But now, I wonder what happens for fields like _UDEV_DEVLINK, which can be set 0, 1, or many times. If unset, the variable name minus the @ signs is displayed. If set once, the value is substituted. If set twice? -- SNIP -- ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel - mulhern
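To make "readers should gracefully adapt to missing fields" concrete, here is a minimal reader-side sketch in POSIX sh and awk. It assumes the entries arrive in the Journal Export Format (for instance from `journalctl -o export MESSAGE_ID=...`) and handles only text fields (KEY=VALUE lines); binary-encoded fields are not supported. The declared fields are passed by hand because, as this thread points out, there is no registry of them; the function name `missing_fields` is hypothetical.

```sh
# missing_fields FIELD...  (hypothetical helper, not part of systemd)
# Reads Journal Export Format on stdin (text fields only; entries are
# separated by blank lines) and prints one line per entry saying which of
# the declared FIELDs are absent, rather than rejecting the entry.
missing_fields() {
    awk -v declared="$*" '
        # Print the verdict for the buffered entry, then reset.
        function report(   j, miss) {
            if (!lines) return
            e++
            miss = ""
            for (j = 1; j <= nd; j++)
                if (!(want[j] in have)) miss = miss " " want[j]
            print "entry " e ":" (miss == "" ? " ok" : " missing" miss)
            split("", have)
            lines = 0
        }
        BEGIN { nd = split(declared, want, " ") }
        /^$/  { report(); next }
        # Remember the field name (the text before the first "=").
        /=/   { have[substr($0, 1, index($0, "=") - 1)] = 1; lines++ }
        END   { report() }'
}
```

For example, `journalctl -o export MESSAGE_ID=<id> | missing_fields HOST CONNECTION_ID` would flag the entries lacking either field, leaving it to the consumer to decide what to do with them.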
Re: [systemd-devel] RFC: filter and search journalctl
On Tue, Aug 18, 2015 at 09:00:27AM -0400, Anne Mulhern wrote: Re-reading the docs, I realize that the information is right there in plain sight. If the value is not defined, the variable name, minus the @ signs, is displayed. Nice and simple. But now, I wonder what happens for fields like _UDEV_DEVLINK, which can be set 0, 1, or many times. If unset, the variable name minus the @ signs is displayed. If set once, the value is substituted. If set twice? One value is used. Which one is pseudorandom. Zbyszek
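The substitution rule discussed above can be sketched as follows. This is an illustration, not journalctl's implementation: it reads one entry in Journal Export Format (text fields only) on stdin and expands each @FIELD@ in a template given as the first argument. A missing field is rendered as its bare name, as the documentation says. For a field that appears more than once, such as _UDEV_DEVLINK, the sketch deterministically takes the first value, whereas journalctl, per Zbyszek, picks one pseudorandomly. The name `render_catalog` is hypothetical.

```sh
# render_catalog TEMPLATE  (hypothetical helper, not part of systemd)
# Reads one Journal Export Format entry on stdin (text fields only) and
# prints TEMPLATE with every @FIELD@ replaced by that field's value, or by
# the bare name FIELD if the entry lacks it. Repeated fields: first value wins.
# Note: awk -v interprets backslash escapes, so keep them out of TEMPLATE.
render_catalog() {
    awk -v tmpl="$1" '
        # Record the first value seen for each KEY=VALUE line.
        /=/ {
            i = index($0, "=")
            k = substr($0, 1, i - 1)
            if (!(k in val)) val[k] = substr($0, i + 1)
            next
        }
        # A blank line ends the first entry; exit still runs END.
        /^$/ { exit }
        END {
            out = ""
            rest = tmpl
            # Replace @NAME@ placeholders from left to right.
            while (match(rest, /@[A-Za-z0-9_]+@/)) {
                name = substr(rest, RSTART + 1, RLENGTH - 2)
                out = out substr(rest, 1, RSTART - 1) ((name in val) ? val[name] : name)
                rest = substr(rest, RSTART + RLENGTH)
            }
            print out rest
        }'
}
```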
Re: [systemd-devel] RFC: filter and search journalctl
On Mon, Aug 17, 2015 at 10:24:22AM -0400, Anne Mulhern wrote: - Original Message - From: Zbigniew Jędrzejewski-Szmek zbys...@in.waw.pl To: Sebastian Schindler sebastian.schind...@travelping.com Cc: systemd-devel@lists.freedesktop.org Sent: Saturday, August 8, 2015 3:48:30 PM Subject: Re: [systemd-devel] RFC: filter and search journalctl On Fri, Aug 07, 2015 at 11:53:13AM +0200, Sebastian Schindler wrote: Grep-ing seems to be the only solution to find log entries if you don't fully know what you're looking for. For example: You want to see all entries containing a certain MESSAGE that gets enriched with additional information during the logging process: MESSAGE=host HOST has closed connection CONNECTION_ID This is a bit contentious, but at least I would like to see some grep functionality implemented directly in journalctl. I am late to the party, but I think it is obvious that the right way for this to be achieved, in a perfect world, is that this log entry be accompanied by a MESSAGE_ID, and HOST and CONNECTION_ID keys, and a catalog entry that combined with the keys, generates the above message so that grepping is entirely unnecessary. It is true that this perfect world is not just around the corner, or anything like that, but it is technically possible. I agree that grepping would be handy for me, right now, for just the reasons stated in the original message. I wonder if it would be reasonable for journalctl to supply the (additional) fields that are guaranteed to be associated with a MESSAGE_ID And what would happen when the entry is malformed, i.e. missing some fields? Would journald reject the message? I don't think this would be useful to anyone at all. Instead the readers of the message should gracefully adapt to missing fields. ... Is it reasonable to preface any MESSAGE_ID specific keys with the MESSAGE_ID, e.g., 9bb33380-fbfa-4d5b-88b5-6e6bb8a39124:KEY? Or perhaps a double underscore, e.g., __KEY would do the trick?
MESSAGE_ID is a contract between the writers of the message and the readers of the message. The writers say: messages with this ID mean ... and have the fields ... . There is no need to mark the fields in any other way, except by documentation or custom. Zbyszek
Re: [systemd-devel] RFC: filter and search journalctl
- Original Message - From: Zbigniew Jędrzejewski-Szmek zbys...@in.waw.pl To: Sebastian Schindler sebastian.schind...@travelping.com Cc: systemd-devel@lists.freedesktop.org Sent: Saturday, August 8, 2015 3:48:30 PM Subject: Re: [systemd-devel] RFC: filter and search journalctl On Fri, Aug 07, 2015 at 11:53:13AM +0200, Sebastian Schindler wrote: Grep-ing seems to be the only solution to find log entries if you don't fully know what you're looking for. For example: You want to see all entries containing a certain MESSAGE that gets enriched with additional information during the logging process: MESSAGE=host HOST has closed connection CONNECTION_ID This is a bit contentious, but at least I would like to see some grep functionality implemented directly in journalctl. I am late to the party, but I think it is obvious that the right way for this to be achieved, in a perfect world, is that this log entry be accompanied by a MESSAGE_ID, and HOST and CONNECTION_ID keys, and a catalog entry that combined with the keys, generates the above message so that grepping is entirely unnecessary. It is true that this perfect world is not just around the corner, or anything like that, but it is technically possible. I agree that grepping would be handy for me, right now, for just the reasons stated in the original message. I wonder if it would be reasonable for journalctl to supply the (additional) fields that are guaranteed to be associated with a MESSAGE_ID, and how this information might be registered. One way is to essentially derive this from an associated catalog entry, if any. Any fields that the catalog entry uses really ought to be supplied along w/ the MESSAGE_ID. This mapping is available to any human being, of course, by inspecting journal entries. But it also seems likely that there might be fields that should be guaranteed to accompany a MESSAGE_ID that should not be incorporated into a catalog message. 
I would be interested in the idea of, e.g., extending the format of the catalog file that an application distributes to allow an extra line that specifies guaranteed fields, or alternatively, to allow an additional file, dedicated to specifying this interface. This is a bit analogous to the interface file that is specified for foreign language bindings for a library. I'm also curious about a mechanism to distinguish those entries that are supplied specifically for a particular MESSAGE_ID from those that are, e.g., auto-generated by systemd or derived from some other sources. systemd has already taken the underscore for the unfakeable entries it provides. Is it reasonable to preface any MESSAGE_ID specific keys with the MESSAGE_ID, e.g., 9bb33380-fbfa-4d5b-88b5-6e6bb8a39124:KEY? Or perhaps a double underscore, e.g., __KEY would do the trick? -- SNIP -- Zbyszek - mulhern
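As noted above, the fields that accompany a MESSAGE_ID can be discovered by inspecting journal entries. A minimal sketch of that inspection, assuming Journal Export Format input with text fields only (e.g. from `journalctl -o export`), counts how many of the entries with a given MESSAGE_ID contain each field; a field present in every entry is a candidate for the "guaranteed" set. Fields starting with a double underscore, such as __CURSOR, which the export format adds, are skipped. The function name is hypothetical, and the result only describes the entries that happen to be in the journal, not a published interface.

```sh
# keys_for_message_id ID  (hypothetical helper, not part of systemd)
# Reads Journal Export Format on stdin (text fields only) and prints
# "FIELD seen/total" for every field name found in entries whose
# MESSAGE_ID equals ID, sorted by field name.
keys_for_message_id() {
    awk -v id="$1" '
        # At the end of an entry, count its fields if the MESSAGE_ID matched.
        function flush(   k) {
            if (matched) {
                total++
                for (k in seen) count[k]++
            }
            split("", seen)
            matched = 0
        }
        /^$/ { flush(); next }
        /=/ {
            i = index($0, "=")
            k = substr($0, 1, i - 1)
            # Skip fields such as __CURSOR that the export format adds.
            if (substr(k, 1, 2) != "__") seen[k] = 1
            if (k == "MESSAGE_ID" && substr($0, i + 1) == id) matched = 1
        }
        END {
            flush()
            for (k in count) print k, count[k] "/" total
        }' | LC_ALL=C sort
}
```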
Re: [systemd-devel] RFC: filter and search journalctl
- Original Message - From: Zbigniew Jędrzejewski-Szmek zbys...@in.waw.pl To: Anne Mulhern amulh...@redhat.com Cc: systemd-devel@lists.freedesktop.org, Sebastian Schindler sebastian.schind...@travelping.com Sent: Monday, August 17, 2015 10:45:11 AM Subject: Re: [systemd-devel] RFC: filter and search journalctl On Mon, Aug 17, 2015 at 10:24:22AM -0400, Anne Mulhern wrote: - Original Message - From: Zbigniew Jędrzejewski-Szmek zbys...@in.waw.pl To: Sebastian Schindler sebastian.schind...@travelping.com Cc: systemd-devel@lists.freedesktop.org Sent: Saturday, August 8, 2015 3:48:30 PM Subject: Re: [systemd-devel] RFC: filter and search journalctl On Fri, Aug 07, 2015 at 11:53:13AM +0200, Sebastian Schindler wrote: Grep-ing seems to be the only solution to find log entries if you don't fully know what you're looking for. For example: You want to see all entries containing a certain MESSAGE that gets enriched with additional information during the logging process: MESSAGE=host HOST has closed connection CONNECTION_ID This is a bit contentious, but at least I would like to see some grep functionality implemented directly in journalctl. I am late to the party, but I think it is obvious that the right way for this to be achieved, in a perfect world, is that this log entry be accompanied by a MESSAGE_ID, and HOST and CONNECTION_ID keys, and a catalog entry that combined with the keys, generates the above message so that grepping is entirely unnecessary. It is true that this perfect world is not just around the corner, or anything like that, but it is technically possible. I agree that grepping would be handy for me, right now, for just the reasons stated in the original message. I wonder if it would be reasonable for journalctl to supply the (additional) fields that are guaranteed to be associated with a MESSAGE_ID And what would happen when the entry is malformed, i.e. missing some fields? Would journald reject the message? I don't think this would be useful to anyone at all.
Instead the readers of the message should gracefully adapt to missing fields. I think it would be wrong for journald to reject a message that does not supply all the declared fields. It would also be wrong for journalctl to crash when given the --catalog flag if the fields are missing. I don't know what it does right now, because it is not that easy a situation to engineer, AFAICT. I guess the best thing would be to supply a special catalog message indicating that an error had occurred when trying to construct a catalog message. Something that indicated the fields that were missing that caused the error would be nice. Just so long as that didn't turn into an infinite loop, somehow. If somebody knows what journalctl does do in this situation, please pass that information along. Other consumers of log entries should behave in whatever manner seems best to them if a declared field is missing. What I'm looking for here is the best way for an application which wants to use the journaling facilities provided to publish useful information about its log entry API. The advantage of publishing it in the manner I've suggested is that journalctl could be very helpful about telling consumers of the journal what keys they should expect to see. Something like: journalctl --list-keys MESSAGE_ID and maybe even a journal API for programmatic access to this information would be very nice. Of course, there are other ways for an application to publish its log entry API. But, it does seem odd for it to do this outside the structures that systemd has already set up, when it is an API for journal entries. Since this really is an API, with all the usual issues about versioning and so forth, it really is essential that the information be published somewhere, not laboriously extracted from a scan of the code by potential log entry consumers. ... Is it reasonable to preface any MESSAGE_ID specific keys with the MESSAGE_ID, e.g., 9bb33380-fbfa-4d5b-88b5-6e6bb8a39124:KEY? 
Or perhaps a double underscore, e.g., __KEY would do the trick? MESSAGE_ID is a contract between the writers of the message and the readers of the message. The writers say: messages with this ID mean ... and have the fields ... . There is no need to mark the fields in any other way, except by documentation or custom. Zbyszek The reason this seems important to me is the problem of a shared namespace. These MESSAGE_ID UUIDs need no central registry: the probability that two independently generated UUIDs coincide is so low that they are, to all intents and purposes, globally unique. But the keys do not have this advantage. In this shared namespace, it would be easy enough for journald to steal a key that was already in use by another application. This would generate all the obvious and usual problems, most probably forcing
Re: [systemd-devel] RFC: filter and search journalctl
On 2015-08-07 11:53, Sebastian Schindler wrote: Hi all. The journal format offers powerful filter capabilities. Unfortunately this power is lost if you have to use grep to find certain information. For example (unscientific benchmark), count the number of entries for a (known) executable:

    $ journalctl --disk-usage
    Archived and active journals take up 344.1M on disk.
    $
    $ time (journalctl _EXE=/usr/sbin/dhclient -o verbose | \
        grep -F _EXE=/usr/sbin/dhclient | wc -l)
    1233

    real    0m0.111s
    user    0m0.007s
    sys     0m0.091s
    $
    $ time (journalctl -o verbose | grep -F _EXE=/usr/sbin/dhclient | wc -l)
    1233

    real    0m7.515s
    user    0m5.088s
    sys     0m6.896s

This shows that piping through grep is orders of magnitude slower than filtering in journalctl. This is because the journal file is structured like a database: all fields are fully indexed, so journalctl is fast for a query like KEY=VALUE. For other kinds of search (by regexp, or only by value), journalctl cannot use the indexes, so it is a lot slower because it has to process the whole journal. I am curious what time you get with:

    $ time (grep -F _EXE=/usr/sbin/dhclient /var/log/journal/*/* | wc -l)

Grep-ing seems to be the only solution to find log entries if you don't fully know what you're looking for. For example: You want to see all entries containing a certain MESSAGE that gets enriched with additional information during the logging process:

    MESSAGE=host HOST has closed connection CONNECTION_ID

At the moment you have no option to look for this kind of information unless someone has set something like MESSAGE_ID you can filter for.
There are several use cases using this pattern of thinking:
* there's no option to show all set FIELD keys in the current journal, although this information is encoded in the header of each journal file
* there's no support for negated filtering, you can't easily hide output of a certain unit which is creating too much noise
* there's no support for regular expressions (except for the --unit option), this is especially problematic when you're looking for certain MESSAGEs
* there's no option to show all entries containing a certain field
* logical expressions are somewhat hard to read/write because parentheses can't be used to enforce certain logical expressions
What do you think about a query language for journalctl that allows more powerful search options? This could be introduced without ignoring the capabilities the journal file format has to offer. Are there maybe already plans to introduce something like this into journalctl? Do some people here have experience with query languages for such a use case? Things that come to mind are PCAP filter, SPARQL, Lucene or the SPHINX Query Language.
--
gpg @keyserver.linux.it: Goffredo Baroncelli kreijackATinwind.it
Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5
Re: [systemd-devel] RFC: filter and search journalctl
On Fri, Aug 07, 2015 at 11:53:13AM +0200, Sebastian Schindler wrote: Grep-ing seems to be the only solution to find log entries if you don't fully know what you're looking for. For example: You want to see all entries containing a certain MESSAGE that gets enriched with additional information during the logging process: MESSAGE=host HOST has closed connection CONNECTION_ID This is a bit contentious, but at least I would like to see some grep functionality implemented directly in journalctl. At the moment you have no option to look for this kind of information unless someone has set something like MESSAGE_ID you can filter for. There are several use cases using this pattern of thinking:
* there's no option to show all set FIELD keys in the current journal, although this information is encoded in the header of each journal file
This should be easy enough to add.
* there's no support for negated filtering, you can't easily hide output of a certain unit which is creating too much noise
This has been on the todo list for a long time.
* there's no support for regular expressions (except for the --unit option), this is especially problematic when you're looking for certain MESSAGEs
* there's no option to show all entries containing a certain field
* logical expressions are somewhat hard to read/write because parentheses can't be used to enforce certain logical expressions
journalctl is supposed to be simple. Arbitrarily complex queries are not something that is ever going to be well supported. Like David said, there's ELK and other stacks for that. What do you think about a query language for journalctl that allows more powerful search options? This could be introduced without ignoring the capabilities the journal file format has to offer. Are there maybe already plans to introduce something like this into journalctl? Do some people here have experience with query languages for such a use case? Things that come to mind are PCAP filter, SPARQL, Lucene or the SPHINX Query Language.
It really depends. Anything which directly queries information from the journal headers: yes. Regexps: should be discussed. Anything more complicated: no. Zbyszek
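Until negation and regular expressions exist in journalctl, one rough approximation is to post-process the export stream. The sketch below assumes Journal Export Format input with text fields only and uses awk's POSIX extended regular expressions. It scans every entry, so it pays exactly the full-scan cost Goffredo describes instead of using the journal's indexes. `drop_matching` and `grep_field` are hypothetical names, not journalctl options.

```sh
# Hypothetical helpers, not journalctl options. Input and output are Journal
# Export Format (text fields only), so calls can be chained in a pipeline.
#   drop_matching KEY VALUE  keep entries with no KEY field equal to VALUE
#   grep_field KEY ERE       keep entries with a KEY field matching ERE
# Note: awk -v interprets backslash escapes in VALUE and ERE.
filter_entries() {
    awk -v mode="$1" -v key="$2" -v pat="$3" '
        # Emit the buffered entry if it passes, then reset the buffer.
        function flush(   j, keep) {
            keep = (mode == "drop") ? !hit : hit
            if (n && keep) {
                for (j = 1; j <= n; j++) print buf[j]
                print ""
            }
            n = 0
            hit = 0
        }
        /^$/ { flush(); next }
        {
            buf[++n] = $0
            i = index($0, "=")
            if (i > 1 && substr($0, 1, i - 1) == key) {
                v = substr($0, i + 1)
                if ((mode == "drop") ? (v == pat) : (v ~ pat)) hit = 1
            }
        }
        END { flush() }'
}
drop_matching() { filter_entries drop "$1" "$2"; }
grep_field()    { filter_entries grep "$1" "$2"; }
```

A pipeline such as `journalctl -o export | drop_matching _SYSTEMD_UNIT noisy.service | grep_field MESSAGE '^host .* has closed connection'` (the unit name is a placeholder) would hide one unit and then match messages by pattern. Keeping the output in export format is what lets the filters compose.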
[systemd-devel] RFC: filter and search journalctl
Hi all. The journal format offers powerful filter capabilities. Unfortunately this power is lost if you have to use grep to find certain information. For example (unscientific benchmark), count the number of entries for a (known) executable:

    $ journalctl --disk-usage
    Archived and active journals take up 344.1M on disk.
    $
    $ time (journalctl _EXE=/usr/sbin/dhclient -o verbose | \
        grep -F _EXE=/usr/sbin/dhclient | wc -l)
    1233

    real    0m0.111s
    user    0m0.007s
    sys     0m0.091s
    $
    $ time (journalctl -o verbose | grep -F _EXE=/usr/sbin/dhclient | wc -l)
    1233

    real    0m7.515s
    user    0m5.088s
    sys     0m6.896s

This shows that piping through grep is orders of magnitude slower than filtering in journalctl. Grep-ing seems to be the only solution to find log entries if you don't fully know what you're looking for. For example: You want to see all entries containing a certain MESSAGE that gets enriched with additional information during the logging process:

    MESSAGE=host HOST has closed connection CONNECTION_ID

At the moment you have no option to look for this kind of information unless someone has set something like MESSAGE_ID you can filter for. There are several use cases using this pattern of thinking:
* there's no option to show all set FIELD keys in the current journal, although this information is encoded in the header of each journal file
* there's no support for negated filtering, you can't easily hide output of a certain unit which is creating too much noise
* there's no support for regular expressions (except for the --unit option), this is especially problematic when you're looking for certain MESSAGEs
* there's no option to show all entries containing a certain field
* logical expressions are somewhat hard to read/write because parentheses can't be used to enforce certain logical expressions
What do you think about a query language for journalctl that allows more powerful search options? This could be introduced without ignoring the capabilities the journal file format has to offer.
Are there maybe already plans to introduce something like this into journalctl? Do some people here have experience with query languages for such a use case? Things that come to mind are PCAP filter, SPARQL, Lucene or the SPHINX Query Language.
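Two items from the list above, listing the field names in use and showing only the entries that contain a given field, can be approximated by post-processing `journalctl -o export` output. This is a sketch that assumes text fields only; the helper names are hypothetical. (Later systemd releases added `-N`/`--fields` and `-g`/`--grep` to journalctl itself; the sketch does not depend on them.)

```sh
# Hypothetical helpers, not journalctl options. Input is Journal Export
# Format on stdin (text fields only; entries separated by blank lines).
#   list_field_names  print each distinct field name once, sorted
#   with_field KEY    print only the entries that contain field KEY
list_field_names() {
    # Field name = text before the first "="; skip export-only "__" fields.
    awk 'index($0, "=") > 1 && substr($0, 1, 2) != "__" {
             print substr($0, 1, index($0, "=") - 1)
         }' | LC_ALL=C sort -u
}
with_field() {
    awk -v key="$1" '
        # Emit the buffered entry if it contained KEY, then reset.
        function flush(   j) {
            if (n && has) {
                for (j = 1; j <= n; j++) print buf[j]
                print ""
            }
            n = 0
            has = 0
        }
        /^$/ { flush(); next }
        {
            buf[++n] = $0
            if (index($0, key "=") == 1) has = 1
        }
        END { flush() }'
}
```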
Re: [systemd-devel] RFC: filter and search journalctl
I think you should look into forwarding your logs to a more sophisticated aggregator, like the ELK stack.