Re: [systemd-devel] [ANNOUNCE] Journal File Format Documentation
On Sun, Oct 21, 2012 at 1:05 AM, Lennart Poettering lenn...@poettering.net wrote:

> Heya,
>
> I have now found the time to document the journal file format:
>
> http://www.freedesktop.org/wiki/Software/systemd/journal-files
>
> Comments welcome!
>
> (Oh, and it's in the fdo wiki, so if you see a typo or so, go ahead and fix it!)

I've quickly read through the document [1] pointed to above, which describes the on-disk log format. (I've read it out of curiosity, not with the intention to implement it. Maybe to steal some ideas for the future.) :)

(Just as an observation, I've also read your essay about this subject [2], and I agree with most of it.)

But what I couldn't find in any of these documents (maybe it is in another one) is a justification of the current technical (i.e. implementation) decisions. Mainly:

Why did you resort to implementing a new database format, and didn't choose an existing embedded library like BerkeleyDB, LevelDB, etc.? (Advantages / disadvantages?)

Just to be clear, I don't mean the decision to create a completely new logging format (i.e. not compatible with syslog, CEE, etc.). I'm focusing just on the storage engine.

For example I could think of:
* not relying on an external embedded storage library makes the resulting binaries smaller and simpler; (this doesn't seem to be it, as systemd already incorporates some other libraries;)
* having a custom log implementation makes this job more efficient; (in the long term I don't think beating a library like the ones listed above is feasible without a lot of work;)
* log rotation is almost impossible with some of the above cited libraries; (there are other tricks that can be done to achieve such;)

Please don't take this as a critique of the journal work (I've read some of those too, especially from Rainer, but I'm neutral so far). I just want to understand the decisions.

Ciprian.
[1] http://www.freedesktop.org/wiki/Software/systemd/journal-files [2] https://docs.google.com/document/pub?id=1IC9yOXj7j6cdLLxWEBAGRL6wl97tFxgjLUEHIX3MSTspli=1 ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] [ANNOUNCE] Journal File Format Documentation
On Tue, 23.10.12 15:25, Ciprian Dorin Craciun (ciprian.crac...@gmail.com) wrote:

> But what I couldn't find in any of these documents (maybe there is in another one), is a justification of the current technical (i.e. implementation) decisions. Mainly:

That's a valid question to raise.

> Why did you resort to implementing a new database format, and didn't choose an existing embedded library like BerkeleyDB, LevelDB, etc.? (Advantages / disadvantages?)

There are a number of reasons, which one could summarize as: because there is no existing database implementation that would fit the bill:

- We needed something small, embeddable, in pure C, so that we can pull it in everywhere. Something that has a somewhat stable API, is sanely managed upstream, and is Free Software. We are OK with adding deps to systemd, if there's a good reason to and the dep is well managed. It needed to be OOM safe.

- The database should be typeless, and index by all fields, rather than require fixed schemas. It should be efficient with large and binary data.

- It should not require file locks or communication between multiple readers or between readers and the writer. This is primarily a question of security (we cannot allow users to lock out root or the writer from accessing the logs by taking a lock) and network transparency (file locks on network FS are very very flaky), but also performance.

- We wanted something robust against IO failures that focuses on appending new data to the end, rather than overwriting data constantly.

- We needed something with in-line compression, and where we can add stuff like FSS to

These are the strong requirements, but there are more things to keep in mind: because of the structure of log data, which knows no changes but only appends and the occasional deletion of large chunks, and where data is generally monotonically ordered, you can do a lot of things you cannot do in normal databases.

rsyslog apparently chose to use ElasticSearch. I think ElasticSearch is cool, but it already fails for us on the most superficial of things, in that it would be quite ridiculous to pull Java into all systems for that... ;-)

Lennart

--
Lennart Poettering - Red Hat, Inc.
Re: [systemd-devel] [ANNOUNCE] Journal File Format Documentation
On Tue, Oct 23, 2012 at 5:39 PM, Lennart Poettering lenn...@poettering.net wrote:

> On Tue, 23.10.12 15:25, Ciprian Dorin Craciun (ciprian.crac...@gmail.com) wrote:
>
>> Why did you resort to implementing a new database format, and didn't choose an existing embedded library like BerkeleyDB, LevelDB, etc.? (Advantages / disadvantages?)
>
> There are a number of reasons, which one could summarize as: because there is no existing database implementation that would fit the bill:
>
> - We needed something small, embeddable, in pure C, so that we can pull it in everywhere. Something that has a somewhat stable API, is sanely managed upstream, and is Free Software. We are OK with adding deps to systemd, if there's a good reason to and the dep is well managed. It needed to be OOM safe.
>
> - We wanted something robust against IO failures that focuses on appending new data to the end, rather than overwriting data constantly.
>
> - We needed something with in-line compression, and where we can add stuff like FSS to

Ok. I agree that there are very few libraries that fit here. All I can think of that would make it in here is BerkeleyDB (but it fails other requirements you've listed below).

> - The database should be typeless, and index by all fields, rather than require fixed schemas. It should be efficient with large and binary data.

One thing bothers me: why should it index all fields? (For example indexing by UID, executable, service, etc. makes sense, but I don't think indexing by message is that worthwhile... Moreover, indexing by PID or coredump (which I think it is hinted is stored in the journal) doesn't make too much sense either...)

> - It should not require file locks or communication between multiple readers or between readers and the writer. This is primarily a question of security (we cannot allow users to lock out root or the writer from accessing the logs by taking a lock) and network transparency (file locks on network FS are very very flaky), but also performance.

From what I see this is the best reason for the current proposal. Indeed no embedded database library (that I know of) allows both reading and writing at the same time from multiple processes without locking. (Except maybe DJB's CDB and the TinyCDB implementation, but that wouldn't fit the bill here.) Maybe this should go at the top of that document, as the answer to "why?".

> These are the strong requirements, but there are more things to keep in mind: because of the structure of log data, which knows no changes but only appends and the occasional deletion of large chunks, and where data is generally monotonically ordered, you can do a lot of things you cannot do in normal databases.

Although I partially agree about this increased flexibility, having a custom format means it is very easy to just start adding features, thus accumulating cruft... Thus maybe a general purpose system would have limited this tendency...

> rsyslog apparently chose to use ElasticSearch. I think ElasticSearch is cool, but it already fails for us on the most superficial of things, in that it would be quite ridiculous to pull Java into all systems for that... ;-)

I don't even want to imply such a solution. (Or at least not for a standalone computer logging system.)

BTW, a little bit off-topic:
* why didn't you try to implement this journal system as a standalone library that could have been reused by other systems independently of systemd? (I know this was answered in [2], and that the focus is on systemd, but it seems it took quite a lot of work, and it's a pity it can't be individually reused);
* how do you intend to implement something resembling syslog's log centralization?

Ciprian.
Re: [systemd-devel] [ANNOUNCE] Journal File Format Documentation
On Sun, Oct 21, 2012 at 1:05 AM, Lennart Poettering lenn...@poettering.net wrote:

> Heya,
>
> I have now found the time to document the journal file format:
>
> http://www.freedesktop.org/wiki/Software/systemd/journal-files
>
> Comments welcome!

(Replying directly to this as I want to start another sub-thread...)

I'm currently searching for a logging system that has the following features, which I'm guessing could also be beneficial for systemd on larger systems:
* I have multiple processes that I want to log individually; by multiple I mean about 100+ in total (not necessarily on the same system);
* moreover these processes are quite dynamic (as in spawn / terminate) hourly or daily;
* I need to control the retention policy per process, not per entire system;
* if needed I want to be able to archive these logs on a per-process (or process type) basis;
* as a bonus I would like to be able to migrate the logs for a particular process to another system;

(In case anyone is wondering what I'm describing, it is a PaaS logging system similar to Heroku's logplex, etc.)

The parallel with systemd:
* think, instead of my processes, of user-sessions and services; (I want to keep the logs of some services (like `sshd`) for longer than those of DHCP, etc.);
* then think about having a journal collecting journals from multiple machines in a central repository;

As such, wouldn't a clustering key (like service type, or service type + pid, etc.) make sense? This would imply:
* splitting storage based on this clustering key; (not necessarily one per file, but maybe using some consistent hashing technique, etc.)
* having the clustering key as a parameter for querying to restrict index search, etc.

Of course all that I've described in the beginning could be emulated with the current journal, either by introducing a special field, or by using the journal library with multiple files (which I haven't checked if it is possible).

Ciprian.
Re: [systemd-devel] [ANNOUNCE] Journal File Format Documentation
On Tue, 23.10.12 18:48, Ciprian Dorin Craciun (ciprian.crac...@gmail.com) wrote:

>> - We needed something with in-line compression, and where we can add stuff like FSS to
>
> Ok. I agree that there are very few libraries that fit here. All I can think of that would make it in here is BerkeleyDB (but it fails other requirements you've listed below).

I used BDB in one other project and it makes me shudder. The constant ABI breakages and disk corruptions were awful. Heck, yum, as one user of it, breaks every second Fedora release with a BDB error, where the usual recipe is to remove the yum BDB database so that it is regenerated on next invocation. I am pretty much through with BDB.

>> - The database should be typeless, and index by all fields, rather than require fixed schemas. It should be efficient with large and binary data.
>
> One thing bothers me: why should it index all fields? (For example indexing by UID, executable, service, etc. makes sense, but I don't think indexing by message is that worthwhile... Moreover, indexing by PID or coredump (which I think it is hinted is stored in the journal) doesn't make too much sense either...)

Sure, not all fields make sense, but many many do, and we don't really know in advance which ones will, as the vocabulary can be extended by anybody. For example, if Apache decided to do structured logging for errors indexed by vserver, then I'd love that, but of course the vserver match would be unknown to everybody else. And that's cool.

So, the idea here is really to just index everything, and make it cheap, so that indexing something that never actually made sense to index is still cheap. In the journal file format, indexing field objects that are only referenced once is practically free, as instead of storing an offset to the bsearch object we use for indexing we just store the offset of the referencing entry in-line.

>> - It should not require file locks or communication between multiple readers or between readers and the writer. This is primarily a question of security (we cannot allow users to lock out root or the writer from accessing the logs by taking a lock) and network transparency (file locks on network FS are very very flaky), but also performance.
>
> From what I see this is the best reason for the current proposal. Indeed no embedded database library (that I know of) allows both reading and writing at the same time from multiple processes without locking. (Except maybe DJB's CDB and the TinyCDB implementation, but that wouldn't fit the bill here.) Maybe this should go at the top of that document, as the answer to "why?".

I have now added a link to this very thread to the first section of the document.

> Although I partially agree about this increased flexibility, having a custom format means it is very easy to just start adding features, thus accumulating cruft... Thus maybe a general purpose system would have limited this tendency...

Well, we really try hard to stay focused, and want the journal to do the things it does well, but not become a super-flexible store-anything database. Of course, we want to pave the way for future extensions, and that's why we built extensibility into the format (and there's a section about that in the spec).

> BTW, a little bit off-topic:
> * why didn't you try to implement this journal system as a standalone library that could have been reused by other systems independently of systemd? (I know this was answered in [2], and that the focus is on systemd, but it seems it took quite a lot of work, and it's a pity it can't be individually reused);

Well, for starters we want to be careful where we guarantee interface stability and where not. For example, the C API gives you a very high-level view on the journals, where interleaving of multiple files is hidden. However, if we'd split this out then we'd have to expose much more of the guts of the implementation, and provide a stable API for that, and that's something I don't want. We want the ability to change the internals of the implementation around and guarantee stability only at the most high-level C API that hides it all.

The other thing is simply that the stuff is really integrated with each other. The journal sources are small because we reuse a lot of internal C APIs of systemd, and the format exposes a lot of things that are specific to systemd, for example the vocabulary of well-known fields is closely bound to systemd.

Also, we believe in systemd, and in the journal, and in tight integration between the two, as we consider logging an essential facet of service management. It would be against our goals here to separate them out and turn them back into non-integrated components.

> * how do you intend to implement something resembling syslog's log centralization?

The network model existing since day one is one where we rely on existing file sharing infrastructure to transfer/collect files. I.e. use NFS, SMB, FTP,
Re: [systemd-devel] [ANNOUNCE] Journal File Format Documentation
On Tue, Oct 23, 2012 at 7:33 PM, Lennart Poettering lenn...@poettering.net wrote:

> On Tue, 23.10.12 18:48, Ciprian Dorin Craciun (ciprian.crac...@gmail.com) wrote:
>
>>> - We needed something with in-line compression, and where we can add stuff like FSS to
>>
>> Ok. I agree that there are very few libraries that fit here. All I can think of that would make it in here is BerkeleyDB (but it fails other requirements you've listed below).
>
> I used BDB in one other project and it makes me shudder. The constant ABI breakages and disk corruptions were awful. Heck, yum, as one user of it, breaks every second Fedora release with a BDB error, where the usual recipe is to remove the yum BDB database so that it is regenerated on next invocation. I am pretty much through with BDB.

:) Yes, now I remember... Each time I upgrade BerkeleyDB, `isync` and `bogofilter` put me through hell...

> So, the idea here is really to just index everything, and make it cheap, so that indexing something that never actually made sense to index is still cheap. In the journal file format, indexing field objects that are only referenced once is practically free, as instead of storing an offset to the bsearch object we use for indexing we just store the offset of the referencing entry in-line.

I guess those offsets are quite cheap, and the in-line entry for the once-only data is ok. But (from what I understand) every value you store has to be searched for in the file before storing (to see if it already exists as a value). Wouldn't this impact CPU usage?

>> BTW, a little bit off-topic:
>> * why didn't you try to implement this journal system as a standalone library that could have been reused by other systems independently of systemd? (I know this was answered in [2], and that the focus is on systemd, but it seems it took quite a lot of work, and it's a pity it can't be individually reused);
>
> Well, for starters we want to be careful where we guarantee interface stability and where not. For example, the C API gives you a very high-level view on the journals, where interleaving of multiple files is hidden. However, if we'd split this out then we'd have to expose much more of the guts of the implementation, and provide a stable API for that, and that's something I don't want. We want the ability to change the internals of the implementation around and guarantee stability only at the most high-level C API that hides it all.
>
> The other thing is simply that the stuff is really integrated with each other. The journal sources are small because we reuse a lot of internal C APIs of systemd, and the format exposes a lot of things that are specific to systemd, for example the vocabulary of well-known fields is closely bound to systemd.

I understand this issue with the focus. Nevertheless your journal idea sounds nice, and I hope someone will take it and implement it in a standalone variant. (I hope in a native compilable language...)

>> * how do you intend to implement something resembling syslog's log centralization?
>
> The network model existing since day one is one where we rely on existing file sharing infrastructure to transfer/collect files. I.e. use NFS, SMB, FTP, WebDAV, SCP, rsync, whatever suits you, and make them available at one spot, and journalctl -m will interleave them as necessary.

By interleave you mean only taking note of new files, not actually rewriting the contents.

About NFS (and other shared FS's) as a storage backend I'm not that certain... (And the same goes for the rest: `scp`, `rsync`, etc.) Maybe a command / option (maybe backed by an `ssh` channel a la `rsync`) to efficiently fetch the journal from other machines. (Not as you describe below, which seems geared towards integration, but strictly geared towards collection.)

> I am currently working on getting log syncing via both a PUSH and a PULL model done. This will be based on existing protocols and standards as much as we can (SSH or HTTP/HTTPS as transport, and JSON and more as payload), and is flexible for others to hook into. For example, I think it would be cool if Graylog2 and similar software would just pull the data out of the journal on its own, simply via HTTP/JSON. We make everything available to make this smooth, i.e. we provide clients with stable cursors which they can use to restart operation.

Aha. Kind of answers my previous question.

Thanks for your time,
Ciprian.
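The in-line offset trick quoted earlier in this message (a value referenced by only one entry stores that entry's offset directly, without a separate index object) can be illustrated with a small Python sketch. This is only a model of the idea: the class and function names are hypothetical, and the real on-disk layout is what the journal-files document describes, not this.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FieldIndex:
    """Hypothetical per-value index slot (not the real on-disk layout)."""
    inline_entry: int = 0  # offset of the first (often only) referencing entry
    extra_entries: List[int] = field(default_factory=list)  # used from the 2nd reference on
    n_entries: int = 0


def add_reference(index: Dict[bytes, FieldIndex], value: bytes, entry_offset: int) -> None:
    """Record that the entry at entry_offset references value."""
    slot = index.setdefault(value, FieldIndex())
    if slot.n_entries == 0:
        # Cheap path: a value seen once costs a single in-line offset.
        slot.inline_entry = entry_offset
    else:
        # Only values referenced repeatedly pay for a separate array.
        slot.extra_entries.append(entry_offset)
    slot.n_entries += 1


def lookup(index: Dict[bytes, FieldIndex], value: bytes) -> List[int]:
    """Return the offsets of all entries referencing value."""
    slot = index.get(value)
    if slot is None:
        return []
    return [slot.inline_entry] + slot.extra_entries
```

A value such as a one-off message text never allocates `extra_entries` content, which is the sense in which indexing it is "practically free" in this model.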
Re: [systemd-devel] [ANNOUNCE] Journal File Format Documentation
On Tue, 23.10.12 19:11, Ciprian Dorin Craciun (ciprian.crac...@gmail.com) wrote:

> On Sun, Oct 21, 2012 at 1:05 AM, Lennart Poettering lenn...@poettering.net wrote:
>
>> Heya,
>>
>> I have now found the time to document the journal file format:
>>
>> http://www.freedesktop.org/wiki/Software/systemd/journal-files
>>
>> Comments welcome!
>
> (Replying directly to this as I want to start another sub-thread...)
>
> I'm currently searching for a logging system that has the following features, which I'm guessing could also be beneficial for systemd on larger systems:
> * I have multiple processes that I want to log individually; by multiple I mean about 100+ in total (not necessarily on the same system);
> * moreover these processes are quite dynamic (as in spawn / terminate) hourly or daily;
> * I need to control the retention policy per process, not per entire system;
> * if needed I want to be able to archive these logs on a per-process (or process type) basis;
> * as a bonus I would like to be able to migrate the logs for a particular process to another system;
>
> (In case anyone is wondering what I'm describing, it is a PaaS logging system similar to Heroku's logplex, etc.)

The journal currently cannot do this for you, but what it already can do is split up the journal per user. This is done by default only for login users (i.e. actual human users), but with the SplitMode= setting in journald.conf it can be enabled for system users as well, or turned off entirely.

We could extend this switch to allow other split-up schemes. But note that the price you pay for interleaving files on display grows the more you split things up (O(n), with n being the number of files to interleave), hence we are a bit conservative here: we don't want to push people towards splitting things up too much, unless they have a really good reason to.

BTW, are you sure you actually need processes to split up by? Wouldn't services be more appropriate?

> The parallel with systemd:
> * think, instead of my processes, of user-sessions and services; (I want to keep the logs of some services (like `sshd`) for longer than those of DHCP, etc.);
> * then think about having a journal collecting journals from multiple machines in a central repository;
>
> As such, wouldn't a clustering key (like service type, or service type + pid, etc.) make sense? This would imply:
> * splitting storage based on this clustering key; (not necessarily one per file, but maybe using some consistent hashing technique, etc.)
> * having the clustering key as a parameter for querying to restrict index search, etc.

Not sure I grok this.

> Of course all that I've described in the beginning could be emulated with the current journal, either by introducing a special field, or by using the journal library with multiple files (which I haven't checked if it is possible).

In general our recommendation is to write as much as possible into the journal as payload, and do filtering afterwards rather than before, i.e. the journal should be the centralization point for things, where different views enable different uses.

The main reason the current per-user split logic exists is access control, since by splitting things up into files we can easily use FS ACLs for this, instead of introducing a centralized arbitration engine that enforces access rights.

Lennart

--
Lennart Poettering - Red Hat, Inc.
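To make the interleaving cost mentioned above a little more concrete, here is a minimal Python sketch of merging several per-file entry streams into one stream ordered by timestamp. It is not how journalctl is implemented; the `(timestamp, message)` tuple shape and the `interleave` name are assumptions made purely for illustration.

```python
import heapq
from typing import Iterable, Iterator, Tuple

# Hypothetical entry shape: (timestamp in microseconds, message text).
Entry = Tuple[int, str]


def interleave(*files: Iterable[Entry]) -> Iterator[Entry]:
    """Merge per-file entry streams into one timestamp-ordered stream.

    Each input must already be sorted by timestamp, which matches the
    append-only nature of a log file.
    """
    # heapq.merge keeps one "head" entry per input stream, so the
    # bookkeeping grows with the number of files being interleaved.
    return heapq.merge(*files, key=lambda entry: entry[0])
```

For example, `list(interleave(system_entries, user_entries))` yields the entries of both inputs in one monotonically ordered sequence. Every additional file adds one more stream that has to be opened and tracked, which is the cost the message warns about when splitting too finely.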
Re: [systemd-devel] [ANNOUNCE] Journal File Format Documentation
On Tue, 23.10.12 20:13, Ciprian Dorin Craciun (ciprian.crac...@gmail.com) wrote:

>> In the journal file format, indexing field objects that are only referenced once is practically free, as instead of storing an offset to the bsearch object we use for indexing we just store the offset of the referencing entry in-line.
>
> I guess those offsets are quite cheap, and the in-line entry for the once-only data is ok. But (from what I understand) every value you store has to be searched for in the file before storing (to see if it already exists as a value). Wouldn't this impact CPU usage?

Looking for pre-existing objects is cheap. It's a hash table, and hence effectively O(1). The hash table should usually be cached in memory quickly. If the hash table gets too full (over 75% fill level) we simply rotate the file and start anew. This should result in O(1) all across the hash table, as collisions should be the exception.

>> The other thing is simply that the stuff is really integrated with each other. The journal sources are small because we reuse a lot of internal C APIs of systemd, and the format exposes a lot of things that are specific to systemd, for example the vocabulary of well-known fields is closely bound to systemd.
>
> I understand this issue with the focus. Nevertheless your journal idea sounds nice, and I hope someone will take it and implement it in a standalone variant. (I hope in a native compilable language...)

Why? Why would anybody want to use the journal but not systemd? People who have issues with the latter usually are not rational about these things, and probably have a more philosophical/religious issue with systemd, but then will also have issues with the journal, since it follows the same philosophy and thinking.

Also, note that the journal file access in libsystemd-journal works fine on non-systemd systems too. People can just split this off if they want, and use it independently of systemd, the same way they already do with udev. No need to implement anything anew.

>> The network model existing since day one is one where we rely on existing file sharing infrastructure to transfer/collect files. I.e. use NFS, SMB, FTP, WebDAV, SCP, rsync, whatever suits you, and make them available at one spot, and journalctl -m will interleave them as necessary.
>
> By interleave you mean only taking note of new files, not actually rewriting the contents.

By interleaving I simply mean interleaving on display, i.e. taking various files from various sources, and presenting them as a continuous stream, even though they actually come from many sources. Various sources can be: rotated journals, per-user journal files, journal files from containers of the local host, journal files from other hosts, and more.

Lennart

--
Lennart Poettering - Red Hat, Inc.
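The hash table lookup and the rotation at 75% fill level described in this message can be sketched in Python as follows. This is a toy model under stated assumptions: the class names, the bucket count of 8, and the use of SHA-256 as the hash function are all stand-ins chosen for illustration, not what journald actually uses.

```python
import hashlib
from typing import List


class JournalFileSketch:
    """Toy model of one journal file's data hash table (hypothetical)."""

    FILL_LIMIT = 0.75  # from the thread: rotate above 75% fill level

    def __init__(self, n_buckets: int = 8) -> None:
        self.n_buckets = n_buckets
        self.buckets: List[List[bytes]] = [[] for _ in range(n_buckets)]
        self.n_items = 0

    def _bucket(self, data: bytes) -> List[bytes]:
        # SHA-256 is only a stand-in hash function for this sketch.
        h = int.from_bytes(hashlib.sha256(data).digest()[:8], "little")
        return self.buckets[h % self.n_buckets]

    def needs_rotation(self) -> bool:
        return self.n_items / self.n_buckets > self.FILL_LIMIT

    def add(self, data: bytes) -> bool:
        """Store data once; return False if it was already present."""
        bucket = self._bucket(data)
        if data in bucket:  # short chain while the table is sparse
            return False
        bucket.append(data)
        self.n_items += 1
        return True


class Writer:
    """Rotates to a fresh file instead of letting the table fill up."""

    def __init__(self) -> None:
        self.current = JournalFileSketch()
        self.archived: List[JournalFileSketch] = []

    def write(self, data: bytes) -> bool:
        if self.current.needs_rotation():
            self.archived.append(self.current)
            self.current = JournalFileSketch()
        return self.current.add(data)
```

Because the writer starts a new file before the table gets crowded, bucket chains stay short and the "does this value already exist?" check stays close to constant time, which is the point made in the reply above.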
Re: [systemd-devel] [ANNOUNCE] Journal File Format Documentation
On Tue, 23.10.12 23:43, Alexander E. Patrakov (patra...@gmail.com) wrote:

> 2012/10/21 Lennart Poettering lenn...@poettering.net:
>
>> Heya,
>>
>> I have now found the time to document the journal file format:
>>
>> http://www.freedesktop.org/wiki/Software/systemd/journal-files
>>
>> Comments welcome!
>
> The doc says these two things:
>
> 1) The format is designed to be read and written via memory mapping using multiple mapped windows.
>
> 2) A reader should verify all offsets and other data as it reads it. This includes checking for alignment and range of offsets in the file, especially before trying to read it via a memory map.
>
> I am worried by the fact that it is not specified what happens if a reader tries to read a file manipulated by a bad writer. Namely, one that repeatedly writes some valid data into the log in order to lure readers into this area, and then truncates or overwrites it in the hope of triggering a SIGBUS or something worse in readers. IMHO if a reader cannot trust the concurrent writer of the file to behave nicely, mmap-based reading should be outright banned.
>
> So please - either establish and document some kind of trust model between the reader and the writer, or ban mmap-based reading of non-archived journal files completely.

Yeah, we never made this explicit or documented this, but precisely this is the reason why the per-user journal files which we split off for access control reasons are not actually owned by the users, but simply accessible to users via file system ACLs: we want to ensure that users cannot truncate the files to cause journald (or another journalctl) to SIGBUS. With ACLs we can give users read access without the ability to modify their own files.

With the journal we try to follow the rule that for mmap() there might be a security boundary between readers and writers, but if so, then it must be privileged on the writer side and unprivileged on the reader side, and this must be reflected in the access rights of the file.

This probably deserves documentation somewhere. (added to the todo list)

Lennart

--
Lennart Poettering - Red Hat, Inc.
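The reader-side rule quoted from the document (check alignment and range of every offset before reading) might look roughly like the Python sketch below. The 8-byte alignment constant, the `header_size` parameter and both function names are assumptions for illustration; the authoritative values are in the journal-files document.

```python
ALIGNMENT = 8  # assumption for this sketch: objects are 64-bit aligned


def check_offset(offset: int, length: int, file_size: int, header_size: int) -> bool:
    """Return True only if [offset, offset + length) is safe to read."""
    if offset % ALIGNMENT != 0:
        return False  # misaligned object offset
    if offset < header_size:
        return False  # would point into the file header
    if length <= 0 or offset + length > file_size:
        return False  # empty object or one running past the end of the file
    return True


def read_object(buf: bytes, offset: int, length: int, header_size: int = 0) -> bytes:
    """Read an object from a buffer after validating its offset."""
    if not check_offset(offset, length, len(buf), header_size):
        raise ValueError("object offset failed validation")
    return buf[offset:offset + length]
```

Note that a check like this only guards against malformed offsets. It cannot protect a memory-mapped reader from a writer that truncates the file after the check has passed, which is why the reply above relies on file ownership and ACLs rather than on validation alone.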
Re: [systemd-devel] [ANNOUNCE] Journal File Format Documentation
On Tue, Oct 23, 2012 at 9:40 PM, Lennart Poettering lenn...@poettering.net wrote:

> But note that the price you pay for interleaving files on display grows the more you split things up (O(n), with n being the number of files to interleave), hence we are a bit conservative here: we don't want to push people towards splitting things up too much, unless they have a really good reason to.

By interleaving I guess you mean: when querying for logs the system will have to open all files and read from them at the same time to give the impression of a merged log, sorted by timestamp (or a similar key).

In that case, for my use-case this is not an issue, as the real-time logs are required for a particular process, and not for the entire system.

> BTW, are you sure you actually need processes to split up by? Wouldn't services be more appropriate?

When I say processes I actually mean: a couple of processes acting together as an integral logical unit. (Like PostgreSQL, which has multiple processes that behave as one group.)

And the way I see myself benefiting from systemd would be by creating containers (like LXC) for each such process.

>> As such, wouldn't a clustering key (like service type, or service type + pid, etc.) make sense? This would imply:
>> * splitting storage based on this clustering key; (not necessarily one per file, but maybe using some consistent hashing technique, etc.)
>> * having the clustering key as a parameter for querying to restrict index search, etc.
>
> Not sure I grok this.

By cluster key I mean a special key that would direct the entry to one log file or another. In the normal case such a cluster key would be the login user name, etc. (This would also allow events from the same source to end up in different log files based on this key.)

In one word: a way to partition entries into multiple log files, by setting this special field.
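The cluster key proposal above is not a journal feature; as a rough illustration of what is being asked for, the following Python sketch routes entries to one of several partitions based on a designated field. The field name `CLUSTER_KEY`, the function names and the plain hash-modulo scheme are all hypothetical (a real design might use consistent hashing, as the proposal suggests, so that changing the number of partitions moves fewer keys).

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List


def partition_for(cluster_key: str, n_partitions: int) -> int:
    """Map a cluster key to a partition number (simple hash-modulo)."""
    digest = hashlib.sha256(cluster_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_partitions


def route(entries: Iterable[Dict[str, str]], key_field: str,
          n_partitions: int) -> Dict[int, List[Dict[str, str]]]:
    """Group entries by the partition their cluster key maps to."""
    files: Dict[int, List[Dict[str, str]]] = defaultdict(list)
    for entry in entries:
        # Entries without the field all share the empty-string key.
        key = entry.get(key_field, "")
        files[partition_for(key, n_partitions)].append(entry)
    return files
```

All entries carrying the same key land in the same partition, so a query restricted to one key only needs to open one file, while retention and archiving could be handled per partition.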
Re: [systemd-devel] [ANNOUNCE] Journal File Format Documentation
On Tue, Oct 23, 2012 at 9:49 PM, Lennart Poettering lenn...@poettering.net wrote:

> On Tue, 23.10.12 20:13, Ciprian Dorin Craciun (ciprian.crac...@gmail.com) wrote:
>
>>> The other thing is simply that the stuff is really integrated with each other. The journal sources are small because we reuse a lot of internal C APIs of systemd, and the format exposes a lot of things that are specific to systemd, for example the vocabulary of well-known fields is closely bound to systemd.
>>
>> I understand this issue with the focus. Nevertheless your journal idea sounds nice, and I hope someone will take it and implement it in a standalone variant. (I hope in a native compilable language...)
>
> Why? Why would anybody want to use the journal but not systemd? People who have issues with the latter usually are not rational about these things, and probably have a more philosophical/religious issue with systemd, but then will also have issues with the journal, since it follows the same philosophy and thinking.

Ok. Just to state my bias: I'm currently neutral in the SysV / BSD init vs systemd debate. I **really** do want to get rid of all the Bash-isms initializing my system (actually I would ban Bash from newer projects). But this is a totally different topic which has been discussed on almost all the Linux-related mailing lists that I'm subscribed to... Thus this is not the direction in which I want to take this discussion.

My real motive for such a detached journaling system is, I hope, clear from my other sub-thread about separating log files: I'm searching for a logging system suitable for a PaaS...

> Also, note that the journal file access in libsystemd-journal works fine on non-systemd systems too. People can just split this off if they want, and use it independently of systemd, the same way they already do with udev. No need to implement anything anew.

Aha. So I could reuse `libsystemd-journal` without any systemd attachments. Good to know.

Hope I didn't start a flame-war with this one. :)

Ciprian.
Re: [systemd-devel] [ANNOUNCE] Journal File Format Documentation
On Tue, 23.10.12 22:02, Ciprian Dorin Craciun (ciprian.crac...@gmail.com) wrote:
> On Tue, Oct 23, 2012 at 9:40 PM, Lennart Poettering <lenn...@poettering.net> wrote:
>
>> But note that the price you pay for interleaving files on display grows the more you split things up (O(n), n being the number of files to interleave), hence we are a bit conservative here: we don't want to push people towards splitting things up too much, unless they have a really good reason to.
>
> By interleaving I guess you mean: when querying for logs the system will have to open all files and read from them at the same time to give the impression of a merged log, sorted by timestamp (or a similar key).

Yes, turning a number of fragments in various files into one stream of monotonically increasing timestamps.

>> BTW, are you sure you actually need processes to split up by? Wouldn't services be more appropriate?
>
> When I say processes I actually mean: a couple of processes acting together as an integral logical unit. (Like PostgreSQL which has multiple processes which behave as one group.)

Yeah, on systemd that's called a service, and is implemented as a cgroup on the lower layers. The journal automatically indexes by service. Try "journalctl -u avahi-daemon.service" to get all messages from avahi, and avahi only.

> And the way I see benefiting from systemd would be creating containers (like LXC) for each such process.

Our story regarding containers (i.e. where a new PID 1 in the container is running on a host system) is that we suggest that each container runs its own journald instance, and generates its own files, but registers them in the host via symlinks in /var/log/journal. See http://www.freedesktop.org/wiki/Software/systemd/ContainerInterface for more info about that. That way "journalctl -m" on the host will show you all logs from all containers, nicely interleaved.

>>> * having the clustering key as a parameter for querying to restrict index search, etc.
>>
>> Not sure I grok this.
>
> By cluster key I mean a special key that would direct the entry to one log file or another. In the normal case such a cluster key would be the login user name, etc. (This would also allow events from the same source to end up in different log files based on this key.) In one word: a way to partition entries into multiple log files, by setting this special field.

As mentioned we have SplitMode= for this, but it is strictly for UIDs only, since we only need this for access control management, nothing else.

Why precisely do you want to split up your log files per-service? That's the bit I don't get.

Lennart

-- 
Lennart Poettering - Red Hat, Inc.
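To make the interleaving cost discussed above concrete, here is a minimal Python sketch of turning several per-file streams into one stream ordered by timestamp. It is only an illustration and not how libsystemd-journal reads its binary files: the `(timestamp, message)` tuple shape, the `interleave` helper and the sample data are all assumptions made up for this example.

```python
import heapq
from typing import Iterable, Iterator, Tuple

# Hypothetical entry shape: (timestamp in microseconds, message text).
Entry = Tuple[int, str]


def interleave(files: Iterable[Iterable[Entry]]) -> Iterator[Entry]:
    """Merge per-file entry streams into one stream ordered by timestamp.

    Each input stream must already be sorted by timestamp, which matches
    the idea of a log file that is only ever appended to.
    """
    return heapq.merge(*files, key=lambda entry: entry[0])


# Made-up sample data standing in for three separate journal files.
host = [(100, "host: boot"), (400, "host: network up")]
container_a = [(150, "a: started"), (300, "a: ready")]
container_b = [(200, "b: started")]

merged = list(interleave([host, container_a, container_b]))
for timestamp, message in merged:
    print(timestamp, message)
```

Every entry that is emitted has to be compared against the current head of the other streams. A heap keeps that at O(log n) comparisons per entry, a plain scan over all heads costs O(n); either way the work per displayed entry grows with the number of files, which is the reason given above for not splitting things up more than necessary.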
Re: [systemd-devel] [ANNOUNCE] Journal File Format Documentation
On Tue, 23.10.12 22:14, Ciprian Dorin Craciun (ciprian.crac...@gmail.com) wrote:
>> Why? Why would anybody want to use the journal but not systemd? People who have issues with the latter usually are not rational about these things, and probably have a more philosophical/religious issue with systemd, but then will also have the issues with the journal since it follows the same philosophy and thinking.
>
> Ok. Just to state my bias: I'm currently neutral in the SysV / BSD init vs systemd. I **really** do want to get rid of all the Bash-ism initializing my system (actually I would ban Bash from newer projects). But this is a totally different topic which has been discussed on almost all the mailing lists related to Linux that I'm subscribed to...

I am sorry I have to ask, but what's wrong with bash? I mean, it's a shell, but what is worse or better than any other shell about it? What's the benefit of dealing with multiple implementations of a shell? Do you want to waste memory, increase your test matrix, complicate the use, yadda, yadda?

I really, really don't buy into this Debian ideology of "bash is bad, but dash is awesome", that's just intense BS.

Lennart

-- 
Lennart Poettering - Red Hat, Inc.
Re: [systemd-devel] [ANNOUNCE] Journal File Format Documentation
On Tue, Oct 23, 2012 at 10:18 PM, Lennart Poettering <lenn...@poettering.net> wrote:
> On Tue, 23.10.12 22:02, Ciprian Dorin Craciun (ciprian.crac...@gmail.com) wrote:
>
>> And the way I see benefiting from systemd would be creating containers (like LXC) for each such process.
>
> Our story regarding containers (i.e. where a new PID 1 in the container is running on a host system) is that we suggest that each container runs its own journald instance, and generates its own files, but registers them in the host via symlinks in /var/log/journal. See http://www.freedesktop.org/wiki/Software/systemd/ContainerInterface for more info about that. That way "journalctl -m" on the host will show you all logs from all containers, nicely interleaved.

Aha. Thanks for that pointer. (The only issue with this is that I must trust the service running inside the container to do the right thing, which could be a problem if I run untrusted code that I want to isolate.) But I'll give this one a look.

>> In one word: a way to partition entries into multiple log files, by setting this special field.
>
> As mentioned we have SplitMode= for this, but it is strictly for UIDs only, since we only need this for access control management, nothing else.

This could be another solution to my problem: allocate a different UID to each service.

> Why precisely do you want to split up your log files per-service? That's the bit I don't get.

Because in the envisaged PaaS, you have components (services) starting and stopping. Thus I want to be able to easily remove the logs of dead services, or maybe move them to a different archival service where they get deleted after a period of time. It's purely for administrative purposes. Maybe even to allow the user to download these log files independently.

But I understand now how to best solve this requirement without touching the core journald.

Thanks,
Ciprian.
Re: [systemd-devel] [ANNOUNCE] Journal File Format Documentation
On Tue, 23.10.12 22:27, Ciprian Dorin Craciun (ciprian.crac...@gmail.com) wrote:
>>> In one word: a way to partition entries into multiple log files, by setting this special field.
>>
>> As mentioned we have SplitMode= for this, but it is strictly for UIDs only, since we only need this for access control management, nothing else.
>
> This could be another solution to my problem. Allocate a different UID to each service.

This is what the Pantheon folks are doing. Instead of running full OSes in their containers they run a large number of sandboxed services (using systemd's sandboxing logic), each under a UID of its own.

>> Why precisely do you want to split up your log files per-service? That's the bit I don't get.
>
> Because in the envisaged PaaS, you have components (services) starting and stopping. Thus I want to be able to easily just remove logs for dead services, or maybe just move them to a different archival service where they get deleted after a period of time. It's purely for administrative purposes. Maybe even to allow the user to download these log files independently.
>
> But I understand now how to best solve this requirement without touching the core journald.

Well, but with the journal you can easily filter by service, i.e. "journalctl -u foobar" will give you a stream that only includes messages from service foobar, but it will otherwise look like /var/log/messages looked. So, unless you have access mode restrictions or really, really need to make sure that as soon as a container goes away its logs go away too, you could just leave everything in one pool and then filter on display/download.

BTW, with systemd 195 you can do this:

<snip>
systemctl enable systemd-journal-gatewayd.socket
wget "http://localhost:19531/entries?_SYSTEMD_UNIT=foobar.service"
</snip>

And this will give you all messages from foobar.service, easy for download, even from another host.

Or use:

wget --header="Accept: application/json" "http://localhost:19531/entries?_HOSTNAME=waldo"

And you'll get a JSON formatted dump of all messages for host/container waldo. And you can then easily process that from your web app to present a per-container log stream to the user.

Lennart

-- 
Lennart Poettering - Red Hat, Inc.
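As a rough illustration of the "process that from your web app" step, the following Python sketch builds the same kind of query URL as the wget calls above and groups a JSON dump by service. It rests on assumptions: that the gateway returns one JSON object per line when asked for application/json, and that entries carry the usual `_SYSTEMD_UNIT` and `MESSAGE` fields. The helper names `entries_url` and `group_by_unit` and the sample lines are invented for the example, and no request is actually sent.

```python
import json
from collections import defaultdict
from urllib.parse import urlencode


def entries_url(base: str, **matches: str) -> str:
    """Build an /entries query URL from field=value matches."""
    return f"{base}/entries?{urlencode(matches)}"


def group_by_unit(lines):
    """Group MESSAGE texts by _SYSTEMD_UNIT.

    Assumes one JSON object per line; blank lines are skipped.
    """
    grouped = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        unit = entry.get("_SYSTEMD_UNIT", "(no unit)")
        grouped[unit].append(entry.get("MESSAGE", ""))
    return dict(grouped)


# Made-up sample standing in for the body of a gateway response.
sample = [
    '{"_HOSTNAME": "waldo", "_SYSTEMD_UNIT": "foobar.service", "MESSAGE": "started"}',
    '{"_HOSTNAME": "waldo", "_SYSTEMD_UNIT": "other.service", "MESSAGE": "hello"}',
    '{"_HOSTNAME": "waldo", "_SYSTEMD_UNIT": "foobar.service", "MESSAGE": "stopped"}',
]

url = entries_url("http://localhost:19531", _HOSTNAME="waldo")
by_unit = group_by_unit(sample)
print(url)
for unit, messages in by_unit.items():
    print(unit, messages)
```

In a real setup the lines would come from the HTTP response body (sent with the Accept: application/json header) instead of the hard-coded sample.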
Re: [systemd-devel] [ANNOUNCE] Journal File Format Documentation
On 10/23/2012 06:40 PM, Lennart Poettering wrote:
> The journal currently cannot do this for you, but what it already can do is split up the journal per-user. This is done by default only for login users (i.e. actual human users), but with the SplitMode= setting in journald.conf it can be enabled for system users as well, or turned off entirely. We could extend this switch to allow other split-up schemes.
>
> But note that the price you pay for interleaving files on display grows the more you split things up (O(n), n being the number of files to interleave), hence we are a bit conservative here: we don't want to push people towards splitting things up too much, unless they have a really good reason to.

If I'm understanding this correctly, would it not be simpler, and sufficient, to support splitting the journal up via the CLI instead of doing it in real time?

JBG
Re: [systemd-devel] [ANNOUNCE] Journal File Format Documentation
On Tue, 23.10.12 20:52, Jóhann B. Guðmundsson (johan...@gmail.com) wrote:
> On 10/23/2012 06:40 PM, Lennart Poettering wrote:
>
>> The journal currently cannot do this for you, but what it already can do is split up the journal per-user. This is done by default only for login users (i.e. actual human users), but with the SplitMode= setting in journald.conf it can be enabled for system users as well, or turned off entirely. We could extend this switch to allow other split-up schemes.
>>
>> But note that the price you pay for interleaving files on display grows the more you split things up (O(n), n being the number of files to interleave), hence we are a bit conservative here: we don't want to push people towards splitting things up too much, unless they have a really good reason to.
>
> If I'm understanding this correctly, would it not be simpler, and sufficient, to support splitting the journal up via the CLI instead of doing it in real time?

We might add this as a tool one day, but I think it's a good rule to write logs once, and not touch them afterwards if at all possible, in order not to corrupt what is already written safely. Hence: it's probably a good idea to focus on writing things the right way the first time, instead of focusing on rewriting them afterwards.

Related to the tool you are suggesting, I think a tool to merge split-off files might be very useful too, to counter the scalability issues of interleaving too many separate files on display.

Lennart

-- 
Lennart Poettering - Red Hat, Inc.
Re: [systemd-devel] [ANNOUNCE] Journal File Format Documentation
On 10/23/2012 09:19 PM, Lennart Poettering wrote:
> Related to the tool you are suggesting, I think a tool to merge split-off files might be very useful too, to counter the scalability issues of interleaving too many separate files on display.

Yeah, as an extension to journalctl, and users would probably like to do it as part of the process when the journal files get rotated on disk ( rotate -- split )

JBG
[systemd-devel] [ANNOUNCE] Journal File Format Documentation
Heya,

I have now found the time to document the journal file format:

http://www.freedesktop.org/wiki/Software/systemd/journal-files

Comments welcome!

(Oh, and it's in the fdo wiki, so if you see a typo or so, go ahead and fix it!)

Lennart

-- 
Lennart Poettering - Red Hat, Inc.