Re: [systemd-devel] [ANNOUNCE] Journal File Format Documentation

2012-10-23 Thread Ciprian Dorin Craciun
On Sun, Oct 21, 2012 at 1:05 AM, Lennart Poettering
lenn...@poettering.net wrote:
 Heya,

 I have now found the time to document the journal file format:

 http://www.freedesktop.org/wiki/Software/systemd/journal-files

 Comments welcome!

 (Oh, and it's in the fdo wiki, so if you see a typo or so, go ahead and
 fix it!)


I've quickly read through the document [1] linked above
describing the on-disk log format. (I've read it out of curiosity, not
with the intention of implementing it. Maybe to steal some ideas for
the future.) :)
(Just as an observation I've also read your essay about this
subject [2], and I agree with most of it.)

But what I couldn't find in any of these documents (maybe it is
in another one) is a justification of the current technical (i.e.
implementation) decisions. Mainly:

Why did you resort to implementing a new database format, and
didn't choose an existing embedded library like BerkeleyDB, LevelDB,
etc.? (Advantages / disadvantages?)

Just to be clear, I don't mean the decision to create a completely new
logging format (i.e. not compatible with syslog, CEE, etc.). I'm
focusing only on the storage engine.

For example I could think of:
* not relying on an external embedded storage library makes the
resulting binaries smaller and simpler; (this doesn't seem to be it, as
systemd already incorporates some other libraries;)
* having a custom log implementation makes this job more
efficient; (in the long term I don't think beating a library like the
ones listed above is feasible without a lot of work;)
* log rotation is almost impossible with some of the above-cited
libraries; (though there are other tricks that can be done to achieve it;)

Please don't take this as a critique of the journal work (I've read
some of those too, especially from Rainer, but I'm neutral so far). I
just want to understand the decisions.

Ciprian.


[1] http://www.freedesktop.org/wiki/Software/systemd/journal-files
[2] 
https://docs.google.com/document/pub?id=1IC9yOXj7j6cdLLxWEBAGRL6wl97tFxgjLUEHIX3MSTspli=1
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [ANNOUNCE] Journal File Format Documentation

2012-10-23 Thread Lennart Poettering
On Tue, 23.10.12 15:25, Ciprian Dorin Craciun (ciprian.crac...@gmail.com) wrote:

 But what I couldn't find in any of these documents (maybe there is
 in another one), is a justification of the current technical (i.e.
 implementation) decisions. Mainly:

That's a valid question to raise.

 Why did you resort to implementing a new database format, and
 didn't choose an existing embedded library like BerkeleyDB, LevelDB,
 etc.? (Advantages / disadvantages?)

There are a number of reasons, which one could summarize as: because
there is no existing database implementation that would fit the bill:

- we needed something small, embeddable, in pure C, so that we can pull
  it in everywhere; something that has a somewhat stable API, is sanely
  managed upstream, and is Free Software. We are OK with adding deps to
  systemd if there's a good reason to and the dep is well managed. It
  needed to be OOM safe.

- The database should be typeless, and index by all fields, rather than
  require fixed schemas. It should be efficient with large and binary data.

- It should not require file locks or communication between multiple
  readers or between readers and the writer. This is primarily a
  question of security (we cannot allow users to lock out root or the
  writer from accessing the logs by taking a lock) and network
  transparency (file locks on network FS are very very flaky), but also
  performance.

- We wanted something robust against IO failures that focuses on appending
  new data to the end, rather than overwriting data constantly.

- We needed something with in-line compression, and where we can add
  stuff like FSS (Forward Secure Sealing) to it.

These are the strong requirements, but there are other things to
keep in mind: because of the structure of log data, which sees no
changes but only appends and the occasional deletion of large chunks,
and where data is generally monotonically ordered, you can do a lot of
things you cannot do in normal databases.

rsyslog apparently chose to use ElasticSearch. I think ElasticSearch is
cool, but it already fails for us on the most superficial of things, in
that it would be quite ridiculous to pull in Java into all systems for
that... ;-)

Lennart

-- 
Lennart Poettering - Red Hat, Inc.


Re: [systemd-devel] [ANNOUNCE] Journal File Format Documentation

2012-10-23 Thread Ciprian Dorin Craciun
On Tue, Oct 23, 2012 at 5:39 PM, Lennart Poettering
lenn...@poettering.net wrote:
 On Tue, 23.10.12 15:25, Ciprian Dorin Craciun (ciprian.crac...@gmail.com) 
 wrote:
 Why did you resort to implementing a new database format, and
 didn't choose an existing embedded library like BerkeleyDB, LevelDB,
 etc.? (Advantages / disadvantages?)

 There are a number of reasons, which one could summarize as: because
 there is no existing database implementation that would fit the bill:

 - we needed something small, embeddable, in pure C, so that we can pull
   it in everywhere; something that has a somewhat stable API, is sanely
   managed upstream, and is Free Software. We are OK with adding deps to
   systemd if there's a good reason to and the dep is well managed. It
   needed to be OOM safe.

 - We wanted something robust against IO failures that focuses on appending
   new data to the end, rather than overwriting data constantly.

 - We needed something with in-line compression, and where we can add
   stuff like FSS (Forward Secure Sealing) to it.

Ok. I agree that there are very few libraries that fit here. The only
one I can think of that comes close is BerkeleyDB (but it fails other
requirements you've listed below).


 - The database should be typeless, and index by all fields, rather than
   require fixed schemas. It should be efficient with large and binary data.

One thing bothers me: why should it index all fields? (For example,
indexing by UID, executable, service, etc. makes sense, but I don't
think indexing by message is that worthwhile... Moreover, indexing by PID
or by coredump (which, it is hinted, is stored in the journal) doesn't
make too much sense either...)


 - It should not require file locks or communication between multiple
   readers or between readers and the writer. This is primarily a
   question of security (we cannot allow users to lock out root or the
   writer from accessing the logs by taking a lock) and network
   transparency (file locks on network FS are very very flaky), but also
   performance.

From what I see this is the best reason for the current proposal.
Indeed no embedded database library (that I know of) allows both
reading and writing at the same time from multiple processes without
locking. (Except maybe DJB's CDB and the TinyCDB implementation, but
those wouldn't fit the bill here.)

Maybe this should go at the top of that document, as a description
of the "why"?


 These are the strong requirements, but there are other things to
 keep in mind: because of the structure of log data, which sees no
 changes but only appends and the occasional deletion of large chunks,
 and where data is generally monotonically ordered, you can do a lot of
 things you cannot do in normal databases.

Although I partially agree about this increased flexibility,
having a custom format means it is very easy to just start adding
features, thus accumulating cruft... A general-purpose system might
have limited this tendency...


 rsyslog apparently chose to use ElasticSearch. I think ElasticSearch is
 cool, but it already fails for us on the most superficial of things, in
 that it would be quite ridiculous to pull in Java into all systems for
 that... ;-)

I didn't mean to imply such a solution. (Or at least not
for a standalone computer's logging system.)


BTW, a little bit off-topic:
* why didn't you try to implement this journal system as a
standalone library that could be reused by other systems
independently of systemd? (I know this was answered in [2], and that
the focus is on systemd, but it seems it took quite a lot of work, and
it's a pity it can't be individually reused);
* how do you intend to implement something resembling syslog's log
centralization?

Ciprian.


Re: [systemd-devel] [ANNOUNCE] Journal File Format Documentation

2012-10-23 Thread Ciprian Dorin Craciun
On Sun, Oct 21, 2012 at 1:05 AM, Lennart Poettering
lenn...@poettering.net wrote:
 Heya,

 I have now found the time to document the journal file format:

 http://www.freedesktop.org/wiki/Software/systemd/journal-files

 Comments welcome!


(Replying directly to this as I want to start another sub-thread...)

I'm currently searching for a logging system that has the
following features, which I'm guessing could also be beneficial for
systemd on larger systems:
* I have multiple processes that I want to log individually; by
multiple I mean about 100+ in total (not necessarily on the same
system);
* moreover these processes are quite dynamic (as in spawn /
terminate) hourly or daily;
* I need to control the retention policy per process not per entire system;
* if needed I want to be able to archive these logs in a
per-process (or process type) basis;
* as bonus I would like to be able to migrate the logs for a
particular process to another system;
(In case anyone is wondering what I'm describing, it is a PaaS
logging system similar to Heroku's logplex, etc.)

The parallel with systemd:
* think instead of my processes, of user-sessions and services; (I
want to keep some service's (like `sshd`) logs for more time than I
want it for DHCP, etc.);
* then think about having a journal collecting journals from
multiple machines in a central repository;

As such, wouldn't a clustering key (like service type, or
service type + pid, etc.) make sense? This would imply:
* splitting storage based on this clustering key; (not
necessarily one per file, but maybe using some consistent hashing
technique, etc.)
* having the clustering key as a parameter for querying to
restrict index search, etc.

Of course, everything I've described in the beginning could be
emulated with the current journal, either by introducing a special
field, or by using the journal library with multiple files (which I
haven't checked is possible).

Ciprian.


Re: [systemd-devel] [ANNOUNCE] Journal File Format Documentation

2012-10-23 Thread Lennart Poettering
On Tue, 23.10.12 18:48, Ciprian Dorin Craciun (ciprian.crac...@gmail.com) wrote:

  - We needed something with in-line compression, and where we can add
stuff like FSS to
 
 Ok. I agree that there are very few libraries that fit here. All I
 can think of making into here would be BerkeleyDB (but it fails other
 requirements you've listed below).

I used BDB in one other project and it makes me shudder. The constant
ABI breakages and disk corruptions were awful. Heck, yum, as one user of
it, breaks every second Fedora release with a BDB error, where the usual
recipe is to remove the yum BDB database so that it is regenerated on
the next invocation. I am pretty much through with BDB.

  - The database should be typeless, and index by all fields, rather than
    require fixed schemas. It should be efficient with large and binary data.
 
 One thing bothers me: why should it index all fields? (For example
 indexing by UID, executable, service, etc. makes sense, but I don't
 think indexing by message is that worthwhile... Moreover by PID or
 coredump (which I think it is hinted is stored in the journal) doesn't
 make too much sense either...)

Sure, not all fields make sense, but many, many do, and we don't really
know in advance which ones will, as the vocabulary can be extended by
anybody. For example, if Apache decided to do structured logging for
errors indexed by vserver, then I'd love that, but of course the vserver
match would be unknown to everybody else. And that's cool.

So, the idea here is really to just index everything, and to make it
cheap, so that indexing something that never actually needed an index
costs little.

In the journal file format, indexing field objects that are only
referenced once is practically free: instead of storing an offset to
the bsearch object we use for indexing, we just store the offset of the
referencing entry in-line.
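The trick described above can be sketched in a few lines (a toy model with illustrative names, not the actual on-disk layout):

```python
# Toy model of the in-line offset trick: a data object normally points
# at an entry array (the bsearch object used for indexing), but a field
# value referenced only once stores the single entry's offset in-line,
# so no array is ever allocated for it. Names are illustrative.

class DataObject:
    def __init__(self):
        self.n_entries = 0
        self.entry_offset = None   # in-line slot for the first/only entry
        self.entry_array = None    # allocated only on the second reference

    def link_entry(self, offset):
        if self.n_entries == 0:
            self.entry_offset = offset          # unique so far: free
        else:
            if self.entry_array is None:
                self.entry_array = [self.entry_offset]
            self.entry_array.append(offset)     # now a real index array
        self.n_entries += 1

    def entries(self):
        if self.n_entries == 1:
            return [self.entry_offset]
        return list(self.entry_array or [])

d = DataObject()
d.link_entry(0x1000)
assert d.entry_array is None       # one reference: nothing extra stored
d.link_entry(0x2000)
assert d.entries() == [0x1000, 0x2000]
```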

  - It should not require file locks or communication between multiple
    readers or between readers and the writer. This is primarily a
    question of security (we cannot allow users to lock out root or the
    writer from accessing the logs by taking a lock) and network
    transparency (file locks on network FS are very very flaky), but also
    performance.
 
 From what I see this is the best reason for the current proposal.
 Indeed no embedded database library (that I know of) allows both
 reading and writing at the same time from multiple processes without
 locking. (Except maybe DJB's CDB and the TinyCDB implementation, but
 that wouldn't fit the bill here.)
 
 Maybe this should go at the top of that document, as a description
 of the "why"?

I have now added a link to this very thread to the first
section of the document.

 Although I partially agree about this increased flexibility,
 having a custom format means it is very easy to just start adding
 features, thus accumulating cruft... Thus maybe a general purpose
 system would have limited this tendency...

Well, we really try hard to stay focused, and want the journal to do
the things it does well, but not become a super-flexible store-anything
database. Of course, we want to pave the way for future extensions, and
that's why we built extensibility into the format (and there's a section
about that in the spec).

 BTW, a little bit off-topic:
 * why didn't you try to implement this journal system as a
 standalone library, that could have been reused by other systems
 independently of systemd; (I know this was answered in [2], and that
 the focus is on systemd, but it seems it took quite a lot of work, and
 it's a pity it can't be individually reused);

Well, for starters we want to be careful about where we guarantee interface
stability and where not. For example, the C API gives you a very
high-level view of the journals, where the interleaving of multiple files is
hidden. However, if we split this out then we'd have to expose much
more of the guts of the implementation, and provide a stable API for that, and
that's something I don't want. We want the ability to change the
internals of the implementation around and guarantee stability only at
the most high-level C API that hides it all.

The other thing is simply that the stuff is really integrated with each
other. The journal sources are small because we reuse a lot of internal
C APIs of systemd, and the format exposes a lot of things that are
specific to systemd, for example the vocabulary of well-known fields is
closely bound to systemd.

Also, we believe in systemd, and in the journal, and in the tight
integration between the two, as we consider logging an essential facet
of service management. It would be against our goals here to separate
them out and turn them back into non-integrated components.

 * how do you intend to implement something resembly syslog's log
 centralization?

The network model existing since day one is one where we rely on
existing file-sharing infrastructure to transfer/collect files, i.e. use
NFS, SMB, FTP, WebDAV, SCP, rsync, whatever suits you, make the files
available at one spot, and journalctl -m will interleave them as
necessary.

Re: [systemd-devel] [ANNOUNCE] Journal File Format Documentation

2012-10-23 Thread Ciprian Dorin Craciun
On Tue, Oct 23, 2012 at 7:33 PM, Lennart Poettering
lenn...@poettering.net wrote:
 On Tue, 23.10.12 18:48, Ciprian Dorin Craciun (ciprian.crac...@gmail.com) 
 wrote:

  - We needed something with in-line compression, and where we can add
    stuff like FSS (Forward Secure Sealing) to it.

 Ok. I agree that there are very few libraries that fit here. All I
 can think of making into here would be BerkeleyDB (but it fails other
 requirements you've listed below).

 I used BDB in one other project and it makes me shudder. The constant
 ABI breakages and disk corruptions were awful. Heck, yum, as one user of
 it, breaks every second Fedora release with a BDB error, where the usual
 recipe is to remove the yum BDB database so that it is regenerated on
 the next invocation. I am pretty much through with BDB.

:) Yes, now I remember... Each time I upgrade BerkeleyDB, `isync`
and `bogofilter` put me through hell...


 So, the idea here is really to just index everything, and to make it
 cheap, so that indexing something that never actually needed an index
 costs little.

 In the journal file format indexing field objects that are only
 referenced once is practically free, as instead of storing an offset to
 the bsearch object we use for indexing we just store the offset of the
 referencing entry in-line.

I guess those offsets are quite cheap, and the in-line entry for
the once-only data is OK. But (from what I understand) every value
you store has to be looked up in the file before storing (to see
if it already exists as a value). Wouldn't this impact CPU usage?


 BTW, a little bit off-topic:
 * why didn't you try to implement this journal system as a
 standalone library, that could have been reused by other systems
 independently of systemd; (I know this was answered in [2], and that
 the focus is on systemd, but it seems it took quite a lot of work, and
 it's a pity it can't be individually reused);

 Well, for starters we want to be careful about where we guarantee interface
 stability and where not. For example, the C API gives you a very
 high-level view of the journals, where the interleaving of multiple files is
 hidden. However, if we split this out then we'd have to expose much
 more of the guts of the implementation, and provide a stable API for that, and
 that's something I don't want. We want the ability to change the
 internals of the implementation around and guarantee stability only at
 the most high-level C API that hides it all.

 The other thing is simply that the stuff is really integrated with each
 other. The journal sources are small because we reuse a lot of internal
 C APIs of systemd, and the format exposes a lot of things that are
 specific to systemd, for example the vocabulary of well-known fields is
 closely bound to systemd.

I understand this issue with the focus. Nevertheless your journal
idea sounds nice, and I hope someone will take it and implement it in
a standalone variant. (I hope in a natively compiled language...)


 * how do you intend to implement something resembling syslog's log
 centralization?

 The network model existing since day one is one where we rely on
 existing file-sharing infrastructure to transfer/collect files, i.e. use
 NFS, SMB, FTP, WebDAV, SCP, rsync, whatever suits you, make the files available
 at one spot, and journalctl -m will interleave them as necessary.

By interleave you mean only taking note of new files, not
actually rewriting the contents?

About NFS (and other shared FSes) as a storage backend I'm not
that certain... (And the same about the rest: `scp`, `rsync`, etc.)

Maybe a command / option (perhaps backed by an `ssh` channel a la
`rsync`) to efficiently fetch journals from other machines? (Not as you
describe below, which seems geared towards integration, but strictly
geared towards collection.)


 I am currently working on getting log syncing via both a PUSH and PULL
 model done. This will be based on existing protocols and standards as
 much as we can (SSH or HTTP/HTTPS as transport, and JSON and more as
 payload), and is flexible for others to hook into. For example, I think
 it would be cool if Graylog2 and similar software would just pull the
 data out of the journal on its own, simply via HTTP/JSON. We make
 everything available to make this smooth, i.e. we provide clients with
 stable cursors which they can use to restart operation.

Aha. Kind of answers my previous question.
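The pull model with stable cursors quoted above could be sketched roughly like this (the cursor here is just an entry index; the real journal cursor is an opaque string, and the function shape is an assumption, not the actual API):

```python
# Sketch of cursor-based pulling: a client remembers an opaque cursor
# for the last entry it consumed and asks only for entries after it,
# so it can restart cleanly after a disconnect. All names illustrative.

def pull(journal, cursor=None):
    """Return (new_entries, new_cursor); the cursor is the last index seen."""
    start = 0 if cursor is None else cursor + 1
    new = journal[start:]
    new_cursor = cursor if not new else start + len(new) - 1
    return new, new_cursor

journal = ["boot", "sshd start", "dhcp lease"]
entries, cur = pull(journal)            # first pull: everything so far
assert entries == journal and cur == 2
journal.append("login")
entries, cur = pull(journal, cur)       # restart from the stored cursor
assert entries == ["login"] and cur == 3
```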

Thanks for your time,
Ciprian.


Re: [systemd-devel] [ANNOUNCE] Journal File Format Documentation

2012-10-23 Thread Lennart Poettering
On Tue, 23.10.12 19:11, Ciprian Dorin Craciun (ciprian.crac...@gmail.com) wrote:

 
 On Sun, Oct 21, 2012 at 1:05 AM, Lennart Poettering
 lenn...@poettering.net wrote:
  Heya,
 
  I have now found the time to document the journal file format:
 
  http://www.freedesktop.org/wiki/Software/systemd/journal-files
 
  Comments welcome!
 
 
 (Replying directly to this as I want to start another sub-thread...)
 
 I'm currently searching for a logging system that has the
 following feature, which I'm guessing could also be beneficial for
 systemd on larger systems:
 * I have multiple processes that I want to log individually; by
 multiple I mean about 100+ in total (not necessarily on the same
 system);
 * moreover these processes are quite dynamic (as in spawn /
 terminate) hourly or daily;
 * I need to control the retention policy per process not per entire 
 system;
 * if needed I want to be able to archive these logs in a
 per-process (or process type) basis;
 * as bonus I would like to be able to migrate the logs for a
 particular process to another system;
 (In case anyone is wondering what I'm describing, it is a PaaS
 logging system similar with Heroku's logplex, etc.)

The journal currently cannot do this for you, but what it already can
do is split up the journal per user. This is done by default only for login
users (i.e. actual human users), but with the SplitMode= setting in
journald.conf it can be enabled for system users as well, or turned off
entirely. We could extend this switch to allow other split-up schemes.

But note that the price you pay for interleaving files on display grows
the more you split things up (O(n), with n the number of files to
interleave), hence we are a bit conservative here: we don't want to push
people towards splitting things up too much, unless they have a really
good reason to.

BTW, are you sure you actually need processes to split up by? Wouldn't
services be more appropriate?

 The parallel with systemd:
 * think instead of my processes, of user-sessions and services; (I
 want to keep some service's (like `sshd`) logs for more time than I
 want it for DHCP, etc.);
 * then think about having a journal collecting journals from
 multiple machines in a central repository;
 
 As such, wouldn't a clustering key (like service type, or
 service type + pid, etc.) make sense? This would imply:
 * splitting storage based on this clustering key; (not
 necessarily one per file, but maybe using some consistent hashing
 technique, etc.)
 * having the clustering key as a parameter for querying to
 restrict index search, etc.

Not sure I grok this.

 Of course all what I've described in the beginning could be
 emulated with the current journal, either by introducing a special
 field, or by using the journal library with multiple files (which I
 haven't checked if it is possible).

In general our recommendation is to write as much as possible into the
journal as payload, and do filtering afterwards rather than
before; i.e. the journal should be the centralization point for things,
where different views enable different uses.

The main reason the current per-user split logic exists is access
control, since by splitting things up in files we can easily use FS ACLs
for this, instead of introducing a centralized arbitration engine that
enforces access rights.

Lennart

-- 
Lennart Poettering - Red Hat, Inc.


Re: [systemd-devel] [ANNOUNCE] Journal File Format Documentation

2012-10-23 Thread Lennart Poettering
On Tue, 23.10.12 20:13, Ciprian Dorin Craciun (ciprian.crac...@gmail.com) wrote:

  In the journal file format indexing field objects that are only
  referenced once is practically free, as instead of storing an offset to
  the bsearch object we use for indexing we just store the offset of the
  referencing entry in-line.
 
 I guess those offsets are quite cheap, and the in-line entry for
 the once-only data is OK. But (from what I understand) every value
 you store has to be looked up in the file before storing (to see
 if it already exists as a value). Wouldn't this impact CPU usage?

Looking for pre-existing objects is cheap. It's a hashtable, and hence
effectively O(1). The hash table should usually be cached in memory
quickly.

If the hash table gets too full (over a 75% fill level) we simply rotate
the file and start anew. This should result in O(1) behavior all across
the hash table, as collisions should be the exception.
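The rotation rule above can be sketched as follows (the slot count is illustrative; only the 75% threshold comes from the message):

```python
# Sketch of the fill-level rule: keep appending while the data hash
# table is below 75% full; past that, rotate to a fresh journal file so
# collisions stay rare and lookups effectively O(1).

HASH_TABLE_SLOTS = 4096   # illustrative table size
MAX_FILL = 0.75           # threshold mentioned in the thread

def should_rotate(n_used_slots):
    return n_used_slots / HASH_TABLE_SLOTS > MAX_FILL

assert not should_rotate(3000)   # ~73% full: keep appending
assert should_rotate(3100)       # ~76% full: rotate the file
```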

  The other thing is simply that the stuff is really integrated with each
  other. The journal sources are small because we reuse a lot of internal
  C APIs of systemd, and the format exposes a lot of things that are
  specific to systemd, for example the vocabulary of well-known fields is
  closely bound to systemd.
 
 I understand this issue with the focus. Nevertheless your journal
 idea sounds nice, and I hope someone will take it and implement it in
 a standalone variant. (I hope in a native compilable language...)

Why? Why would anybody want to use the journal but not systemd? People
who have issues with the latter usually are not rational about these
things, and probably have a more philosophical/religious issue with
systemd, but then they will also have issues with the journal, since it
follows the same philosophy and thinking.

Also, note that the journal file access in libsystemd-journal works fine
on non-systemd systems too. People can just split this off if they want, and use
it independently of systemd, the same way they already do with udev. No
need to implement anything anew.

  The network model existing since day one is one where we rely on
  existing file-sharing infrastructure to transfer/collect files, i.e. use
  NFS, SMB, FTP, WebDAV, SCP, rsync, whatever suits you, make the files available
  at one spot, and journalctl -m will interleave them as necessary.
 
 By interleave you mean only taking note of new files, not
 actually rewriting the contents.

By interleaving I simply mean interleaving on display, i.e. taking
various files from various sources and presenting them as one continuous
stream, even though they actually come from many sources. Various
sources can be: rotated journals, per-user journal files, journal files
from containers on the local host, journal files from other hosts, and
more.
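Interleaving on display is essentially a k-way merge of per-file streams that are each already sorted by timestamp. A minimal sketch with made-up entries:

```python
# Each source (rotated file, other host, container) is already sorted
# by timestamp; display-time interleaving merges them into one stream.
# Entries and timestamps are made up for illustration.
import heapq

host_a  = [(1, "a: boot"), (4, "a: sshd started")]
host_b  = [(2, "b: boot"), (3, "b: dhcp lease")]
rotated = [(0, "old: rotated entry")]

merged = list(heapq.merge(host_a, host_b, rotated, key=lambda e: e[0]))
assert [ts for ts, _ in merged] == [0, 1, 2, 3, 4]
```

The cost of keeping n such streams in step is what makes display grow with the number of files, as noted elsewhere in the thread.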

Lennart

-- 
Lennart Poettering - Red Hat, Inc.


Re: [systemd-devel] [ANNOUNCE] Journal File Format Documentation

2012-10-23 Thread Lennart Poettering
On Tue, 23.10.12 23:43, Alexander E. Patrakov (patra...@gmail.com) wrote:

 
 2012/10/21 Lennart Poettering lenn...@poettering.net:
  Heya,
 
  I have now found the time to document the journal file format:
 
  http://www.freedesktop.org/wiki/Software/systemd/journal-files
 
  Comments welcome!
 
 The doc says these two things:
 
 1) The format is designed to be read and written via memory mapping
 using multiple mapped windows.
 2) A reader should verify all offsets and other data as it reads it.
 This includes checking for alignment and range of offsets in the file,
 especially before trying to read it via a memory map.
 
 I am worried by the fact that it is not specified what happens if a
 reader tries to read a file manipulated by a bad writer. Namely, the
 one that repeatedly writes some valid data into the log in order to
 lure readers into this area, and then truncates or overwrites it in
 hope to trigger a SIGBUS or something worse in readers.
 
 IMHO if a reader cannot trust the concurrent writer of the file to
 behave nicely, mmap-based reading should be outright banned. So please
 - either establish and document some kind of trust model between the
 reader and the writer, or ban mmap-based reading of non-archived
 journal files completely.

Yeah, we never made this explicit or documented it, but precisely this
is the reason why the per-user journal files which we split off for access
control reasons are not actually owned by the users, but simply
accessible to users via file system ACLs: we want to ensure that users
cannot truncate the files to cause journald (or another journalctl) to
SIGBUS. With ACLs we can give users read access without the ability to
modify their own files.

With the journal we try to follow the rule that for mmap() there might be
a security boundary between readers and writers, but if so, then it must
be privileged on the writer side and unprivileged on the read side, and
this must be reflected on the access rights of the file.

This probably deserves documentation somewhere. (added to the todo list)

Lennart

-- 
Lennart Poettering - Red Hat, Inc.


Re: [systemd-devel] [ANNOUNCE] Journal File Format Documentation

2012-10-23 Thread Ciprian Dorin Craciun
On Tue, Oct 23, 2012 at 9:40 PM, Lennart Poettering
lenn...@poettering.net wrote:
 But note that the price you pay for interleaving files on display grows
 with the more you split things up (O(n) being n number of files to
 interleave), hence we are a bit conservative here, we don't want to push
 people towards splitting up things too much, unless they have a really
 good reason to.

By interleaving I guess you mean: when querying for logs the
system will have to open all files and read from them at the same time
to give the impression of a merged log, sorted by timestamp (or a
similar key).

In my use case this is not an issue, as the real-time logs are
required for a particular process, and not for the entire system.


 BTW, are you sure you actually need processes to split up by? Wouldn't
 services be more appropriate?

When I say processes I actually mean a couple of processes
acting together as an integral logical unit. (Like PostgreSQL, which
has multiple processes that behave as one group.)

And the way I see myself benefiting from systemd here would be creating
containers (like LXC) for each such process.


 As such, wouldn't a clustering key (like service type, or
 service type + pid, etc.) make sense? This would imply:
 * splitting storage based on this clustering key; (not
 necessarily one per file, but maybe using some consistent hashing
 technique, etc.)
 * having the clustering key as a parameter for querying to
 restrict index search, etc.

 Not sure I grok this.

By cluster key I mean a special key that would direct an entry
to one log file or another. In the normal case such a cluster key
would be the login user name, etc. (This would also allow events from
the same source to end up in different log files based on this key.)

In a word: a way to partition entries into multiple log files
by setting this special field.
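The partitioning idea could be sketched like this (the field name, file naming, and hash-based routing are all illustrative assumptions, not anything journald does):

```python
# Route each entry to one of N journal files by hashing a designated
# "cluster key" field, so retention and archival can be handled per key
# rather than per system. Everything here is a hypothetical sketch.
import hashlib

N_FILES = 4

def target_file(entry, key_field="CLUSTER_KEY"):
    key = entry.get(key_field, "")
    h = int(hashlib.sha256(key.encode()).hexdigest(), 16)
    return "journal-%d.journal" % (h % N_FILES)

e1 = {"MESSAGE": "started", "CLUSTER_KEY": "sshd"}
# Routing is stable: the same key always lands in the same file.
assert target_file(e1) == target_file({"CLUSTER_KEY": "sshd"})
assert target_file(e1).startswith("journal-")
```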


Re: [systemd-devel] [ANNOUNCE] Journal File Format Documentation

2012-10-23 Thread Ciprian Dorin Craciun
On Tue, Oct 23, 2012 at 9:49 PM, Lennart Poettering
lenn...@poettering.net wrote:
 On Tue, 23.10.12 20:13, Ciprian Dorin Craciun (ciprian.crac...@gmail.com) 
 wrote:
  The other thing is simply that the stuff is really integrated with each
  other. The journal sources are small because we reuse a lot of internal
  C APIs of systemd, and the format exposes a lot of things that are
  specific to systemd, for example the vocabulary of well-known fields is
  closely bound to systemd.

 I understand this issue with the focus. Nevertheless your journal
 idea sounds nice, and I hope someone will take it and implement it as
 a standalone variant. (I hope in a natively compiled language...)

 Why? Why would anybody want to use the journal but not systemd? People
 who have issues with the latter usually are not rational about these
 things, and probably have a more philosophical/religious issue with
 systemd, but then will also have the issues with the journal since it
 follows the same philosophy and thinking.

Ok. Just to state my bias: I'm currently neutral in the SysV/BSD
init vs. systemd debate. I **really** do want to get rid of all the
Bash-isms used to initialize my system (actually I would ban Bash from
new projects). But this is a totally different topic, which has been
discussed on almost all the Linux-related mailing lists I'm
subscribed to...

Thus this is not the direction I want to take this discussion.


My real motive for wanting such a detached journaling system is, I
hope, clear from my other sub-thread about separating log files: I'm
searching for a logging system suitable for a PaaS...


 Also, note that the journal file access in libsystemd-journal works fine
 on non-systemd systems too. People can just split this off if they want,
 and use it independently of systemd, the same way they already do with
 udev. No need to implement anything anew.

Aha. So I could reuse the `libsystemd-journal` without any systemd
attachments. Good to know.


Hope I didn't start a flame-war with this one. :)
Ciprian.


Re: [systemd-devel] [ANNOUNCE] Journal File Format Documentation

2012-10-23 Thread Lennart Poettering
On Tue, 23.10.12 22:02, Ciprian Dorin Craciun (ciprian.crac...@gmail.com) wrote:

 
 On Tue, Oct 23, 2012 at 9:40 PM, Lennart Poettering
 lenn...@poettering.net wrote:
  But note that the price you pay for interleaving files on display grows
  with the more you split things up (O(n) being n number of files to
  interleave), hence we are a bit conservative here, we don't want to push
  people towards splitting up things too much, unless they have a really
  good reason to.
 
 By interleaving I guess you mean: when querying for logs the
 system will have to open all files and read from them at the same time
 to give the impression of a merged log, sorted by timestamp (or a
 similar key).

Yes, turning a number of fragments in various files into one stream of
monotonically increasing timestamps.
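The interleaving being described here is a k-way merge of per-file streams that are each already sorted by timestamp, which is also why the display cost grows with the number of files. A minimal sketch in Python, with made-up timestamps rather than the actual journal API:

```python
import heapq

# Two "journal files", each already sorted by timestamp.
file_a = [(1, "a: start"), (4, "a: stop")]
file_b = [(2, "b: start"), (3, "b: ping")]

# heapq.merge keeps a heap with one head entry per file, so producing
# the merged view gets more expensive the more files are interleaved.
merged = list(heapq.merge(file_a, file_b, key=lambda e: e[0]))
# merged is a single stream of monotonically increasing timestamps
```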

  BTW, are you sure you actually need processes to split up by? Wouldn't
  services be more appropriate?
 
 When I say processes I actually mean: a couple of processes
 acting together as an integral logical unit. (Like PostgreSQL which
 has multiple processes which behave as one group.)

Yeah, on systemd that's called a service, and is implemented as a
cgroup on the lower layers. The journal automatically indexes by
service. Try journalctl -u avahi-daemon.service to get all messages
from avahi, and avahi only.

 And the way I see benefiting from systemd would be creating
 containers (like LXC) for each such process.

Our story regarding containers (i.e. where a new PID 1 in the container
is running on a host system) is that we suggest that each container runs
its own journald instance and generates its own files, but registers
them with the host via symlinks in /var/log/journal. See

http://www.freedesktop.org/wiki/Software/systemd/ContainerInterface

for more info about that. That way journalctl -m on the host will show
you all logs from all containers, nicely interleaved.

  * having the clustering key as a parameter for querying to
  restrict index search, etc.
 
  Not sure I grok this.
 
 By clustering key I mean a special key that would direct an entry
 to one log file or another. In the normal case such a clustering key
 would be the login user name, etc. (This would also allow events from
 the same source to end up in different log files based on this key.)
 
 In a word: a way to partition entries into multiple log files
 by setting this special field.

As mentioned we have SplitMode= for this, but it is strictly for UIDs
only, since we only need this for access control management, nothing
else.
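For reference, that setting lives in journald.conf; a sketch of the relevant fragment (check journald.conf(5) on your systemd version for the exact set of supported values and the default):

```ini
# /etc/systemd/journald.conf
[Journal]
# "login": split per login user only;
# "uid":   split per UID, system users included;
# "none":  keep everything in a single pool.
SplitMode=uid
```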

Why precisely do you want to split up your log files per-service? That's
the bit I don't get.

Lennart

-- 
Lennart Poettering - Red Hat, Inc.


Re: [systemd-devel] [ANNOUNCE] Journal File Format Documentation

2012-10-23 Thread Lennart Poettering
On Tue, 23.10.12 22:14, Ciprian Dorin Craciun (ciprian.crac...@gmail.com) wrote:

  Why? Why would anybody want to use the journal but not systemd? People
  who have issues with the latter usually are not rational about these
  things, and probably have a more philosophical/religious issue with
  systemd, but then will also have the issues with the journal since it
  follows the same philosophy and thinking.
 
 Ok. Just to state my bias: I'm currently neutral in the SysV/BSD
 init vs. systemd debate. I **really** do want to get rid of all the
 Bash-isms used to initialize my system (actually I would ban Bash from
 new projects). But this is a totally different topic, which has been
 discussed on almost all the Linux-related mailing lists I'm
 subscribed to...

I am sorry I have to ask, but what's wrong with bash? I mean, it's a
shell, but what is worse or better about it than any other shell? What's
the benefit of dealing with multiple implementations of a shell? Do you
want to waste memory, increase your test matrix, complicate the use,
yadda, yadda?

I really, really don't buy into this Debian ideology of "bash is bad,
but dash is awesome"; that's just intense BS.

Lennart

-- 
Lennart Poettering - Red Hat, Inc.


Re: [systemd-devel] [ANNOUNCE] Journal File Format Documentation

2012-10-23 Thread Ciprian Dorin Craciun
On Tue, Oct 23, 2012 at 10:18 PM, Lennart Poettering
lenn...@poettering.net wrote:
 On Tue, 23.10.12 22:02, Ciprian Dorin Craciun (ciprian.crac...@gmail.com) 
 wrote:
 And the way I see benefiting from systemd would be creating
 containers (like LXC) for each such process.

 Our story regarding containers (i.e. where a new PID 1 in the container
 is running on a host system) is that we suggest that each container runs
 its own journald instance and generates its own files, but registers
 them with the host via symlinks in /var/log/journal. See

 http://www.freedesktop.org/wiki/Software/systemd/ContainerInterface

 for more info about that. That way journalctl -m on the host will show
 you all logs from all containers, nicely interleaved.

Aha. Thanks for that pointer. (The only issue with this is that I
must trust the service running inside the container to do the right
thing, which could be a problem if I run untrusted code that I want to
isolate.)

But I'll give this one a look.


 In one word: a way to partition entries into multiple log files,
 by setting this special field.

 As mentioned we have SplitMode= for this, but it is strictly for UIDs
 only, since we only need this for access control management, nothing
 else.

This could be another solution to my problem: allocate a different
UID to each service.
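In unit-file terms that could look like the following sketch (the service name, binary path, and user are hypothetical):

```ini
# /etc/systemd/system/app-frontend.service (hypothetical)
[Service]
ExecStart=/srv/app/frontend
# A dedicated UID per service: combined with SplitMode=uid in
# journald.conf, this service's entries land in their own journal file.
User=app-frontend
```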


 Why precisely do you want to split up your log files per-service? That's
 the bit I don't get.

Because in the envisaged PaaS, you have components (services)
starting and stopping. Thus I want to be able to easily just remove
logs for dead services, or maybe just move them to a different
archival service where they get deleted after a period of time.

It's purely for administrative purposes. Maybe even to allow the
user to download these log files independently.


But I understand now how to best solve this requirement without
touching the core journald.

Thanks,
Ciprian.


Re: [systemd-devel] [ANNOUNCE] Journal File Format Documentation

2012-10-23 Thread Lennart Poettering
On Tue, 23.10.12 22:27, Ciprian Dorin Craciun (ciprian.crac...@gmail.com) wrote:

  In one word: a way to partition entries into multiple log files,
  by setting this special field.
 
  As mentioned we have SplitMode= for this, but it is strictly for UIDs
  only, since we only need this for access control management, nothing
  else.
 
 This could be another solution to my problem: allocate a different
 UID to each service.

This is what the Pantheon folks are doing. Instead of running full OSes
in their containers they run a large number of sandboxed services (using
systemd's sandboxing logic), each under a UID of its own.

  Why precisely do you want to split up your log files per-service? That's
  the bit I don't get.
 
 Because in the envisaged PaaS, you have components (services)
 starting and stopping. Thus I want to be able to easily just remove
 logs for dead services, or maybe just move them to a different
 archival service where they get deleted after a period of time.
 
 It's purely for administrative purposes. Maybe even to allow the
 user to download these log files independently.
 
 But I understand now how to best solve this requirement without
 touching the core journald.

Well, but with the journal you can easily filter by service,
i.e. journalctl -u foobar will give you a stream that only
includes messages from service foobar, but will otherwise look like
/var/log/messages did.

So, unless you have access mode restrictions, or really really need to
make sure that as soon as a container goes away its logs go away too,
you could just leave everything in one pool and then filter on
display/download.

BTW, with systemd 195 you can do this:

systemctl enable systemd-journal-gatewayd.socket
wget http://localhost:19531/entries?_SYSTEMD_UNIT=foobar.service

And this will give you all messages from foobar.service, ready for
download, even from another host. Or use:

wget --header="Accept: application/json" \
  http://localhost:19531/entries?_HOSTNAME=waldo

And you'll get a JSON-formatted dump of all messages for host/container
waldo. And you can easily process that from your web app to present
a per-container log stream to the user.
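Since that JSON output is one object per entry, newline-separated, the web-app-side filtering is only a few lines. A sketch, assuming the dump was already fetched from gatewayd (_HOSTNAME and MESSAGE are real journal field names; the sample data is invented):

```python
import json

def messages_for_host(dump, hostname):
    """Pick MESSAGE fields out of a newline-separated JSON entry dump."""
    messages = []
    for line in dump.splitlines():
        if not line.strip():
            continue  # skip blank separator lines
        entry = json.loads(line)
        if entry.get("_HOSTNAME") == hostname:
            messages.append(entry["MESSAGE"])
    return messages

sample = (
    '{"_HOSTNAME": "waldo", "MESSAGE": "container started"}\n'
    '{"_HOSTNAME": "host", "MESSAGE": "unrelated"}\n'
)
# messages_for_host(sample, "waldo") -> ["container started"]
```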

Lennart

-- 
Lennart Poettering - Red Hat, Inc.


Re: [systemd-devel] [ANNOUNCE] Journal File Format Documentation

2012-10-23 Thread Jóhann B. Guðmundsson

On 10/23/2012 06:40 PM, Lennart Poettering wrote:

The journal currently cannot do this for you, but what it already can is
split up the journal per-user. This is done by default only for login
users, (i.e. actual human users), but with the SplitMode= setting in
journald.conf can be enabled for system users as well, or turned off
entirely. We could extend this switch to allow other split-up schemes.

But note that the price you pay for interleaving files on display grows
with the more you split things up (O(n) being n number of files to
interleave), hence we are a bit conservative here, we don't want to push
people towards splitting up things too much, unless they have a really
good reason to.


If I'm understanding this correctly, would it not be simpler/sufficient
to support splitting the journal up via a CLI tool instead of doing it
in real time?


JBG


Re: [systemd-devel] [ANNOUNCE] Journal File Format Documentation

2012-10-23 Thread Lennart Poettering
On Tue, 23.10.12 20:52, Jóhann B. Guðmundsson (johan...@gmail.com) wrote:

 On 10/23/2012 06:40 PM, Lennart Poettering wrote:
 The journal currently cannot do this for you, but what it already can is
 split up the journal per-user. This is done by default only for login
 users, (i.e. actual human users), but with the SplitMode= setting in
 journald.conf can be enabled for system users as well, or turned off
 entirely. We could extend this switch to allow other split-up schemes.
 
 But note that the price you pay for interleaving files on display grows
 with the more you split things up (O(n) being n number of files to
 interleave), hence we are a bit conservative here, we don't want to push
 people towards splitting up things too much, unless they have a really
 good reason to.
 
 If I'm understanding this correctly, would it not be simpler/sufficient
 to support splitting the journal up via a CLI tool instead of doing it
 in real time?

We might add this as a tool one day, but I think it's a good rule to
write logs once and not touch them afterwards if at all possible, in
order not to corrupt what has already been written safely. Hence: it's
probably a good idea to focus on writing things the right way the first
time, instead of on rewriting them afterwards.

Related to the tool you are suggesting I think a tool to merge split off
files might be very useful too, to counter the scalability issues of
interleaving too many separate files on display.

Lennart

-- 
Lennart Poettering - Red Hat, Inc.


Re: [systemd-devel] [ANNOUNCE] Journal File Format Documentation

2012-10-23 Thread Jóhann B. Guðmundsson

On 10/23/2012 09:19 PM, Lennart Poettering wrote:

Related to the tool you are suggesting I think a tool to merge split off
files might be very useful too, to counter the scalability issues of
interleaving too many separate files on display.
Yeah, an extension to journalctl. Users would probably also like to do
this as part of the process when the journal files get rotated on disk
(rotate, then split).


JBG


[systemd-devel] [ANNOUNCE] Journal File Format Documentation

2012-10-20 Thread Lennart Poettering
Heya,

I have now found the time to document the journal file format:

http://www.freedesktop.org/wiki/Software/systemd/journal-files

Comments welcome!

(Oh, and it's in the fdo wiki, so if you see a typo or so, go ahead and
fix it!)

Lennart

-- 
Lennart Poettering - Red Hat, Inc.