Re: Merging relayfs?
Tom Zanussi wrote:
> In userspace, the sub-buffer reading loop looks at the commit value in
> the sub-buffer, and if it matches (sub-buffer size - padding), the
> buffer has been completely written and can be saved, otherwise it's
> not yet complete and is checked again the next time around. This way,
> there's no need for a deliver() callback, the relay_commit() is
> replaced with the increment of the reserved commit value, the arrays
> aren't needed and you get the same result in the end in a much simpler
> way, IMHO.

Actually this has a much greater potential of losing buffers because
we have to poll the buffer for completion. Seen another way, the
kernel side has to wait until the user side has "figured out" that it
needs to commit content to disk. As it was originally, it was relatively
straightforward to determine why data was lost: ok, we've signaled it
from kernel space, but the daemon never flushed it out. Without
commit/deliver, things are much less clear, and I still don't see what
we gain by removing them.

I would very much like to see the commit/deliver functionality back.
Such mechanisms are required for any sane producer-consumer model.

Karim
--
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: Merging relayfs?
Karim Yaghmour writes:
>
> Tom Zanussi wrote:
> > - removed the deliver() callback
> > - removed the relay_commit() function
>
> This breaks LTT. Any reason why this needed to be removed? In the end,
> the code will just end up being duplicated in ltt and all other users.
> IOW, this is not some potential future use, but something that's
> currently being used.

Because I realized that like the padding and commit arrays, they're not
really necessary. In all the examples, the padding is saved in space
reserved at the beginning of the sub-buffer via subbuf_start_reserve(),
except that now the padding is passed into the subbuf_start() callback
rather than kept in an array. The padding value passed in is then
directly saved in the reserved padding space.

Similarly, in the case of the reserve/commit example, extra space is
also reserved for the commit count using subbuf_start_reserve(). After
space for an event is reserved using relay_reserve() and completely
written, the event length is added to that commit value.

In userspace, the sub-buffer reading loop looks at the commit value in
the sub-buffer, and if it matches (sub-buffer size - padding), the
buffer has been completely written and can be saved, otherwise it's
not yet complete and is checked again the next time around. This way,
there's no need for a deliver() callback, the relay_commit() is
replaced with the increment of the reserved commit value, the arrays
aren't needed and you get the same result in the end in a much simpler
way, IMHO.

But if you see a problem with it or have any suggestions to make it
better/different, please let me know...

Tom
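A minimal sketch of the userspace check described above, assuming a hypothetical header that the client reserves at the start of each sub-buffer (padding followed by a commit count). The struct layout and the function names are illustrative and are not defined by relayfs. It is also assumed here that the client starts the commit count at the header size, so that a complete sub-buffer satisfies commit == sub-buffer size - padding.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header a client might reserve at the start of each
 * sub-buffer. This layout is an assumption, not something relayfs defines. */
struct subbuf_header {
	uint32_t padding;  /* unused bytes at the end, stored at buffer switch */
	uint32_t commit;   /* bytes fully written so far, header included */
};

/* Returns 1 if every reserved byte has been committed, i.e.
 * commit == subbuf_size - padding, and 0 otherwise. */
static int subbuf_complete(const void *subbuf, size_t subbuf_size)
{
	struct subbuf_header hdr;

	memcpy(&hdr, subbuf, sizeof(hdr));  /* no alignment assumptions */
	if (hdr.padding > subbuf_size)
		return 0;                   /* header is not plausible yet */
	return hdr.commit == subbuf_size - hdr.padding;
}

/* Number of bytes worth saving from a complete sub-buffer. */
static size_t subbuf_valid_bytes(const void *subbuf, size_t subbuf_size)
{
	struct subbuf_header hdr;

	memcpy(&hdr, subbuf, sizeof(hdr));
	return subbuf_size - hdr.padding;
}
```

A reader loop would call subbuf_complete() on each pass and skip sub-buffers that are still being written. In a scheme like this the client would also have to reset the header before a sub-buffer is reused, otherwise stale counts from the previous lap could look complete.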
Re: Merging relayfs?
Tom Zanussi wrote:
> - removed the deliver() callback
> - removed the relay_commit() function

This breaks LTT. Any reason why this needed to be removed? In the end,
the code will just end up being duplicated in ltt and all other users.
IOW, this is not some potential future use, but something that's
currently being used.

Karim
Re: Merging relayfs?
Tom Zanussi writes: > > OK, if we got rid of the padding counts and commit counts and let the > client manage those, we can simplify the buffer switch slow path and > make the API simpler in the process. Here's a first proposal for > doing that - I won't know until I actually do it what snags I may run > into, but if this looks like the right direction to go, I'll go ahead > with it... > And here's a patch to update the Documentation... diff -urpN -X dontdiff linux-2.6.13-rc3-mm1/Documentation/filesystems/relayfs.txt linux-2.6.13-rc3-mm1-cur/Documentation/filesystems/relayfs.txt --- linux-2.6.13-rc3-mm1/Documentation/filesystems/relayfs.txt 2005-07-16 11:47:32.0 -0500 +++ linux-2.6.13-rc3-mm1-cur/Documentation/filesystems/relayfs.txt 2005-07-23 12:50:46.0 -0500 @@ -23,6 +23,47 @@ This document provides an overview of th the function parameters are documented along with the functions in the filesystem code - please see that for details. +Semantics += + +Each relayfs channel has one buffer per CPU, each buffer has one or +more sub-buffers. Messages are written to the first sub-buffer until +it is too full to contain a new message, in which case it it is +written to the next (if available). Messages are never split across +sub-buffers. At this point, userspace can be notified so it empties +the first sub-buffer, while the kernel continues writing to the next. + +When notified that a sub-buffer is full, the kernel knows how many +bytes of it are padding i.e. unused. Userspace can use this knowledge +to copy only valid data. + +After copying it, userspace can notify the kernel that a sub-buffer +has been consumed. + +relayfs can operate in a mode where it will overwrite data not yet +collected by userspace, and not wait for it to consume it. + +relayfs itself does not provide for communication of such data between +userspace and kernel, allowing the kernel side to remain simple and not +impose a single interface on userspace. 
It does provide a separate +helper though, described below. + +klog, relay-app & librelay +== + +relayfs itself is ready to use, but to make things easier, two +additional systems are provided. klog is a simple wrapper to make +sending data to a channel simpler, regardless of whether a channel to +write to exists or not. relay-app is the kernel counterpart of +userspace librelay.c, combined these two files provide glue to easily +stream data, without having to bother with housekeeping. + +It is possible to use relayfs without relay-app & librelay, but you'll +have to implement communication between userspace and kernel, allowing +both to convey the state of buffers (full, empty, amount of padding). + +klog, relay-app and librelay can be found in the relay-apps tarball on +http://relayfs.sourceforge.net The relayfs user space API == @@ -34,7 +75,8 @@ available and some comments regarding th open() enables user to open an _existing_ buffer. mmap() results in channel buffer being mapped into the caller's -memory space. +memory space. Note that you can't do a partial mmap - you must +map the entire file, which is NRBUF * SUBBUFSIZE. poll() POLLIN/POLLRDNORM/POLLERR supported. User applications are notified when sub-buffer boundaries are crossed. 
@@ -63,13 +105,15 @@ Here's a summary of the API relayfs prov channel management functions: relay_open(base_filename, parent, subbuf_size, n_subbufs, - overwrite, callbacks) + callbacks) relay_close(chan) relay_flush(chan) relay_reset(chan) relayfs_create_dir(name, parent) relayfs_remove_dir(dentry) -relay_commit(buf, reserved, count) + + channel management typically called on instigation of userspace: + relay_subbufs_consumed(chan, cpu, subbufs_consumed) write functions: @@ -77,19 +121,22 @@ Here's a summary of the API relayfs prov relay_write(chan, data, length) __relay_write(chan, data, length) relay_reserve(chan, length) +__relay_reserve(buf, length) callbacks: -subbuf_start(buf, subbuf, prev_subbuf_idx, prev_subbuf) -deliver(buf, subbuf_idx, subbuf) +subbuf_start(buf, subbuf, prev_subbuf, prev_padding) buf_mapped(buf, filp) buf_unmapped(buf, filp) -buf_full(buf, subbuf_idx) + helper functions: + +relay_buf_full(buf) +subbuf_start_reserve(buf, length) -A relayfs channel is made of up one or more per-cpu channel buffers, -each implemented as a circular buffer subdivided into one or more -sub-buffers. + +Creating a channel +-- relay_open() is used to create a channel, along with its per-cpu channel buffers. Each channel buffer will have an associated file @@ -117,30 +164,106 @@ though, it's safe to assume that having idea - you're guaranteed to either overwrite data or lose events depending on the channel mode being used. -relayfs cha
Re: Merging relayfs?
Tom Zanussi writes: > > OK, if we got rid of the padding counts and commit counts and let the > client manage those, we can simplify the buffer switch slow path and > make the API simpler in the process. Here's a first proposal for > doing that - I won't know until I actually do it what snags I may run > into, but if this looks like the right direction to go, I'll go ahead > with it... > Here's a preliminary patch that does this cleanup. It ends up being a nice little simplification of the API and the buffer switch path. Despite the size of the patch, the changes aren't that significant and they don't reduce the functionality at all - I've tested using an updated version of the relay-apps examples - those are still in flux at the moment, but if anyone wants to see them now, I'll clean them up and make them available. I have tested that things basically work, but I still need to do more testing; I'm posting now just in case anyone disagrees with the changes. I'll also be posting an update to the documentation shortly. 
Here are the changes made by this patch: - changed unsigned to unsigned int, and also changed several uses of int to unsigned int where it made sense - removed the padding counts and commit counts - changed the subbuf_start() callback to add a prev_padding param, and return a boolean value to indicate whether or not the buffer switch should occur - added a subbuf_start_reserve() helper function - removed the deliver() callback - removed the relay_commit() function - removed the buf_full() callback - added __relay_reserve(), which is used by relay_reserve() but allows a client that already has a buffer pointer to use that instead Tom diff -urpN -X dontdiff linux-2.6.13-rc3-mm1/fs/relayfs/buffers.c linux-2.6.13-rc3-mm1-cur/fs/relayfs/buffers.c --- linux-2.6.13-rc3-mm1/fs/relayfs/buffers.c 2005-07-16 11:47:34.0 -0500 +++ linux-2.6.13-rc3-mm1-cur/fs/relayfs/buffers.c 2005-07-22 01:10:21.0 -0500 @@ -95,7 +95,7 @@ int relay_mmap_buf(struct rchan_buf *buf static void *relay_alloc_buf(struct rchan_buf *buf, unsigned long size) { void *mem; - int i, j, n_pages; + unsigned int i, j, n_pages; size = PAGE_ALIGN(size); n_pages = size >> PAGE_SHIFT; @@ -137,27 +137,15 @@ struct rchan_buf *relay_create_buf(struc if (!buf) return NULL; - buf->padding = kmalloc(chan->n_subbufs * sizeof(unsigned *), GFP_KERNEL); - if (!buf->padding) - goto free_buf; - - buf->commit = kmalloc(chan->n_subbufs * sizeof(unsigned *), GFP_KERNEL); - if (!buf->commit) - goto free_buf; - buf->start = relay_alloc_buf(buf, chan->alloc_size); - if (!buf->start) - goto free_buf; - + if (!buf->start) { + kfree(buf); + return NULL; + } + buf->chan = chan; kref_get(&buf->chan->kref); return buf; - -free_buf: - kfree(buf->commit); - kfree(buf->padding); - kfree(buf); - return NULL; } /** @@ -167,7 +155,7 @@ free_buf: void relay_destroy_buf(struct rchan_buf *buf) { struct rchan *chan = buf->chan; - int i; + unsigned int i; if (likely(buf->start)) { vunmap(buf->start); @@ -175,8 +163,6 @@ void 
relay_destroy_buf(struct rchan_buf __free_page(buf->page_array[i]); kfree(buf->page_array); } - kfree(buf->padding); - kfree(buf->commit); kfree(buf); kref_put(&chan->kref, relay_destroy_channel); } diff -urpN -X dontdiff linux-2.6.13-rc3-mm1/fs/relayfs/relay.c linux-2.6.13-rc3-mm1-cur/fs/relayfs/relay.c --- linux-2.6.13-rc3-mm1/fs/relayfs/relay.c 2005-07-16 11:47:34.0 -0500 +++ linux-2.6.13-rc3-mm1-cur/fs/relayfs/relay.c 2005-07-23 09:27:40.0 -0500 @@ -26,10 +26,7 @@ */ int relay_buf_empty(struct rchan_buf *buf) { - int produced = atomic_read(&buf->subbufs_produced); - int consumed = atomic_read(&buf->subbufs_consumed); - - return (produced - consumed) ? 0 : 1; + return (buf->subbufs_produced - buf->subbufs_consumed) ? 0 : 1; } /** @@ -38,17 +35,10 @@ int relay_buf_empty(struct rchan_buf *bu * * Returns 1 if the buffer is full, 0 otherwise. */ -static inline int relay_buf_full(struct rchan_buf *buf) +int relay_buf_full(struct rchan_buf *buf) { - int produced, consumed; - - if (buf->chan->overwrite) - return 0; - - produced = atomic_read(&buf->subbufs_produced); - consumed = atomic_read(&buf->subbufs_consumed); - - return (produced - consumed > buf->chan->n_subbufs - 1) ? 1 : 0; + unsigned int ready = buf->subbufs_produced - buf->subbufs_consumed; + return (ready >= buf->chan->n_subbufs) ? 1 : 0; } /* @@ -65,22 +55,13 @@ static inline int relay_buf_full(struct */ static int subbuf_start_default_callback (struct rchan_
Re: Merging relayfs?
Roman Zippel writes:
> Hi,
>
> On Mon, 18 Jul 2005, Karim Yaghmour wrote:
>
> > I guess I just don't get the point here. Why cut something away if many
> > users will need it. If it's that popular that you're ready to provide a
> > library function to do it, then why not just leave it to boot? One of the
> > goals of relayfs is to avoid code duplication with regards to buffering
> > in general.
>
> The road to bloatness is paved with lots of little features.
> There aren't that many users anyway (none of the examples use that
> feature). I'd prefer to concentrate on a simple and correct relayfs layer
> and we can still think about other features as more users appear.
> Starting a design by implementing every little feature which _might_ be
> needed is a really bad idea.
>

OK, if we got rid of the padding counts and commit counts and let the
client manage those, we can simplify the buffer switch slow path and
make the API simpler in the process. Here's a first proposal for
doing that - I won't know until I actually do it what snags I may run
into, but if this looks like the right direction to go, I'll go ahead
with it...

- get rid of the padding counts - the client can manage those if it
  wants to, but in any case pass the padding for the previous
  sub-buffer in to the subbuf_start callback.

- get rid of the commit counts - the client can manage those. Also,
  get rid of the related API functions that deal with those i.e.
  relay_commit() and the deliver() callback.

- change the subbuf_start() callback to something like the following
  (the body shows an example of what would typically be done by a
  client):

/*
 * subbuf_start() callback.
 *
 * Return 1 to allow logging to continue, 0 to stop.
 */
static int subbuf_start_default_callback (struct rchan_buf *buf,
                                          void *subbuf,
                                          void *prev_subbuf,
                                          int prev_padding)
{
        *((int *)prev_subbuf) = prev_padding;

        if (relay_buf_full(buf))
                return 0;

        relay_reserve(subbuf, sizeof (int));

        return 1;
}

- add a relay_reserve() function for the client to use to reserve
  space at the beginning of the sub-buffer (it can use this reserved
  space to save the padding among other things). This would be used
  by the client in the subbuf_start callback, rather than returning it
  via an outparam or struct.

- remove the buf_full() callback - the client can determine this in
  the subbuf_start() callback.

Also, as far as the netlink/ioctl/proc file communication, I'll have
to think more about it, but will play around with something when I
update the example code.

Let me know if this sounds ok, or if you have better suggestions.

Thanks,

Tom
Re: Merging relayfs?
Hi,

On Mon, 18 Jul 2005, Karim Yaghmour wrote:

> I guess I just don't get the point here. Why cut something away if many
> users will need it. If it's that popular that you're ready to provide a
> library function to do it, then why not just leave it to boot? One of the
> goals of relayfs is to avoid code duplication with regards to buffering
> in general.

The road to bloatness is paved with lots of little features.
There aren't that many users anyway (none of the examples use that
feature). I'd prefer to concentrate on a simple and correct relayfs layer
and we can still think about other features as more users appear.
Starting a design by implementing every little feature which _might_ be
needed is a really bad idea.

bye, Roman
Re: Merging relayfs?
Hi,

On Mon, 18 Jul 2005, Steven Rostedt wrote:

> > What exactly would be slowed down?
> > It would just move around some code and even avoid the overwrite mode
> > check.
>
> Yes, you're adding a jump to another function via a function pointer,
> that would kill the cache line of execution, to avoid a simple check, or
> some other way of handling it.

RTFS. (deliver_default_callback)

> Since I don't want to know the internals
> of relayfs,

You have to anyway, currently relayfs clients need some knowledge about
how buffers are managed.

> the overwrite mode could be implemented in a more efficient way.

I wouldn't call the buffer switch routine efficient, yet.

> > > I don't see the problem with having an overwrite mode or not. Why
> > > can't relayfs know this?
> >
> > The point is to design a simple and flexible relayfs layer, which means
> > not every possible function has to be done in the relayfs layer, as long
> > it's flexible enough to build additional functionality on top of it (for
> > which it can again provide some library functions).
>
> The overwrite mode isn't that complex. You don't want to make something
> so flexible that it becomes more complex. Assembly is more flexible
> than C but I wouldn't want to code a lot with it. A library function
> for me is out of the question, since what I build on top of relayfs is
> mostly in the kernel. The overwrite mode would then have to be
> implemented through another kernel activity. I might as well keep my
> own ring buffers and forget about using relayfs, and all my points in
> which I argue for it being merged are moot.

I must admit I have no clue what you're talking about here...
The keywords above are "_simple_ _and_ _flexible_".

bye, Roman
Re: Merging relayfs?
Steven Rostedt writes:
> On Sun, 2005-07-17 at 10:52 -0500, Tom Zanussi wrote:
>
> > >
> > > > > - overwrite mode can be implemented via the buffer switch callback
> > > >
> > > > The buffer switch callback is already where this is handled, unless
> > > > you're thinking of something else - one of the first checks in the
> > > > buffer switch is relay_buf_full(), which always returns 0 if the
> > > > buffer is in overwrite mode.
> > >
> > > I mean, relayfs doesn't have to know about this, the client itself can do
> > > it (e.g. via helper functions).
> >
> > In a previous version, we did something like having the client pass
> > back a return value from the callback indicating whether or not to
> > continue or stop. I can try doing something like that instead again.
>
> Tom,
>
> I'm actually very much against this. Looking at a point of view from the
> logdev device. Having a callback to know to continue at every buffer
> switch would just be slowing down something that is expected to be very
> fast. I don't see the problem with having an overwrite mode or not. Why
> can't relayfs know this? It _is_ an operation of relayfs, and having it
> pushed to the client would seem to make the client need to know more
> about how relayfs works than it needs to. Because, the logdev device
> doesn't care about buffer switches.

I don't think it would slow anything down - it would be pretty much
the same code being executed as before e.g. the buffer_start()
callback for overwrite mode could look like this:

int buffer_start()
{
        ...
        return 1; // continue unconditionally
}

And for no-overwrite mode:

int buffer_start()
{
        ...
        return !relay_buf_full(buf); // continue if not full
}

Since the buffer start callback already returns the amount that's
supposed to be reserved at the start of the sub-buffer, I'd have to
make that an outparam instead, I guess, but it's basically the same
code handling the overwrite/no-overwrite condition.

Tom
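The two callbacks above can be modelled in plain userspace C. This is a simplified sketch, not relayfs code: struct model_buf, model_switch() and the callback names are hypothetical, and the fullness test mirrors the unsigned subtraction used by relay_buf_full() in the patch posted in this thread.

```c
/* Simplified userspace model of the buffer-switch decision. All names
 * here are hypothetical and do not belong to the relayfs API. */
struct model_buf {
	unsigned int n_subbufs;
	unsigned int subbufs_produced;
	unsigned int subbufs_consumed;
	int stalled;  /* 1 while the last switch attempt was refused */
};

/* Full when every sub-buffer holds data not yet consumed. Unsigned
 * subtraction keeps the difference correct across counter wraparound. */
static int model_buf_full(const struct model_buf *buf)
{
	unsigned int ready = buf->subbufs_produced - buf->subbufs_consumed;

	return ready >= buf->n_subbufs;
}

/* Overwrite mode: always continue, clobbering unread data. */
static int start_overwrite(const struct model_buf *buf)
{
	(void)buf;
	return 1;
}

/* No-overwrite mode: continue only while a free sub-buffer remains. */
static int start_no_overwrite(const struct model_buf *buf)
{
	return !model_buf_full(buf);
}

/* Buffer switch: count the sub-buffer that was just filled (once), then
 * let the client callback decide whether logging may continue.
 * Returns 1 if the writer moves on, 0 if it must stop for now. */
static int model_switch(struct model_buf *buf,
			int (*subbuf_start)(const struct model_buf *))
{
	if (!buf->stalled)
		buf->subbufs_produced++;
	buf->stalled = !subbuf_start(buf);
	return !buf->stalled;
}
```

In this model the only difference between the two modes is the return value of the callback, which is the point made in the reply above. The cost Steven raises, an indirect call per buffer switch, is still present in model_switch().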
Re: Merging relayfs?
Roman Zippel wrote:
> The point is to design a simple and flexible relayfs layer, which means
> not every possible function has to be done in the relayfs layer, as long
> it's flexible enough to build additional functionality on top of it (for
> which it can again provide some library functions).

I guess I just don't get the point here. Why cut something away if many
users will need it. If it's that popular that you're ready to provide a
library function to do it, then why not just leave it to boot? One of the
goals of relayfs is to avoid code duplication with regards to buffering
in general.

Karim
Re: Merging relayfs?
On Mon, 2005-07-18 at 16:16 +0200, Roman Zippel wrote:
> Hi,
>
> On Mon, 18 Jul 2005, Steven Rostedt wrote:
>
> > I'm actually very much against this. Looking at a point of view from the
> > logdev device. Having a callback to know to continue at every buffer
> > switch would just be slowing down something that is expected to be very
> > fast.
>
> What exactly would be slowed down?
> It would just move around some code and even avoid the overwrite mode
> check.

Yes, you're adding a jump to another function via a function pointer,
that would kill the cache line of execution, to avoid a simple check, or
some other way of handling it. Since I don't want to know the internals
of relayfs, the overwrite mode could be implemented in a more efficient
way. Granted, this probably isn't much of a slowdown since the copying
of data would be much longer.

>
> > I don't see the problem with having an overwrite mode or not. Why
> > can't relayfs know this?
>
> The point is to design a simple and flexible relayfs layer, which means
> not every possible function has to be done in the relayfs layer, as long
> it's flexible enough to build additional functionality on top of it (for
> which it can again provide some library functions).

The overwrite mode isn't that complex. You don't want to make something
so flexible that it becomes more complex. Assembly is more flexible
than C but I wouldn't want to code a lot with it. A library function
for me is out of the question, since what I build on top of relayfs is
mostly in the kernel. The overwrite mode would then have to be
implemented through another kernel activity. I might as well keep my
own ring buffers and forget about using relayfs, and all my points in
which I argue for it being merged are moot.

-- Steve
Re: Merging relayfs?
Hareesh Nagarajan writes:
> Tom Zanussi wrote:
> > Roman Zippel writes:
> > > Hi,
> > >
> > > On Thu, 14 Jul 2005, Tom Zanussi wrote:
> > >
> > > > The netlink control channel seems to work very well, but I can
> > > > certainly change the examples to use something different. Could you
> > > > suggest something?
> > >
> > > It just looks like a complicated way to do an ioctl, a control file that
> > > you can read/write would be a lot simpler and faster.
> >
> > You're right - in previous versions, we did use ioctl - we ended up
> > using netlink as it seemed like the least offensive option to most people.
> > I'll try modifying the example code to use a control file or something
> > like that instead though.
>
> Having an ioctl() interface will definitely make things less
> complicated. Are the older versions which use ioctl available off the
> relayfs website?

Yes, the 'old relayfs' patches on the website implement ioctl.

Tom
Re: Merging relayfs?
Hi,

On Mon, 18 Jul 2005, Steven Rostedt wrote:

> I'm actually very much against this. Looking at a point of view from the
> logdev device. Having a callback to know to continue at every buffer
> switch would just be slowing down something that is expected to be very
> fast.

What exactly would be slowed down?
It would just move around some code and even avoid the overwrite mode
check.

> I don't see the problem with having an overwrite mode or not. Why
> can't relayfs know this?

The point is to design a simple and flexible relayfs layer, which means
not every possible function has to be done in the relayfs layer, as long
it's flexible enough to build additional functionality on top of it (for
which it can again provide some library functions).

bye, Roman
Re: Merging relayfs?
On Sun, 2005-07-17 at 10:52 -0500, Tom Zanussi wrote:

> >
> > > > - overwrite mode can be implemented via the buffer switch callback
> > >
> > > The buffer switch callback is already where this is handled, unless
> > > you're thinking of something else - one of the first checks in the
> > > buffer switch is relay_buf_full(), which always returns 0 if the
> > > buffer is in overwrite mode.
> >
> > I mean, relayfs doesn't have to know about this, the client itself can do
> > it (e.g. via helper functions).
>
> In a previous version, we did something like having the client pass
> back a return value from the callback indicating whether or not to
> continue or stop. I can try doing something like that instead again.

Tom,

I'm actually very much against this. Looking at a point of view from the
logdev device. Having a callback to know to continue at every buffer
switch would just be slowing down something that is expected to be very
fast. I don't see the problem with having an overwrite mode or not. Why
can't relayfs know this? It _is_ an operation of relayfs, and having it
pushed to the client would seem to make the client need to know more
about how relayfs works than it needs to. Because, the logdev device
doesn't care about buffer switches.

-- Steve
Re: Merging relayfs?
Tom Zanussi <[EMAIL PROTECTED]> wrote on 14/07/2005 16:01:25:

> The only things that are atomic are the counts of produced and
> consumed buffers and these are only ever updated or read in the slow
> buffer-switch path. They're atomic because if they weren't, wouldn't
> it be possible for the client to read an unfinished value if the
> producer was in the middle of updating it?

This depends on architecture. It is possible under some architectures to
see the so-called score-boarding effect when reading on one processor
while writing on another when not having imposed any atomicity. From
memory, I believe this might be possible with zSeries, but I'll need to
check the microarchitecture docs. It's been a long time since I read them
but I do recall a reference to the score-boarding effect.

> ...

Richard

- -
Richard J Moore
IBM Advanced Linux Response Team - Linux Technology Centre
MOBEX: 264807; Mobile (+44) (0)7739-875237
Office: (+44) (0)1962-817072
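A minimal userspace sketch of the concern discussed above, using C11 <stdatomic.h> instead of the kernel's atomic_t (an assumption made so the example is self-contained). Each counter has exactly one writer, so atomic loads and stores are enough to guarantee that a reader never observes a partially updated value, and no read-modify-write operation is needed. The struct and function names are hypothetical.

```c
#include <stdatomic.h>

/* Hypothetical per-buffer counters; each field has a single writer. */
struct counters {
	atomic_uint subbufs_produced;  /* written by the producer only */
	atomic_uint subbufs_consumed;  /* written by the consumer only */
};

/* Producer: publish one more finished sub-buffer. The release store
 * orders the sub-buffer contents before the new count becomes visible. */
static void producer_publish(struct counters *c)
{
	unsigned int p = atomic_load_explicit(&c->subbufs_produced,
					      memory_order_relaxed);

	atomic_store_explicit(&c->subbufs_produced, p + 1,
			      memory_order_release);
}

/* Consumer: number of sub-buffers ready to read. The acquire load pairs
 * with the release store in producer_publish(). */
static unsigned int consumer_ready(struct counters *c)
{
	unsigned int p = atomic_load_explicit(&c->subbufs_produced,
					      memory_order_acquire);
	unsigned int done = atomic_load_explicit(&c->subbufs_consumed,
						 memory_order_relaxed);

	return p - done;  /* unsigned arithmetic tolerates wraparound */
}
```

Whether a plain aligned integer would behave the same way is the architecture-dependent question raised in this message; the sketch avoids relying on that by using atomic types throughout.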
Re: Merging relayfs?
Tom Zanussi wrote:
> Roman Zippel writes:
> > Hi,
> >
> > On Thu, 14 Jul 2005, Tom Zanussi wrote:
> >
> > > The netlink control channel seems to work very well, but I can
> > > certainly change the examples to use something different. Could you
> > > suggest something?
> >
> > It just looks like a complicated way to do an ioctl, a control file that
> > you can read/write would be a lot simpler and faster.
>
> You're right - in previous versions, we did use ioctl - we ended up
> using netlink as it seemed like the least offensive option to most people.
> I'll try modifying the example code to use a control file or something
> like that instead though.

Having an ioctl() interface will definitely make things less
complicated. Are the older versions which use ioctl available off the
relayfs website?

I'm not quite sure if my opinion matters but I'd like to see relayfs
merged. To me it appears to be the quickest and cleanest way to export
trace data from the kernel to userspace.

Thanks,

Hareesh Nagarajan
-= Engineering Intern =-
Re: Merging relayfs?
Roman Zippel writes: > Hi, > > On Thu, 14 Jul 2005, Tom Zanussi wrote: > > > The netlink control channel seems to work very well, but I can > > certainly change the examples to use something different. Could you > > suggest something? > > It just looks like a complicated way to do an ioctl, a control file that > you can read/write would be a lot simpler and faster. You're right - in previous versions, we did use ioctl - we ended up using netlink as it seemed like least offensive option to most people. I'll try modifying the example code to use a control file or something like that instead though. > > > > Looking through the patch there are still a few areas I'm concerned > > about: > > > - the usage of atomic_t look a little silly, there is only a single > > > writer and probably needs some cache line optimisations > > > > The only things that are atomic are the counts of produced and > > consumed buffers and these are only ever updated or read in the slow > > buffer-switch path. They're atomic because if they weren't, wouldn't > > it be possible for the client to read an unfinished value if the > > producer was in the middle of updating it? > > No. > > > > - I would prefer "unsigned int" over just "unsigned" > > > - the padding/commit arrays can be easily managed by the client > > > > Yes, I can move them out and update the examples to reflect that, but > > I thought that if this was something that most clients would need to > > do, it made some sense to keep it in relayfs and avoid duplication in > > the clients. > > If a lot of clients needs this, there a different ways to do this, e.g. by > introducing some helper functions that clients can use. This way you can > keep the core simple and allow the client to modify its behaviour. OK, I'll think about the best way to change this. 
> > > - overwrite mode can be implemented via the buffer switch callback
> >
> > The buffer switch callback is already where this is handled, unless
> > you're thinking of something else - one of the first checks in the
> > buffer switch is relay_buf_full(), which always returns 0 if the
> > buffer is in overwrite mode.
>
> I mean, relayfs doesn't have to know about this, the client itself can do
> it (e.g. via helper functions).

In a previous version, we had the client pass back a return value from the callback indicating whether to continue or stop. I can try doing something like that again instead.

Tom
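The idea of letting the client decide between stopping and overwriting through the callback's return value could look roughly like the sketch below. It is not the relayfs API: `struct chan`, `chan_switch_subbuf()`, the `switch_action` enum and both policy functions are hypothetical names chosen for illustration, and the bookkeeping is reduced to simple counters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical return values for a client-supplied buffer-switch callback. */
enum switch_action { SWITCH_STOP = 0, SWITCH_CONTINUE = 1 };

struct chan;
typedef enum switch_action (*subbuf_switch_cb)(const struct chan *c);

/* Hypothetical channel state; the core knows nothing about "overwrite mode". */
struct chan {
    size_t n_subbufs;           /* number of sub-buffers in the channel */
    size_t produced;            /* sub-buffers handed to the writer so far */
    size_t consumed;            /* sub-buffers released by the reader (or lost) */
    size_t lost;                /* unread sub-buffers that were overwritten */
    subbuf_switch_cb on_switch; /* client policy, called at every switch */
};

static bool chan_full(const struct chan *c)
{
    return c->produced - c->consumed >= c->n_subbufs;
}

/* Client policy: stop logging rather than lose unread data. */
static enum switch_action policy_no_overwrite(const struct chan *c)
{
    return chan_full(c) ? SWITCH_STOP : SWITCH_CONTINUE;
}

/* Client policy: flight-recorder style, always keep the newest data. */
static enum switch_action policy_overwrite(const struct chan *c)
{
    (void)c;
    return SWITCH_CONTINUE;
}

/* Core side: ask the client, then move the writer to the next sub-buffer. */
static bool chan_switch_subbuf(struct chan *c)
{
    assert(c->on_switch != NULL);
    if (c->on_switch(c) == SWITCH_STOP)
        return false;           /* client asked us to stop */
    if (chan_full(c)) {         /* client chose to overwrite the oldest data */
        c->consumed++;
        c->lost++;
    }
    c->produced++;
    return true;
}
```

With this split the core only follows the callback's answer; a client that wants overwrite behaviour installs `policy_overwrite`, and one that wants to stop on a full buffer installs `policy_no_overwrite`.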
Re: Merging relayfs?
Hi, On Thu, 14 Jul 2005, Tom Zanussi wrote: > The netlink control channel seems to work very well, but I can > certainly change the examples to use something different. Could you > suggest something? It just looks like a complicated way to do an ioctl, a control file that you can read/write would be a lot simpler and faster. > > Looking through the patch there are still a few areas I'm concerned about: > > - the usage of atomic_t look a little silly, there is only a single > > writer and probably needs some cache line optimisations > > The only things that are atomic are the counts of produced and > consumed buffers and these are only ever updated or read in the slow > buffer-switch path. They're atomic because if they weren't, wouldn't > it be possible for the client to read an unfinished value if the > producer was in the middle of updating it? No. > > - I would prefer "unsigned int" over just "unsigned" > > - the padding/commit arrays can be easily managed by the client > > Yes, I can move them out and update the examples to reflect that, but > I thought that if this was something that most clients would need to > do, it made some sense to keep it in relayfs and avoid duplication in > the clients. If a lot of clients needs this, there a different ways to do this, e.g. by introducing some helper functions that clients can use. This way you can keep the core simple and allow the client to modify its behaviour. > > - overwrite mode can be implemented via the buffer switch callback > > The buffer switch callback is already where this is handled, unless > you're thinking of something else - one of the first checks in the > buffer switch is relay_buf_full(), which always returns 0 if the > buffer is in overwrite mode. I mean, relayfs doesn't has to know about this, the client itself can do it (e.g. via helper functions). 
bye, Roman
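As an illustration of padding and commit counts being managed by the client through small helper functions, here is a minimal sketch. Everything in it is an assumption made for the example: the `struct subbuf` layout, the `SUBBUF_DATA` size and the four helper names are hypothetical and are not taken from the relayfs patch. The completeness test follows the rule described earlier in the thread, that a sub-buffer is fully written once its commit count equals the sub-buffer size minus the padding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SUBBUF_DATA 64u   /* payload bytes per sub-buffer (illustrative size) */

/* Hypothetical client-owned sub-buffer: the counts live with the data. */
struct subbuf {
    size_t padding;       /* unused bytes at the end, set when closed */
    size_t commit;        /* bytes whose writes have finished */
    size_t reserved;      /* bytes handed out to writers so far */
    unsigned char data[SUBBUF_DATA];
};

/* Reserve len bytes; returns NULL when the event does not fit. */
static void *subbuf_reserve(struct subbuf *sb, size_t len)
{
    if (len > SUBBUF_DATA - sb->reserved)
        return NULL;
    void *slot = sb->data + sb->reserved;
    sb->reserved += len;
    return slot;
}

/* Called once an event of len bytes has been completely written. */
static void subbuf_commit(struct subbuf *sb, size_t len)
{
    sb->commit += len;
}

/* Called at buffer-switch time: record how much of the tail is unused. */
static void subbuf_close(struct subbuf *sb)
{
    sb->padding = SUBBUF_DATA - sb->reserved;
}

/* Reader-side check: every reserved byte has been committed. */
static bool subbuf_complete(const struct subbuf *sb)
{
    return sb->commit == SUBBUF_DATA - sb->padding;
}
```

A reader would poll `subbuf_complete()` on a closed sub-buffer and save it only once the check passes; writes that are still in flight keep it returning false.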
Re: Merging relayfs?
Roman Zippel writes: > Hi, > > On Mon, 11 Jul 2005, Andrew Morton wrote: > > > > > Hi Andrew, can you please merge relayfs? It provides a low-overhead > > > > logging and buffering capability, which does not currently exist in > > > > the kernel. > > > > > > While the code is pretty nicely in shape it seems rather pointless to > > > merge until an actual user goes with it. > > > > Ordinarily I'd agree. But this is a bit like kprobes - it's a funny thing > > which other kernel features rely upon, but those features are often ad-hoc > > and aren't intended for merging. > > I agree with Christoph, I'd like to see a small (and useful) example > included, which can be used as reference. relayfs client still need some > code of their own to communicate with user space. If I look at the example > code I'm not really sure netlink is a good way to go as control channel. > kprobes has a rather simple interface, relayfs is more complex and I think > it's a good idea to provide some sane and complete example code to copy > from. > The netlink control channel seems to work very well, but I can certainly change the examples to use something different. Could you suggest something? > Looking through the patch there are still a few areas I'm concerned about: > - the usage of atomic_t look a little silly, there is only a single > writer and probably needs some cache line optimisations The only things that are atomic are the counts of produced and consumed buffers and these are only ever updated or read in the slow buffer-switch path. They're atomic because if they weren't, wouldn't it be possible for the client to read an unfinished value if the producer was in the middle of updating it? 
> - I would prefer "unsigned int" over just "unsigned" > - the padding/commit arrays can be easily managed by the client Yes, I can move them out and update the examples to reflect that, but I thought that if this was something that most clients would need to do, it made some sense to keep it in relayfs and avoid duplication in the clients. > - overwrite mode can be implemented via the buffer switch callback The buffer switch callback is already where this is handled, unless you're thinking of something else - one of the first checks in the buffer switch is relay_buf_full(), which always returns 0 if the buffer is in overwrite mode. Tom - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Merging relayfs?
Hi, On Mon, 11 Jul 2005, Andrew Morton wrote: > > > Hi Andrew, can you please merge relayfs? It provides a low-overhead > > > logging and buffering capability, which does not currently exist in > > > the kernel. > > > > While the code is pretty nicely in shape it seems rather pointless to > > merge until an actual user goes with it. > > Ordinarily I'd agree. But this is a bit like kprobes - it's a funny thing > which other kernel features rely upon, but those features are often ad-hoc > and aren't intended for merging. I agree with Christoph, I'd like to see a small (and useful) example included, which can be used as reference. relayfs client still need some code of their own to communicate with user space. If I look at the example code I'm not really sure netlink is a good way to go as control channel. kprobes has a rather simple interface, relayfs is more complex and I think it's a good idea to provide some sane and complete example code to copy from. Looking through the patch there are still a few areas I'm concerned about: - the usage of atomic_t look a little silly, there is only a single writer and probably needs some cache line optimisations - I would prefer "unsigned int" over just "unsigned" - the padding/commit arrays can be easily managed by the client - overwrite mode can be implemented via the buffer switch callback In general I'm not against merging, but I have a few ideas for further cleanups/optimisations and it really would help to have some useful example code (e.g. a _simple_ event tracer). bye, Roman - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Merging relayfs?
On Wed, 13 Jul 2005, Vara Prasad wrote:
[..]
> Looks like you have not looked at systemtap project although Tom pointed about it to you in his previous postings. The URL for systemtap is http://sourceware.org/systemtap/, i strongly suggest you to look at that project.

I have just filled that gap. Sorry, but I can't find in that document a single word about the assumption of aggregating data as close to the probe as possible. But section 6.1 of the document says:

"Kernel-to-user transport

Data collected from systemtap in the kernel must somehow be transmitted to userspace. This transport must have high performance and minimal performance impact on the monitored system.

One candidate is relayfs. Relayfs provides an efficient way to move large blocks of data from the kernel to userspace. The data is sent in per-cpu buffers which a userspace program can save or display. Drawbacks are that the data arrives in blocks and is separated into per-cpu blocks, possibly requiring a post-processing step that stitches the data into an integrated stream. Relayfs is included in some recent -mm kernels. It can be built as a loadable module and is currently checked into CVS under src/runtime/relayfs.

The other candidate is netlink. Netlink is included in the kernel. It allows a simple stream of data to be sent using the familiar socket APIs. It is unlikely to be as fast as relayfs.

Relayfs typically makes use of netlink as a control channel. With some simple extensions, the runtime can use netlink as the main transport too. So we can currently select in the runtime between relayfs and netlink, allowing us to support streams of data or blocks. And allowing us to perform direct comparisons of efficiency.
[..]"

So .. using relayfs is necessary because all collected data "must somehow be transmitted to userspace", and that is why a huge amount of data must be transferred. But if transferring a large amount of data is not an issue, it seems netlink can be used to transfer (generally aggregated) data from kernel probes (?). And also "with some simple extensions, the runtime can use netlink as the main transport too". Even this document says "relayfs isn't a necessary foundation for systemtap".

So .. why push to merge relayfs *NOW*? Because KProbes has no expressions and no basic aggregators such as counters, it isn't possible to check NOW, on real examples, whether relayfs is really necessary (?) :)

kloczek
--
---
*People don't have problems, they just create them for themselves*
---
Tomasz Kłoczko, sys adm @zie.pg.gda.pl|*e-mail: [EMAIL PROTECTED]
Re: Merging relayfs?
On Wed, 13 Jul 2005, Vara Prasad wrote:
[..]
> O.K, looks like you are agreeing that we need a buffering mechanism in the kernel to implement speculative tracing, right.

Each aggregator has its own data. This data is buffered .. In this sense: yes, infrastructure to allocate, deallocate, copy .. (generally) operate on these buffers is needed.

> Once we have the buffering mechanism we need to create an efficient API for producers of the data to write to that buffering scheme. To my knowledge there is no such generic buffering mechanism already in the kernel, Relayfs implements that buffering scheme and an efficient API to write to it. Isn't that a good reason to have Relayfs merged?

Sorry, but no. Relayfs is much more than is required simply to manage buffers (it would be better to say "probe data containers" here). All operations of this kind can be performed using a reference/index.

> Once the data in the buffer is decided to be committed you need a mechanism to get that data from the kernel to userspace. If you don't like Relayfs transfer mechanism, what do you suggest using?

Correct me if I'm wrong .. and try to fill in every area where you see that my knowledge is worse than yours or that of other strictly kernel developers.

1) relayfs was designed for low latency when moving data out of kernel space,
2) getting data from probes does not require organizing all of it in a regular file system structure, and in most cases will not require low latency either. Minimal overhead is required only in those cases where a buffer necessarily must be moved out of kernel space.

Many other kernel subsystems allow data to be transferred as the result of a simple request with a reference/index as its argument. Organizing all data stored/used by probes in a named structure (if it is *really* necessary) can IMO be moved outside kernel space. Why? Because *all kernel-side operations on this data* can, it seems, be performed without an additional naming abstraction (buffer number, buffer size and the type of data stored in the buffer will be all that is necessary in probably all cases, even when operating on complex data).

If you really want to get data from probes via fopen()/read(), why not map the "probe data containers" to procfs/sysfs?

Receiving signals from probes to move the content of a mapped buffer out of kernel space, and/or ALSO receiving signals with DATA (on request from user space), can probably be done via the existing netlink infrastructure or (higher-level) event notification. (?)

Allow me to ask you: have you tried to test whether netlink allows these operations to be performed in the necessary time frame? (with the additional assumption of aggregating as much data as possible in "short range" from the probe) .. probably not, because most skeleton usages of KProbes, and also the LTT interface, were prepared with the assumption that data is aggregated outside kernel space.

Do you see this? This was and still is the core cause of LTT's problems and why it will never be as useful as DTrace. Aggregating data at the "shortest possible distance" from the probe is the *core DTrace assumption*. Simple .. this is why using DTrace is *very light* even if you enable/attach thousands of probes inside kernel space, and it still allows this kind of technique to be used even on systems that are very fragile (from the point of view of stability) or under very high pressure.

kloczek
--
---
*People don't have problems, they just create them for themselves*
---
Tomasz Kłoczko, sys adm @zie.pg.gda.pl|*e-mail: [EMAIL PROTECTED]
Re: Merging relayfs?
Tomasz Kłoczko wrote:
> *NOT using relayfs* where it is not necessary, for possibly a large
> number of future KProbes features, is IMO *fundamental* in this case.
>
> Until this base of features that do not require relayfs is integrated
> into the kernel code, it would IMO be better to stop merging relayfs.

This part of the thread is really veering off-topic. This counters thing is your own personal crusade and has nothing to do with the fundamental need for a generic buffering mechanism such as relayfs. I would suggest you start a separate thread to discuss the implementation of a generic counters mechanism, if that's indeed what you're interested in.

Karim
--
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546
Re: Merging relayfs?
Tomasz Kłoczko wrote: On Tue, 12 Jul 2005, Vara Prasad wrote: [..] [..] If I can suggest something about order prepare some feactures: 1) prepare base infrastructure for counters, this "tool" will take very small amount of data and can be performad by very small pieces of binary codes. Even this will allow perform some *very* interesting experinments on existing kernel code. And after above: 2) prepare base infrastructure for association tables of couters (for collecting data for example about I/O operations or other two or more arguments operations), 3) prepare user space tool with some kind of language which will allow hanging ptrobes with aboove tho (simple counters and association tables of couters) 4) base functions for measure time (with KProbes overhead and without) and store them in couters and association tables, All above base "tools" for above will take small or medium amount of data and can be performad small or medium pieces of binary codes. And after above: 5) prepare infrastrucrute for probes which will store data in diffrent containers depending on initiator process and/or thread (and maybe in next etap also will be good have something more common which will depend on stack path), 6) prepare base functions for tracing stack paths (counting them and store in association tables), 7) make some kind of study where is it will be good compute something more complicated like base "speculative probes" (lookin on working DTrace probably answer in this point will be "yes"). Looks like you have not looked at systemtap project although Tom pointed about it to you in his previous postings. The URL for systemtap is http://sourceware.org/systemtap/, i strongly suggest you to look at that project. We are implementing most of the above what you are suggesting in the systemtap project. I don't agree with you that implementing the above features is trivial and takes small amount of code, can you submit patches to show the simple implementation you are talking about. 
All to this moment will not require relayfs because amount of transfered data will be _very low_. I think you are forgetting the fact that relayfs has two different portions one is the buffering scheme another is the data transfer mechanism. Some of the above features you are talking of needs a buffering scheme. Details of above will be probably different (I have only some very common knowledge about DTrace implementations details and some avarange about using dtrace tool) but I want count/pint *only* feactutres which will not require using relayfs. I beg to differ, as i mentioned in my earlier postings Dtrace has a similar per-CPU buffering scheme according to their USENIX paper http://www.sun.com/bigadmin/content/dtrace/dtrace_usenix.pdf refer to section 3.3, can you explain why? [...] But if you will build all infrastructure even for simple couters on relayfs fundament it will be (IMO) badly/incorrectly designed .. and using even simple couters will introduce to high overhead for system. Do you have any performance data to justify your claim of high overhead? [...] regards kloczek bye, Vara Prasad - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Merging relayfs?
Tomasz Kłoczko wrote:
> On Tue, 12 Jul 2005, Vara Prasad wrote:
> > Tomasz Kłoczko wrote:
> > > On Tue, 12 Jul 2005, Tom Zanussi wrote:
> > > > Tomasz Kłoczko writes:
> > > > > On Tue, 12 Jul 2005, Tom Zanussi wrote:
> > > > > [...]
> > >
> > > OK .. "so you can say it is better to stop flushing buffers for a
> > > measurement which will take a day or more"? :_)
> > > Some DTrace probes/techniques are specially prepared for long or even
> > > very long experiments which will produce only a few lines of results
> > > at the end of the experiment.
> > > Look at the DTrace documentation for speculative tracing:
> > > http://docs.sun.com/app/docs/doc/817-6223/6mlkidli7?a=view
> >
> > How do you propose to implement speculative tracing without a buffer to
> > hold the data, when data needs to stay in the kernel for a while before
> > we decide to commit or discard?
>
> Buffering some data inside kernel space, and buffering with
> infrastructure for transfer to user space, are two different things.
>
> kloczek

O.K, looks like you are agreeing that we need a buffering mechanism in the kernel to implement speculative tracing, right? Once we have the buffering mechanism we need to create an efficient API for producers of the data to write to that buffering scheme. To my knowledge there is no such generic buffering mechanism already in the kernel. Relayfs implements that buffering scheme and an efficient API to write to it. Isn't that a good reason to have Relayfs merged?

Once the data in the buffer is decided to be committed you need a mechanism to get that data from the kernel to userspace. If you don't like the Relayfs transfer mechanism, what do you suggest using?
Re: Merging relayfs?
On Tue, 12 Jul 2005, Vara Prasad wrote:
[..]
> O.K, Tomasz your point is we can do aggregation in the kernel and cut down the amount of data that needs to be sent out from the kernel hence we don't need an efficient, low overhead mechanism like relayfs to get the data out of the kernel. Having relayfs doesn't prevent someone in aggregating the data in the kernel, so it is not an argument for not including relayfs in the kernel when it fills the need for those who needs raw data.

Of course you are right and (look again) this is what I said in my first mail in this thread :)

> I am part of a team working on systemtap where we are are developing a tool similar to Dtrace that does some aggregation where appropriate but nothing like fancy statistics etc. We use relayfs in our systemtap project and based on my reading of Dtrace paper they use exactly similar to relayfs buffering mechanism as well.

If I may suggest an order in which to prepare features:

1) prepare base infrastructure for counters; this "tool" takes a very small amount of data and can be implemented with very small pieces of binary code. Even this will allow some *very* interesting experiments on existing kernel code.

And after the above:

2) prepare base infrastructure for association tables of counters (for collecting data about, for example, I/O operations or other operations with two or more arguments),
3) prepare a user space tool with some kind of language which will allow attaching probes using the above two (simple counters and association tables of counters),
4) base functions for measuring time (with and without KProbes overhead) and storing it in counters and association tables.

All the base "tools" above will take a small or medium amount of data and can be implemented with small or medium pieces of binary code.
And after the above:

5) prepare infrastructure for probes which store data in different containers depending on the initiating process and/or thread (and maybe, at a later stage, it would also be good to have something more general which depends on the stack path),
6) prepare base functions for tracing stack paths (counting them and storing them in association tables),
7) make some kind of study of whether it would be good to compute something more complicated, like basic "speculative probes" (looking at how DTrace works, the answer here will probably be "yes").

Up to this point none of this will require relayfs, because the amount of transferred data will be _very low_. The details will probably differ (I have only very general knowledge of DTrace implementation details and average experience with using the dtrace tool), but I want to count/point out *only* features which will not require relayfs.

And *after finishing the above* it will be much easier to study "is relayfs still necessary?" and, *if* the answer is still "YES", to try to integrate the necessary patches (or maybe something else .. maybe better suited to all the cases not covered above). Also, adding something like relayfs at that point _will not require_ changes in existing code (if it does require changes they will be very small, and may even allow the existing relayfs to be reduced (?)).

But if you build all the infrastructure, even for simple counters, on a relayfs foundation, it will be (IMO) badly/incorrectly designed .. and using even simple counters will introduce too high an overhead for the system.

*NOT using relayfs* where it is not necessary, for possibly a large number of future KProbes features, is IMO *fundamental* in this case.

Until this base of features that do not require relayfs is integrated into the kernel code, it would IMO be better to stop merging relayfs.

> There are tools like itrace and Intel has one (i forgot the name) they would like to get the raw data into user space and do all kinds of fancy statistical analysis, visualization etc.
> Their value add is the analysis of the data. I am sure you are not suggesting pushing capabilities of those tools to the kernel, right.

I don't know anything about this tool (can you send a URL?) but please .. don't be foolish and do not try to prepare eye candy first :) Leave this area to other developers and focus on the fundamentals :)

regards

kloczek
--
---
*People don't have problems, they just create them for themselves*
---
Tomasz Kłoczko, sys adm @zie.pg.gda.pl|*e-mail: [EMAIL PROTECTED]
Re: Merging relayfs?
On Tue, 12 Jul 2005, Vara Prasad wrote:
> Tomasz Kłoczko wrote:
> > On Tue, 12 Jul 2005, Tom Zanussi wrote:
> > > Tomasz Kłoczko writes:
> > > > On Tue, 12 Jul 2005, Tom Zanussi wrote:
> > > > [...]
> >
> > OK .. "so you can say it is better to stop flushing buffers for a
> > measurement which will take a day or more"? :_)
> > Some DTrace probes/techniques are specially prepared for long or even
> > very long experiments which will produce only a few lines of results
> > at the end of the experiment.
> > Look at the DTrace documentation for speculative tracing:
> > http://docs.sun.com/app/docs/doc/817-6223/6mlkidli7?a=view
>
> How do you propose to implement speculative tracing without a buffer to
> hold the data, when data needs to stay in the kernel for a while before
> we decide to commit or discard?

Buffering some data inside kernel space, and buffering with infrastructure for transfer to user space, are two different things.

kloczek
--
---
*People don't have problems, they just create them for themselves*
---
Tomasz Kłoczko, sys adm @zie.pg.gda.pl|*e-mail: [EMAIL PROTECTED]
RE: Merging relayfs?
I believe the Intel tool that Vara is referencing is the Vtune tool (which has an open source, GPL'ed statistical sampling driver). It keeps a trace history (instead of aggregating the data) that is passed into user space so that it can do post processing analysis from user space. The most common method of aggregating data for sampling/profiling is to lose the time information of when a sample is taken (for example, that is what oprofile does). For many people, this is fine. For others, they want the time information so they can visualize the sequence of events. Having relayfs merged into the kernel would allow us to have a consistent and reliable way of passing the data we need from kernel space into user space. In essence, relayfs is a basic infrastructure upon which other tools can be built - whether that's profiling, debugging, logging, etc. -- charles > -Original Message- > From: [EMAIL PROTECTED] > [mailto:[EMAIL PROTECTED] On Behalf Of Vara Prasad > Sent: Tuesday, July 12, 2005 9:30 PM > To: unlisted-recipients > Cc: linux-kernel@vger.kernel.org > Subject: Re: Merging relayfs? > > *** snip *** > > There are tools like itrace and Intel has one (i forgot the > name) they would like to get the raw data into user space and > do all kinds of fancy statistical analysis, visualization > etc. Their value add is the analysis of the data. I am sure > you are not suggesting pushing capabilities of those tools to > the kernel, right. > > As Steven Rostedt mentioned in his initial reply in this > thread, many of us have written adhoc buffering scheme > similar to what relayfs provides to debug kernel problems > that happen after a long running test, if such facility > already exists in the kernel everyone doesn't have to develop one. > > I would like to see relayfs merged. 
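The difference between aggregating samples and keeping a trace history can be sketched in a few lines. This is only an illustration of the trade-off described in this message; the structures, sizes and function names are hypothetical and do not come from Vtune, oprofile or relayfs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define N_BUCKETS 4u   /* e.g. one bucket per sampled code region */
#define TRACE_CAP 16u  /* capacity of the illustrative trace log */

/* One sample: when it was taken and which bucket it fell into. */
struct sample {
    uint64_t when;
    uint32_t bucket;
};

/* Aggregated form: one counter per bucket; timestamps are discarded. */
struct aggregate {
    uint64_t hits[N_BUCKETS];
};

/* Trace-history form: every sample is kept, in arrival order. */
struct trace {
    struct sample log[TRACE_CAP];
    size_t len;
};

static void aggregate_record(struct aggregate *a, const struct sample *s)
{
    assert(s->bucket < N_BUCKETS);
    a->hits[s->bucket]++;       /* the value of s->when is lost here */
}

static int trace_record(struct trace *t, const struct sample *s)
{
    if (t->len == TRACE_CAP)
        return -1;              /* log full; a real tracer would switch buffers */
    t->log[t->len++] = *s;      /* timestamp and ordering are preserved */
    return 0;
}
```

The aggregate needs a fixed, small amount of memory however long the run lasts, while the trace grows with every sample. That growth is where a buffering and transport layer becomes necessary.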
Re: Merging relayfs?
Tomasz Kłoczko wrote: On Tue, 12 Jul 2005, Tom Zanussi wrote: [..] > DTrace real examples shows something completly diffret. > MANY things (if not ~almost all) can be kept only in aggregated form > during experiments. But you can also do the aggregation in user space if you have a cheap way of getting it there, as we've shown with some of the examples. Sorry but real life examples shows that store chunk of data in agregator is less expensive than context switch neccessary for store data or time neccasy for send and handle signal from buffer like "I'm full! let me out of here ..". [..] > store raw data. What you need ? only one counter (few bytes) instead of huge > amount of memeory for buffer and store logs. Try measure something like > scheduler with possible small system distruption. Most of the time the data is just being buffered and only when the buffer is full is it written to disk, as one write. If that's too disruptive, then maybe you do need to do some aggregation in the kernel, but it sounds like a special case. OK .. "so you can say better is stop flushing buffers on measure which wil take day or more" ? :_) Some DTrace probes/technik are specialy prepared for long or evel very long time experiment wich will only prodyce few lines results on end of experiment. Look at DTrace documentation for speculative tracing: http://docs.sun.com/app/docs/doc/817-6223/6mlkidli7?a=view Some experiments do not have deterinistic time and must be finished after i. e. "occasional failing". What if it will take so long so you will fill all avalaible storage in relayfs way ? OK, never mind .. you have discontinued storage. Using kind speculative tracing way I'll have result *just after* "occasional failing" and you will start parse data stored using relayfs. 
kloczek

O.K, Tomasz, your point is that we can do aggregation in the kernel and cut down the amount of data that needs to be sent out from the kernel, hence we don't need an efficient, low overhead mechanism like relayfs to get the data out of the kernel. Having relayfs doesn't prevent anyone from aggregating the data in the kernel, so it is not an argument for not including relayfs in the kernel when it fills the need of those who need raw data.

I am part of a team working on systemtap where we are developing a tool similar to Dtrace that does some aggregation where appropriate but nothing like fancy statistics etc. We use relayfs in our systemtap project, and based on my reading of the Dtrace paper they use a buffering mechanism very similar to relayfs as well.

There are tools like itrace, and Intel has one (i forgot the name), that would like to get the raw data into user space and do all kinds of fancy statistical analysis, visualization etc. Their value add is the analysis of the data. I am sure you are not suggesting pushing capabilities of those tools to the kernel, right?

As Steven Rostedt mentioned in his initial reply in this thread, many of us have written ad hoc buffering schemes similar to what relayfs provides to debug kernel problems that happen after a long running test. If such a facility already exists in the kernel, not everyone has to develop one.

I would like to see relayfs merged.
Re: Merging relayfs?
On Tue, 2005-07-12 at 16:55 -0700, Andrew Morton wrote:
> Steven Rostedt <[EMAIL PROTECTED]> wrote:
> >
> > I will also admit that my ring buffers lost one byte per page. Because
> > I wanted to save on space with the accounting, and only had a start and
> > end pointer per page. So when start and end were equal, the buffer was
> > considered empty and when end was one less than start, it was considered
> > full. But since end always pointed to an empty spot, it would still be
> > empty when the buffer was full, thus wasting one byte per page. But to
> > solve this, I would either have to add another variable in the buffer
> > page descriptor (adding at least one byte, but probably 4 bytes) which
> > would just be more waste, or I would have to make a complex system even
> > more complex (ie. adding a flag on the end pointer at the MSB to
> > differentiate between end being empty or filled).
>
> Nope. Just make the indices 32-bit numbers and let them wrap.
>
> Full:        (tail - head) == size
> Empty:       (tail - head) == 0
> Add item:    buf[head++ & (size-1)] = item;
> Remove item: buf[tail++ & (size-1)]

You know I knew someone would have an answer. Look for version 0.2.1 coming soon :-)

Thanks,

-- Steve
Re: Merging relayfs?
Steven Rostedt <[EMAIL PROTECTED]> wrote:
>
> I will also admit that my ring buffers lost one byte per page. Because
> I wanted to save on space with the accounting, and only had a start and
> end pointer per page. So when start and end were equal, the buffer was
> considered empty and when end was one less than start, it was considered
> full. But since end always pointed to an empty spot, it would still be
> empty when the buffer was full, thus wasting one byte per page. But to
> solve this, I would either have to add another variable in the buffer
> page descriptor (adding at least one byte, but probably 4 bytes) which
> would just be more waste, or I would have to make a complex system even
> more complex (ie. adding a flag on the end pointer at the MSB to
> differentiate between end being empty or filled).

Nope. Just make the indices 32-bit numbers and let them wrap.

Full:        (tail - head) == size
Empty:       (tail - head) == 0
Add item:    buf[head++ & (size-1)] = item;
Remove item: buf[tail++ & (size-1)]
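A minimal, single-threaded sketch of the free-running-index scheme follows. The names (`struct ring`, `ring_put()`, `ring_get()`, `RING_SIZE`) are made up for the example, and the buffer size is assumed to be a power of two so that `size - 1` works as a mask. Since items are added at `head` and removed at `tail`, the sketch computes the fill level as `head - tail`; unsigned subtraction keeps that value correct even after the indices wrap past 2^32.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 8u  /* must be a power of two so (RING_SIZE - 1) is a mask */

struct ring {
    uint32_t head;    /* count of items ever added; never masked in place */
    uint32_t tail;    /* count of items ever removed; never masked in place */
    int buf[RING_SIZE];
};

static void ring_init(struct ring *r)
{
    assert((RING_SIZE & (RING_SIZE - 1)) == 0);
    r->head = 0;
    r->tail = 0;
}

/* Number of stored items; unsigned wrap-around makes this work across overflow. */
static uint32_t ring_count(const struct ring *r)
{
    return r->head - r->tail;
}

static bool ring_empty(const struct ring *r) { return ring_count(r) == 0; }
static bool ring_full(const struct ring *r)  { return ring_count(r) == RING_SIZE; }

static bool ring_put(struct ring *r, int item)
{
    if (ring_full(r))
        return false;
    r->buf[r->head++ & (RING_SIZE - 1)] = item;  /* mask only when indexing */
    return true;
}

static bool ring_get(struct ring *r, int *item)
{
    if (ring_empty(r))
        return false;
    *item = r->buf[r->tail++ & (RING_SIZE - 1)];
    return true;
}
```

Because full and empty are told apart by the difference of the two counters rather than by their masked positions, all `RING_SIZE` slots are usable and no extra flag or spare slot is needed. A concurrent producer and consumer would additionally need memory barriers, which this sketch leaves out.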
Re: Merging relayfs?
On Tue, 2005-07-12 at 16:38 -0500, Tom Zanussi wrote:
> Tom Zanussi writes:
> > I was thinking of something simpler, like just using the page array we already have in relayfs, but not vmap'ing it and instead writing to the current page, detecting when to split a record, moving on to the next page, etc. and seeing how it compares with the vmap version.
>
> Just a clarification - I didn't mean to ignore your ring buffers - it would be good to try both, I think...

Oh, by all means, simple is usually better. I didn't take any offense to not using it. My ring buffers are quite confusing, and took quite a bit of debugging to finally get them straight. If you get something that works then it should be good to go.

My ring buffers were meant to be always used as a ring buffer that would only save the latest data and not stop when full. So, each page had to have its own start and stop since the beginning of the buffer could actually be anywhere on any page. That's because, once the ring buffer filled up, the start of the buffer would move as you added more data.

A simple approach should be best, but if you start doing the individual page accounting, and find that it's getting complex to handle all cases, then it's good to know that my ring buffers are always out there :-)

I will also admit that my ring buffers lost one byte per page, because I wanted to save on space with the accounting and only had a start and end pointer per page. So when start and end were equal, the buffer was considered empty and when end was one less than start, it was considered full. But since end always pointed to an empty spot, it would still be empty when the buffer was full, thus wasting one byte per page. But to solve this, I would either have to add another variable in the buffer page descriptor (adding at least one byte, but probably 4 bytes) which would just be more waste, or I would have to make a complex system even more complex (i.e. adding a flag on the end pointer at the MSB to differentiate between end being empty or filled).

-- Steve
Re: Merging relayfs?
Tom Zanussi writes:
> Steven Rostedt writes:
> > On Tue, 2005-07-12 at 11:36 -0500, Tom Zanussi wrote:
> > > > I totally agree that the vmalloc way is faster, but I would also argue that the accounting to handle the separate pages would not even be noticeable with the time it takes to do the actual copying into the buffer. So if the accounting adds 3ns on top of 500ns to complete, I don't think people will mind.
> > >
> > > OK, it sounds like something to experiment with - I can play around with it, and later submit a patch to remove vmap if it works out. Does that sound like a good idea?
> >
> > Sounds good to me, since different approaches to a problem are always good, since it allows for comparing the plusses and minuses. Not sure if you want to take a crack using my ring buffers, but although they are quite confusing, they have been fully tested, since I haven't changed the ring buffer for a few years (although logdev itself has gone through
>
> I was thinking of something simpler, like just using the page array we already have in relayfs, but not vmap'ing it and instead writing to the current page, detecting when to split a record, moving on to the next page, etc. and seeing how it compares with the vmap version.

Just a clarification - I didn't mean to ignore your ring buffers - it would be good to try both, I think...

Tom
Re: Merging relayfs?
Tomasz Kłoczko wrote:
> On Tue, 12 Jul 2005, Tom Zanussi wrote:
> > Tomasz Kłoczko writes:
> > > On Tue, 12 Jul 2005, Tom Zanussi wrote:
> > > [...]
> > > OK .. "so you can say better is stop flushing buffers on measure which will take day or more" ? :_)
> > > Some DTrace probes/techniques are specially prepared for long or even very long experiments which will only produce a few lines of results at the end of the experiment.
> > > Look at DTrace documentation for speculative tracing:
> > > http://docs.sun.com/app/docs/doc/817-6223/6mlkidli7?a=view

How do you propose to implement speculative tracing without a buffer to hold the data, when data needs to stay in the kernel for a while before we decide to commit or discard?

> [...]
> kloczek
Re: Merging relayfs?
Tomasz Kłoczko wrote:
> On Tue, 12 Jul 2005, Tom Zanussi wrote:
> [..]
> > > This is much simpler and much better for control (also from the point of view of catching bugs in aggregator code -> also from the point of view of kernel stability).
> > >
> > > Also .. probably some code to handle e.g. counters can be the same as existing code in the current kernel. Probably some "atomic" (and/or simpler) aggregators can be useful in other places in the kernel for collecting some data during all the time the system works .. so the code to handle this can be reused in non-occasional tracing/measuring. And again: all without things like relayfs.
> >
> > Well, you should check out the systemtap project. It's basically a DTrace clone which is already doing these kinds of things with kprobes, and it's using relayfs...
>
> Probably by this it will be harder to say "KProbes is a Solaris DTrace clone".

I have not looked at the DTrace code, but based on their USENIX paper it looks like we cannot call Systemtap a DTrace clone without a buffering scheme like relayfs.

> kloczek
Re: Merging relayfs?
On Tue, 12 Jul 2005, Tom Zanussi wrote:
[..]
> > This is much simpler and much better for control (also from the point of view of catching bugs in aggregator code -> also from the point of view of kernel stability).
> >
> > Also .. probably some code to handle e.g. counters can be the same as existing code in the current kernel. Probably some "atomic" (and/or simpler) aggregators can be useful in other places in the kernel for collecting some data during all the time the system works .. so the code to handle this can be reused in non-occasional tracing/measuring. And again: all without things like relayfs.
>
> Well, you should check out the systemtap project. It's basically a DTrace clone which is already doing these kinds of things with kprobes, and it's using relayfs...

Probably by this it will be harder to say "KProbes is a Solaris DTrace clone".

kloczek
--
*People don't have problems, they only create them for themselves*
Tomasz Kłoczko, sys adm @zie.pg.gda.pl | e-mail: [EMAIL PROTECTED]
Re: Merging relayfs?
Tomasz Kłoczko writes:
> On Tue, 12 Jul 2005, Tom Zanussi wrote:
> [...]
> > Most of the time the data is just being buffered and only when the buffer is full is it written to disk, as one write. If that's too disruptive, then maybe you do need to do some aggregation in the kernel, but it sounds like a special case.
>
> OK .. "so you can say better is stop flushing buffers on measure which will take day or more" ? :_)
> Some DTrace probes/techniques are specially prepared for long or even very long experiments which will only produce a few lines of results at the end of the experiment.
> Look at DTrace documentation for speculative tracing:
> http://docs.sun.com/app/docs/doc/817-6223/6mlkidli7?a=view

It's also possible to do long-running 'experiments' using relayfs, and never write anything at all to disk. Here's an example prototype I did using a Perl interpreter embedded in the user space event-reading loop:

http://www.listserv.shafik.org/pipermail/ltt-dev/2004-August/000649.html

> Some experiments do not have deterministic time and must be finished after i.e. "occasional failing". What if it will take so long that you fill all available storage in the relayfs way?
> OK, never mind .. you have discontinued storage. Using a kind of speculative tracing I'll have the result *just after* "occasional failing" and you will start parsing data stored using relayfs.

As in the example above, you don't necessarily need to fill any available storage. You can also use relayfs in 'circular-buffer' mode, which would capture a buffer full of events up to the point of your failure. Sounds like speculative tracing to me.

Tom
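[Editorial sketch, not part of the original message.] The 'circular-buffer' idea mentioned above, where new events overwrite the oldest ones so that the buffer always holds the most recent history up to the point of failure, can be illustrated in plain user-space C. This is only a sketch of the general overwrite-mode technique; `struct flight_rec`, `fr_log`, `fr_snapshot` and `NEVENTS` are hypothetical names and do not correspond to any relayfs interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NEVENTS 4u /* hypothetical capacity; a power of two */

struct flight_rec {
    uint32_t events[NEVENTS];
    uint32_t written; /* total number of events ever logged */
};

/* Log an event, silently overwriting the oldest one once full. */
static void fr_log(struct flight_rec *f, uint32_t ev)
{
    f->events[f->written++ % NEVENTS] = ev;
}

/*
 * Copy the surviving events into out[], oldest first, and return how
 * many there are (at most NEVENTS). For simplicity this ignores the
 * case where the 'written' counter itself wraps around.
 */
static size_t fr_snapshot(const struct flight_rec *f, uint32_t *out)
{
    size_t n = f->written < NEVENTS ? f->written : NEVENTS;
    uint32_t first = f->written - (uint32_t)n;

    assert(out != NULL);
    for (size_t i = 0; i < n; i++)
        out[i] = f->events[(first + (uint32_t)i) % NEVENTS];
    return n;
}
```

Nothing is written to disk while the recorder runs; a snapshot is taken only when the interesting condition occurs, which is what makes this mode comparable to the speculative tracing described in the quoted text.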
Re: Merging relayfs?
Steven Rostedt writes:
> On Tue, 2005-07-12 at 11:36 -0500, Tom Zanussi wrote:
> > > I totally agree that the vmalloc way is faster, but I would also argue that the accounting to handle the separate pages would not even be noticeable with the time it takes to do the actual copying into the buffer. So if the accounting adds 3ns on top of 500ns to complete, I don't think people will mind.
> >
> > OK, it sounds like something to experiment with - I can play around with it, and later submit a patch to remove vmap if it works out. Does that sound like a good idea?
>
> Sounds good to me, since different approaches to a problem are always good, since it allows for comparing the plusses and minuses. Not sure if you want to take a crack using my ring buffers, but although they are quite confusing, they have been fully tested, since I haven't changed the ring buffer for a few years (although logdev itself has gone through

I was thinking of something simpler, like just using the page array we already have in relayfs, but not vmap'ing it and instead writing to the current page, detecting when to split a record, moving on to the next page, etc. and seeing how it compares with the vmap version.

Tom
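[Editorial sketch, not part of the original message.] The page-array approach described above (write into the current page, split a record when it crosses a page boundary, move on to the next page) might look roughly like this in user-space C. The tiny `PAGE_SZ`, the `malloc`-backed pages and all names (`struct paged_buf`, `paged_write`, ...) are assumptions made for illustration; this is not the relayfs or logdev implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SZ 16u /* tiny hypothetical "page" so splits are easy to see */
#define NPAGES  4u

struct paged_buf {
    unsigned char *pages[NPAGES]; /* separately allocated, not contiguous */
    size_t offset;                /* logical write offset across all pages */
};

static int paged_init(struct paged_buf *b)
{
    b->offset = 0;
    for (size_t i = 0; i < NPAGES; i++) {
        b->pages[i] = malloc(PAGE_SZ);
        if (b->pages[i] == NULL) {
            while (i > 0)
                free(b->pages[--i]);
            return -1;
        }
    }
    return 0;
}

static void paged_free(struct paged_buf *b)
{
    for (size_t i = 0; i < NPAGES; i++)
        free(b->pages[i]);
}

/*
 * Copy len bytes into the buffer, splitting the record wherever it
 * crosses a page boundary. Returns len, or 0 if it does not fit.
 */
static size_t paged_write(struct paged_buf *b, const void *data, size_t len)
{
    const unsigned char *src = data;
    size_t remaining = len;

    assert(b != NULL && data != NULL);
    if (len > (size_t)NPAGES * PAGE_SZ - b->offset)
        return 0;

    while (remaining > 0) {
        size_t page = b->offset / PAGE_SZ;
        size_t in_page = b->offset % PAGE_SZ;
        size_t room = PAGE_SZ - in_page;
        size_t chunk = remaining < room ? remaining : room;

        memcpy(&b->pages[page][in_page], src, chunk);
        src += chunk;
        b->offset += chunk;
        remaining -= chunk;
    }
    return len;
}
```

The extra accounting per write is a division, a remainder and a comparison per touched page, which is the kind of overhead the thread argues should be small next to the copy itself. Whether that holds in the kernel logging path is exactly what the proposed experiment would measure.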
Re: Merging relayfs?
On Tue, 12 Jul 2005, Tom Zanussi wrote:
[..]
> > DTrace real examples show something completely different. MANY things (if not ~almost all) can be kept only in aggregated form during experiments.
>
> But you can also do the aggregation in user space if you have a cheap way of getting it there, as we've shown with some of the examples.

Sorry, but real-life examples show that storing a chunk of data in an aggregator is less expensive than the context switch necessary to store data, or the time necessary to send and handle a signal from the buffer like "I'm full! let me out of here ..".

[..]
> > store raw data. What do you need? Only one counter (a few bytes) instead of a huge amount of memory for buffers and stored logs. Try to measure something like the scheduler with the smallest possible system disruption.
>
> Most of the time the data is just being buffered and only when the buffer is full is it written to disk, as one write. If that's too disruptive, then maybe you do need to do some aggregation in the kernel, but it sounds like a special case.

OK .. "so you can say better is stop flushing buffers on measure which will take day or more" ? :_)
Some DTrace probes/techniques are specially prepared for long or even very long experiments which will only produce a few lines of results at the end of the experiment.
Look at DTrace documentation for speculative tracing:
http://docs.sun.com/app/docs/doc/817-6223/6mlkidli7?a=view

Some experiments do not have deterministic time and must be finished after i.e. "occasional failing". What if it will take so long that you fill all available storage in the relayfs way?
OK, never mind .. you have discontinued storage. Using a kind of speculative tracing I'll have the result *just after* "occasional failing" and you will start parsing data stored using relayfs.

kloczek
--
*People don't have problems, they only create them for themselves*
Tomasz Kłoczko, sys adm @zie.pg.gda.pl | e-mail: [EMAIL PROTECTED]
Re: Merging relayfs?
On Tue, 2005-07-12 at 11:36 -0500, Tom Zanussi wrote:
> > I totally agree that the vmalloc way is faster, but I would also argue that the accounting to handle the separate pages would not even be noticeable with the time it takes to do the actual copying into the buffer. So if the accounting adds 3ns on top of 500ns to complete, I don't think people will mind.
>
> OK, it sounds like something to experiment with - I can play around with it, and later submit a patch to remove vmap if it works out. Does that sound like a good idea?

Sounds good to me, since different approaches to a problem are always good, since it allows for comparing the plusses and minuses. Not sure if you want to take a crack using my ring buffers, but although they are quite confusing, they have been fully tested, since I haven't changed the ring buffer for a few years (although logdev itself has gone through several changes). I use the logdev device on a daily basis to debug almost every kernel I ever touch. When working with a new kernel, the first thing I do is usually add my logdev patch.

Note to all: The patch I posted is not the same patch that I usually use (although the ring buffers _are_ the same), since I add stuff that is usually more specific to what I do. So if something is broken with it, I would greatly appreciate it if someone lets me know.

Thanks,

-- Steve
Re: Merging relayfs?
Steven Rostedt writes:
> On Tue, 2005-07-12 at 11:08 -0500, Tom Zanussi wrote:
> > Steven Rostedt writes:
> > > On Tue, 2005-07-12 at 10:58 -0400, Jason Baron wrote:
> > > > On Mon, 11 Jul 2005, Tom Zanussi wrote:
> > > >
> > > > One concern I had regarding relayfs, which was raised previously, was its use of vmap (http://marc.theaimsgroup.com/?l=linux-kernel&m=110755199913216&w=2). On x86, the vmap space is at a premium, and this space is reserved over the entire lifetime of a 'channel'. Is the use of vmap really critical for performance?
> > >
> > > I believe that (Tom correct me if I'm wrong) the use of vmap was to allocate a large buffer without risking failing to allocate, since the buffer does not need to be in contiguous pages. If this is a problem, maybe Tom can use my buffer method to make a buffer :-)
> >
> > The main reason we use vmap is so that from the kernel side we have a nice contiguous address range to log to even though the pages aren't actually contiguous.
>
> That's what I meant, but you said it better :-)
>
> > > See http://www.kihontech.com/logdev for my logdev debugging tool, which allocates separate pages and uses an accounting system instead of the more efficient vmalloc to keep the data in the pages together. I'm currently working with Tom to get this to use relayfs as the back end. But here you can take a look at how the buffering works, and it doesn't use up vmalloc space.
> >
> > It might be worthwhile to try out different alternatives and compare them, but I'm pretty sure we won't be able to beat what's already in relayfs. The question is I guess, how much slower would be acceptable?
>
> I totally agree that the vmalloc way is faster, but I would also argue that the accounting to handle the separate pages would not even be noticeable with the time it takes to do the actual copying into the buffer. So if the accounting adds 3ns on top of 500ns to complete, I don't think people will mind.

OK, it sounds like something to experiment with - I can play around with it, and later submit a patch to remove vmap if it works out. Does that sound like a good idea?

Tom
Re: Merging relayfs?
Tomasz Kłoczko writes:
> On Tue, 12 Jul 2005, Tom Zanussi wrote:
> > Tomasz Kłoczko writes:
> > > On Mon, 11 Jul 2005, Tom Zanussi wrote:
> > > > Hi Andrew, can you please merge relayfs? It provides a low-overhead logging and buffering capability, which does not currently exist in the kernel.
> > > >
> > > > relayfs key features:
> > > >
> > > > - Extremely efficient high-speed logging/buffering
> > >
> > > Usually/for now relayfs is used as base infrastructure for various debugging/measuring. IMO storing raw data and transferring it to user space is the wrong way. Why? Because it adds very big overhead for memory and storage. Big .. compared to in situ storing of partially analyzed data in counters and the like, as it is in DTrace.
> >
> > But isn't it supposed to be a good thing to keep analysis out of the kernel if possible?
>
> As long as you try, for example, to measure (?) .. not.
>
> > And many things can't be aggregated, such as the detailed sequence of events in a trace.
>
> DTrace real examples show something completely different. MANY things (if not ~almost all) can be kept only in aggregated form during experiments.

But you can also do the aggregation in user space if you have a cheap way of getting it there, as we've shown with some of the examples. Why do you need it in the kernel? And what do you do when you need to know the exact sequence of events, especially if you don't really know what you're looking for?

> > Anyway, it doesn't have to be an 'all or nothing' thing. For some applications it may make sense to do some amount of filtering and aggregation in the kernel. AFAICS DTrace takes this to the extreme and does everything in the kernel, and IIRC it can't easily be made to do general system tracing along the lines of LTT, for instance.
>
> Try to measure the number of disk I/O operations without touching storage to store raw data. What do you need? Only one counter (a few bytes) instead of a huge amount of memory for buffers and stored logs. Try to measure something like the scheduler with the smallest possible system disruption.

Most of the time the data is just being buffered and only when the buffer is full is it written to disk, as one write. If that's too disruptive, then maybe you do need to do some aggregation in the kernel, but it sounds like a special case. As for measuring the scheduler, I know that people have used it for that, e.g. Steven Rostedt's logdev device, which he uses to trace problems in the RT kernel.

Tom
Re: Merging relayfs?
On Tue, 2005-07-12 at 11:08 -0500, Tom Zanussi wrote:
> Steven Rostedt writes:
> > On Tue, 2005-07-12 at 10:58 -0400, Jason Baron wrote:
> > > On Mon, 11 Jul 2005, Tom Zanussi wrote:
> > >
> > > One concern I had regarding relayfs, which was raised previously, was its use of vmap (http://marc.theaimsgroup.com/?l=linux-kernel&m=110755199913216&w=2). On x86, the vmap space is at a premium, and this space is reserved over the entire lifetime of a 'channel'. Is the use of vmap really critical for performance?
> >
> > I believe that (Tom correct me if I'm wrong) the use of vmap was to allocate a large buffer without risking failing to allocate, since the buffer does not need to be in contiguous pages. If this is a problem, maybe Tom can use my buffer method to make a buffer :-)
>
> The main reason we use vmap is so that from the kernel side we have a nice contiguous address range to log to even though the pages aren't actually contiguous.

That's what I meant, but you said it better :-)

> > See http://www.kihontech.com/logdev for my logdev debugging tool, which allocates separate pages and uses an accounting system instead of the more efficient vmalloc to keep the data in the pages together. I'm currently working with Tom to get this to use relayfs as the back end. But here you can take a look at how the buffering works, and it doesn't use up vmalloc space.
>
> It might be worthwhile to try out different alternatives and compare them, but I'm pretty sure we won't be able to beat what's already in relayfs. The question is I guess, how much slower would be acceptable?

I totally agree that the vmalloc way is faster, but I would also argue that the accounting to handle the separate pages would not even be noticeable compared with the time it takes to do the actual copying into the buffer. So if the accounting adds 3ns on top of 500ns to complete, I don't think people will mind. I haven't looked too much into the workings of relayfs (I let you handle that ;-) so I don't really know the impact it would have to use something like logdev's buffering system.

-- Steve
Re: Merging relayfs?
Steven Rostedt writes:
> On Tue, 2005-07-12 at 10:58 -0400, Jason Baron wrote:
> > On Mon, 11 Jul 2005, Tom Zanussi wrote:
> >
> > One concern I had regarding relayfs, which was raised previously, was its use of vmap (http://marc.theaimsgroup.com/?l=linux-kernel&m=110755199913216&w=2). On x86, the vmap space is at a premium, and this space is reserved over the entire lifetime of a 'channel'. Is the use of vmap really critical for performance?
>
> I believe that (Tom correct me if I'm wrong) the use of vmap was to allocate a large buffer without risking failing to allocate, since the buffer does not need to be in contiguous pages. If this is a problem, maybe Tom can use my buffer method to make a buffer :-)

The main reason we use vmap is so that from the kernel side we have a nice contiguous address range to log to even though the pages aren't actually contiguous.

> See http://www.kihontech.com/logdev for my logdev debugging tool, which allocates separate pages and uses an accounting system instead of the more efficient vmalloc to keep the data in the pages together. I'm currently working with Tom to get this to use relayfs as the back end. But here you can take a look at how the buffering works, and it doesn't use up vmalloc space.

It might be worthwhile to try out different alternatives and compare them, but I'm pretty sure we won't be able to beat what's already in relayfs. The question is I guess, how much slower would be acceptable?

Tom
Re: Merging relayfs?
On Tue, 2005-07-12 at 10:26 -0500, Tom Zanussi wrote:
> I don't really know how we would get around using vmap - it seems like the alternatives, such as managing an array of pages or something like that, would slow down the logging path too much to make it useful as a low overhead logging mechanism. If you have any ideas though, please let me know.

Tom,

My logdev device was pretty quick! The managing of the pages was negligible compared to the copying of the data to the buffer. Sometimes you needed to copy across buffers, but this too wouldn't be too much of an impact.

-- Steve
Re: Merging relayfs?
On Tue, 2005-07-12 at 10:58 -0400, Jason Baron wrote:
> On Mon, 11 Jul 2005, Tom Zanussi wrote:
>
> One concern I had regarding relayfs, which was raised previously, was its use of vmap (http://marc.theaimsgroup.com/?l=linux-kernel&m=110755199913216&w=2). On x86, the vmap space is at a premium, and this space is reserved over the entire lifetime of a 'channel'. Is the use of vmap really critical for performance?

I believe that (Tom correct me if I'm wrong) the use of vmap was to allocate a large buffer without risking failing to allocate, since the buffer does not need to be in contiguous pages. If this is a problem, maybe Tom can use my buffer method to make a buffer :-)

See http://www.kihontech.com/logdev for my logdev debugging tool, which allocates separate pages and uses an accounting system instead of the more efficient vmalloc to keep the data in the pages together. I'm currently working with Tom to get this to use relayfs as the back end. But here you can take a look at how the buffering works, and it doesn't use up vmalloc space.

-- Steve
Re: Merging relayfs?
On Tue, 12 Jul 2005, Tom Zanussi wrote:
> Tomasz Kłoczko writes:
> > On Mon, 11 Jul 2005, Tom Zanussi wrote:
> > > Hi Andrew, can you please merge relayfs? It provides a low-overhead logging and buffering capability, which does not currently exist in the kernel.
> > >
> > > relayfs key features:
> > >
> > > - Extremely efficient high-speed logging/buffering
> >
> > Usually/for now relayfs is used as base infrastructure for various debugging/measuring. IMO storing raw data and transferring it to user space is the wrong way. Why? Because it adds very big overhead for memory and storage. Big .. compared to in situ storing of partially analyzed data in counters and the like, as it is in DTrace.
>
> But isn't it supposed to be a good thing to keep analysis out of the kernel if possible?

As long as you try, for example, to measure (?) .. not.

> And many things can't be aggregated, such as the detailed sequence of events in a trace.

DTrace real examples show something completely different. MANY things (if not ~almost all) can be kept only in aggregated form during experiments.

> Anyway, it doesn't have to be an 'all or nothing' thing. For some applications it may make sense to do some amount of filtering and aggregation in the kernel. AFAICS DTrace takes this to the extreme and does everything in the kernel, and IIRC it can't easily be made to do general system tracing along the lines of LTT, for instance.

Try to measure the number of disk I/O operations without touching storage to store raw data. What do you need? Only one counter (a few bytes) instead of a huge amount of memory for buffers and stored logs. Try to measure something like the scheduler with the smallest possible system disruption.

kloczek
--
*People don't have problems, they only create them for themselves*
Tomasz Kłoczko, sys adm @zie.pg.gda.pl | e-mail: [EMAIL PROTECTED]
Re: Merging relayfs?
On Tue, 12 Jul 2005, Baruch Even wrote:
[..]
> > Usually/for now relayfs is used as base infrastructure for various debugging/measuring. IMO storing raw data and transferring it to user space is the wrong way. Why? Because it adds very big overhead for memory and storage. Big .. compared to in situ storing of partially analyzed data in counters and the like, as it is in DTrace.
> >
> > IMO much better would be to add a base/template set of functions for use in KProbes probes, which would come with the KProbes code as a base tool set. It would allow cutting the transferred data size from megabytes/gigabytes to hundreds of bytes/kilobytes, and make debugging/measuring smoother without additional latency for transferring data outside kernel space.
>
> There is no relation between using kprobes and reducing the logged data size. At the end the debugging/tracing facility is there to provide data to the developer who tries to detect the problem or ensure correctness.

Yes, for now relayfs and KProbes are two different stories without a strict relation, but this relation exists at a higher level. Both are used to solve the same problems (to measure, watch, do some skeleton debugging). Collecting data _without_ dynamically attached probes requires relayfs, but if the collected data can be rolled up into the data types you want to see as the result of the experiment (e.g. the number of calls of some code associated with different stack paths, or the number of I/O operations associated with the average amount of data transferred per I/O operation), sucking out the result data will not be an issue :)

> The kprobes can only serve as a replacement to changing the source code in order to extract the debugging information, and it does it very well. Cutting the amount of data transferred is only possible if you add the problem detection logic into the kernel and only transport problem reports to user-mode.

Of course, yes. I only want to say: if KProbes had this logic, relayfs would not be necessary, and instead of focusing on developing and merging relayfs it would be better to spend the time preparing code for this additional logic (and probably the necessary amount of code would be comparable to the current relayfs code size :)

kloczek
--
*People don't have problems, they only create them for themselves*
Tomasz Kłoczko, sys adm @zie.pg.gda.pl | e-mail: [EMAIL PROTECTED]
Re: Merging relayfs?
Jason Baron writes:
> On Mon, 11 Jul 2005, Tom Zanussi wrote:
> > Hi Andrew, can you please merge relayfs? It provides a low-overhead logging and buffering capability, which does not currently exist in the kernel.
>
> One concern I had regarding relayfs, which was raised previously, was its use of vmap (http://marc.theaimsgroup.com/?l=linux-kernel&m=110755199913216&w=2). On x86, the vmap space is at a premium, and this space is reserved over the entire lifetime of a 'channel'. Is the use of vmap really critical for performance?

Yes, the vmap'ed area is reserved over the lifetime of the channel, but the typical usage of a channel is transient - allocate it at the start of, say, a tracing run, and then vunmap it and free the memory when done. Unless you're using huge buffers, you wouldn't run into a problem running out of vmalloc space, and typical applications should be able to use relatively small buffers.

I don't really know how we would get around using vmap - it seems like the alternatives, such as managing an array of pages or something like that, would slow down the logging path too much to make it useful as a low overhead logging mechanism. If you have any ideas though, please let me know.

Tom
Re: Merging relayfs?
Tomasz Kłoczko writes:
> On Mon, 11 Jul 2005, Tom Zanussi wrote:
> > Hi Andrew, can you please merge relayfs? It provides a low-overhead logging and buffering capability, which does not currently exist in the kernel.
> >
> > relayfs key features:
> >
> > - Extremely efficient high-speed logging/buffering
>
> Usually/for now relayfs is used as base infrastructure for various debugging/measuring. IMO storing raw data and transferring it to user space is the wrong way. Why? Because it adds very big overhead for memory and storage. Big .. compared to in situ storing of partially analyzed data in counters and the like, as it is in DTrace.

But isn't it supposed to be a good thing to keep analysis out of the kernel if possible? And many things can't be aggregated, such as the detailed sequence of events in a trace. Anyway, it doesn't have to be an 'all or nothing' thing. For some applications it may make sense to do some amount of filtering and aggregation in the kernel. AFAICS DTrace takes this to the extreme and does everything in the kernel, and IIRC it can't easily be made to do general system tracing along the lines of LTT, for instance.

> IMO much better would be to add a base/template set of functions for use in KProbes probes, which would come with the KProbes code as a base tool set. It would allow cutting the transferred data size from megabytes/gigabytes to hundreds of bytes/kilobytes, and make debugging/measuring smoother without additional latency for transferring data outside kernel space.

The systemtap project is using kprobes along these lines.

Tom
Re: Merging relayfs?
On Mon, 11 Jul 2005, Tom Zanussi wrote:

>
> Hi Andrew, can you please merge relayfs? It provides a low-overhead
> logging and buffering capability, which does not currently exist in
> the kernel.
>

One concern I had regarding relayfs, which was raised previously, was
regarding its use of vmap,
http://marc.theaimsgroup.com/?l=linux-kernel&m=110755199913216&w=2
On x86, the vmap space is at a premium, and this space is reserved over
the entire lifetime of a 'channel'. Is the use of vmap really critical
for performance?

thanks,

-Jason
Re: Merging relayfs?
On Tue, 2005-07-12 at 01:23, Greg KH wrote:
> And you _are_ doing kernel debugging and tracing with ltt, what's wrong
> with admitting that?
>

Hi. I think that viewing tracing tools like LTT and systemtap as
strictly kernel debug tools is very short-sighted. With a good
post-processing tool, tracing is very useful to application developers
who can benefit by visualizing the interaction between user-level tasks
and the OS as well as the synchronization of multiple tasks/threads.
IOW, tracing is in many ways an _application_ debug tool, not a _kernel_
debug tool. And application developers usually do not want to run a
debug kernel.

I would like to see relayfs merged.

--
Steve Rotolo
<[EMAIL PROTECTED]>
Re: Merging relayfs?
Tomasz Kłoczko wrote:
> On Mon, 11 Jul 2005, Tom Zanussi wrote:
>
>>
>> Hi Andrew, can you please merge relayfs? It provides a low-overhead
>> logging and buffering capability, which does not currently exist in
>> the kernel.
>>
>> relayfs key features:
>>
>> - Extremely efficient high-speed logging/buffering
>
> Usually/for now relayfs is used as base infrastructure for various
> debugging/measuring.
> IMO storing raw data and transferring it to user space is the wrong way.
> Why? Because it adds a very big overhead for memory and storage.
> Big .. compared to in situ storing of partially analyzed data in counters
> and the like, as is done in DTrace.
>
> IMO it would be much better to add a base/template set of functions for
> use in KProbes probes, which would come with the KProbes code as a base
> tool set. It would allow cutting the transferred data size from
> megabytes/gigabytes to hundreds of bytes/kilobytes, and make
> debugging/measuring smoother, without additional latency for
> transferring data outside kernel space.

There is no relation between using kprobes and reducing the logged data
size. In the end the debugging/tracing facility is there to provide
data to the developer who tries to detect the problem or ensure
correctness.

The kprobes can only serve as a replacement for changing the source code
in order to extract the debugging information, and they do that very
well. Cutting the amount of data transferred is only possible if you
add the problem detection logic into the kernel and only transport
problem reports to user-mode.

Baruch
Re: Merging relayfs?
On Mon, 11 Jul 2005, Tom Zanussi wrote:

> Hi Andrew, can you please merge relayfs? It provides a low-overhead
> logging and buffering capability, which does not currently exist in
> the kernel.
>
> relayfs key features:
>
> - Extremely efficient high-speed logging/buffering

Usually/for now relayfs is used as base infrastructure for various
debugging/measuring.
IMO storing raw data and transferring it to user space is the wrong way.
Why? Because it adds a very big overhead for memory and storage.
Big .. compared to in situ storing of partially analyzed data in counters
and the like, as is done in DTrace.

IMO it would be much better to add a base/template set of functions for
use in KProbes probes, which would come with the KProbes code as a base
tool set. It would allow cutting the transferred data size from
megabytes/gigabytes to hundreds of bytes/kilobytes, and make
debugging/measuring smoother, without additional latency for
transferring data outside kernel space.

It would be good not to reinvent the wheel in the wrong way if a working
implementation like DTrace does it more than well.
Yes, maybe it would be good to have something like relayfs for some
other tasks, but for debugging/measuring it would IMO be better to use
another way which does not use this technique.

kloczek
--
---
*People don't have problems, they just create them for themselves*
---
Tomasz Kłoczko, sys adm @zie.pg.gda.pl|*e-mail: [EMAIL PROTECTED]
Re: Merging relayfs?
On Tue, 2005-07-12 at 03:25 +0100, Christoph Hellwig wrote:
> On Mon, Jul 11, 2005 at 08:10:42PM -0500, Tom Zanussi wrote:
> >
> > Hi Andrew, can you please merge relayfs? It provides a low-overhead
> > logging and buffering capability, which does not currently exist in
> > the kernel.
>
> While the code is pretty nicely in shape it seems rather pointless to
> merge until an actual user goes with it.
>

I have to also say that this is an exception. How many people out there
have written a variant of relayfs to do debugging? It is about time
that there's a buffer in the kernel that can be written to and later
retrieved, to debug things like the scheduler where printk in all its
forms just doesn't cut it.

I've been working with Tom to get my logdev debugging tool to use
relayfs as a back end. This allows for showing output that shows
exactly what's going on inside the kernel. It keeps the latest data
around and when/if the kernel crashes, it shows all the events that led
up to the crash. Well, it doesn't automatically show what has happened,
but you can put print-like statements anywhere in any context and the
latest will be dumped on command or on an NMI/panic/oops or whatever.

Once relayfs is added, we need to make a buffer that can be written to
from multiple CPUs. I understand that Tom got complaints that the
buffers were not originally lockless, and different CPUs would have
their own buffers. But this really hurts trying to debug race
conditions on SMP machines, since you don't get the interleaved output
of what's going on.

God I need to get KCSP working, and not worry about race conditions
anymore! :-)

-- Steve
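As an illustration of what "a buffer that can be written to from multiple CPUs" without a lock might look like, here is a minimal userspace sketch using C11 atomics. It is not relayfs or logdev code, and all names (`shared_log`, `log_reserve`, `log_write`, `BUF_SIZE`) are hypothetical. Each writer reserves a private slice of one shared buffer with a compare-and-swap loop, so records from different writers land in the order in which they were reserved.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 64u    /* hypothetical, tiny on purpose */

struct shared_log {
    atomic_size_t next;     /* offset of the first unreserved byte */
    char data[BUF_SIZE];
};

/* Reserve len bytes; returns the slice offset, or (size_t)-1 if the
 * record does not fit. Lock-free: concurrent callers simply retry. */
static size_t log_reserve(struct shared_log *log, size_t len)
{
    size_t old;

    assert(log != NULL);
    old = atomic_load(&log->next);
    do {
        if (len > BUF_SIZE - old)   /* old never exceeds BUF_SIZE */
            return (size_t)-1;
        /* on failure, old is reloaded with the current value */
    } while (!atomic_compare_exchange_weak(&log->next, &old, old + len));
    return old;
}

/* Reserve a slice and copy the record into it. */
static int log_write(struct shared_log *log, const void *src, size_t len)
{
    size_t off = log_reserve(log, len);

    if (off == (size_t)-1)
        return -1;
    memcpy(log->data + off, src, len);
    return 0;
}
```

This sketch deliberately leaves out what makes the real problem hard: a reader cannot tell whether a reserved slice has been completely written yet (which is what commit counts are for), and the buffer never wraps.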
Re: Merging relayfs?
Tom Zanussi wrote:
> Andrew Morton writes:
> > Tom Zanussi <[EMAIL PROTECTED]> wrote:
> > >
> > > Hi Andrew, can you please merge relayfs?
> >
> > I guess so. Would you have time to prepare a list of existing and
> > planned applications?
>
> I've also added a couple of people to the cc: list that I've consulted
> with in getting their applications to use relayfs, one of which is the
> logdev debugging device recently posted to LKML.

I'm using relayfs during my development work to log the current TCP
stack parameters and timing information. There is no reason that I can
see to merge this into the kernel, but it's very useful for my
development work.

I'd like to see relayfs merged.

Baruch
Re: Merging relayfs?
On Tue, Jul 12, 2005 at 12:40:41AM -0400, Karim Yaghmour wrote:
>
> Greg KH wrote:
> > The path/filename dictates how it is used, so putting relayfs type
> > files in debugfs is just fine. debugfs allows any types of files to
> > be there.
> ...
> > New trees in / are not LSB compliant, hence the reason for writing
> > securityfs to get rid of /selinux and other LSM filesystems that were
> > starting to sprout up.
> ...
> > But that's exactly what debugfs is for, to allow data to be dumped out
> > of the kernel for different usages.
> ...
> > Ok, have a better name for it? It's simple and easy to understand.
>
> It also carries with it the stigma of "kernel debugging", which I just
> don't see production system maintainers liking very much.

But they like the name "dtrace" instead? (sorry, couldn't resist...)

Come on, they will never see the name "debugfs", right? Your tools will
then have a common place to look for your ltt and other files, as you
_know_ where it will be mounted in the fs namespace.

And you _are_ doing kernel debugging and tracing with ltt, what's wrong
with admitting that?

> So tell you what, how about if we merged what's in debugfs into relayfs
> instead? We'll still end up with one filesystem, but we'll have a more
> innocuous name. After all, if debugfs is indeed for dumping data from
> the kernel to user-space for different usages, then relaying is what
> it's actually doing, right?

Sorry, but debugfs was there first, and people are already using it in
the kernel tree :)

Anyway, good luck trying to get the distros to accept
yet-another-fs-to-mount-somewhere, I know it was hard to get support for
sysfs as it was...

greg k-h
Re: Merging relayfs?
Greg KH wrote:
> The path/filename dictates how it is used, so putting relayfs type files
> in debugfs is just fine. debugfs allows any types of files to be there.
...
> New trees in / are not LSB compliant, hence the reason for writing
> securityfs to get rid of /selinux and other LSM filesystems that were
> starting to sprout up.
...
> But that's exactly what debugfs is for, to allow data to be dumped out
> of the kernel for different usages.
...
> Ok, have a better name for it? It's simple and easy to understand.

It also carries with it the stigma of "kernel debugging", which I just
don't see production system maintainers liking very much.

So tell you what, how about if we merged what's in debugfs into relayfs
instead? We'll still end up with one filesystem, but we'll have a more
innocuous name. After all, if debugfs is indeed for dumping data from
the kernel to user-space for different usages, then relaying is what
it's actually doing, right?

Karim
--
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546
Re: Merging relayfs?
Andrew Morton <[EMAIL PROTECTED]> writes:

> Christoph Hellwig <[EMAIL PROTECTED]> wrote:
> >
> > On Mon, Jul 11, 2005 at 08:10:42PM -0500, Tom Zanussi wrote:
> > >
> > > Hi Andrew, can you please merge relayfs? It provides a low-overhead
> > > logging and buffering capability, which does not currently exist in
> > > the kernel.
> >
> > While the code is pretty nicely in shape it seems rather pointless to
> > merge until an actual user goes with it.
>
> Ordinarily I'd agree. But this is a bit like kprobes - it's a funny
> thing which other kernel features rely upon, but those features are
> often ad-hoc and aren't intended for merging.

Yes, it's a special case because it's useful for custom debugging hacks.
I would be in favour of merging it.

-Andi
Re: Merging relayfs?
On Mon, Jul 11, 2005 at 10:55:33PM -0500, Tom Zanussi wrote:
> Greg KH writes:
> > On Mon, Jul 11, 2005 at 11:03:59PM -0400, Karim Yaghmour wrote:
> > >
> > > Greg KH wrote:
> > > > What ever happened to exporting the relayfs file ops, and just
> > > > using debugfs as your controlling fs instead? As all of the
> > > > possible users fall under the "debug" type of kernel feature, it
> > > > makes more sense to confine users to that fs, right?
> > >
> > > Actually, like we discussed the last time this surfaced, there are
> > > far more users for relayfs than just debugging.
> >
> > Based on the proposed users of this fs, I don't see any. What ones
> > are you saying are not "debug" type operations? And yes, I consider
> > LTT a "debug" type operation :)
> >
> > The best part of this, is it gives distros and users a consistent
> > place to mount the fs, and to know where this kind of thing shows up
> > in the fs namespace.
>
> Makes sense, and I don't see a problem with getting rid of the fs part
> of relayfs and letting debugfs take over that role, if debugfs were
> there for all potential users. It doesn't sound like it would satisfy
> users like LTT and systemtap though, who expect to be available at all
> times even on production systems, which wouldn't be the case unless
> the distros always shipped with debugfs enabled.

They will, the overhead of adding debugfs support is _very_ tiny, only:

$ size fs/debugfs/built-in.o
   text    data     bss     dec     hex filename
   2257     788       8    3053     bed fs/debugfs/built-in.o

So I do not see why you should not just drop your fs part.

> > > What we settled on was having relayfs export its file ops so that
> > > indeed debugfs users could use it to log things in conjunction with
> > > debugfs.
> >
> > Last I looked, this was not possible. Has this changed in the latest
> > version?
>
> The file operations are all exported, but I haven't actually tried to
> use relayfs files in debugfs. Is there something more needed?

Shouldn't be. Try it to make sure though :)

thanks,

greg k-h
Re: Merging relayfs?
On Mon, Jul 11, 2005 at 11:52:57PM -0400, Karim Yaghmour wrote:
>
> Greg KH wrote:
> > Based on the proposed users of this fs, I don't see any. What ones
> > are you saying are not "debug" type operations? And yes, I consider
> > LTT a "debug" type operation :)
> >
> > The best part of this, is it gives distros and users a consistent
> > place to mount the fs, and to know where this kind of thing shows up
> > in the fs namespace.
>
> Except that relayfs contains files that all behave in a very specific
> way: as relayfs buffers, while debugfs may contain a variety of
> different types of files.

The path/filename dictates how it is used, so putting relayfs type files
in debugfs is just fine. debugfs allows any types of files to be there.

> I kind'a see what you're trying to say, and I fully understand that some
> debugfs users may indeed use the relayfs fileops to add an entry in
> debugfs which serves as a buffer, and that's the very reason we exported
> them to boot.

Good.

> But there's something to be said about having a single filesystem (and
> therefore tree somewhere in /)

New trees in / are not LSB compliant, hence the reason for writing
securityfs to get rid of /selinux and other LSM filesystems that were
starting to sprout up.

> which contains entries dedicated to a single purpose: dump huge
> amounts of data out of the kernel and into userspace whether or not
> the system is being debugged.

But that's exactly what debugfs is for, to allow data to be dumped out
of the kernel for different usages.

> From a user point of view, it sounds awfully weird if they're using
> "debugfs" on a production system ...

Ok, have a better name for it? It's simple and easy to understand.

> > Last I looked, this was not possible. Has this changed in the latest
> > version?
>
> Here's from 2.6.13-rc2-mm1 fs/relayfs/inode.c
> > +EXPORT_SYMBOL_GPL(relayfs_open);
> > +EXPORT_SYMBOL_GPL(relayfs_poll);
> > +EXPORT_SYMBOL_GPL(relayfs_mmap);
> > +EXPORT_SYMBOL_GPL(relayfs_release);
> > +EXPORT_SYMBOL_GPL(relayfs_file_operations);
> > +EXPORT_SYMBOL_GPL(relayfs_create_dir);
> > +EXPORT_SYMBOL_GPL(relayfs_remove_dir);
>
> It's been there ever since you've asked for it earlier this year :)

Thanks, didn't realize that.

Wait, all that should be needed is "relayfs_file_operations", right?
Why have those others exported?

thanks,

greg k-h
Re: Merging relayfs?
Greg KH wrote:
> Based on the proposed users of this fs, I don't see any. What ones are
> you saying are not "debug" type operations? And yes, I consider LTT a
> "debug" type operation :)
>
> The best part of this, is it gives distros and users a consistent place
> to mount the fs, and to know where this kind of thing shows up in the fs
> namespace.

Except that relayfs contains files that all behave in a very specific
way: as relayfs buffers, while debugfs may contain a variety of different
types of files.

I kind'a see what you're trying to say, and I fully understand that some
debugfs users may indeed use the relayfs fileops to add an entry in
debugfs which serves as a buffer, and that's the very reason we exported
them to boot.

But there's something to be said about having a single filesystem (and
therefore tree somewhere in /) which contains entries dedicated to a
single purpose: dump huge amounts of data out of the kernel and into
userspace whether or not the system is being debugged. From a user point
of view, it sounds awfully weird if they're using "debugfs" on a
production system ...

> Last I looked, this was not possible. Has this changed in the latest
> version?

Here's from 2.6.13-rc2-mm1 fs/relayfs/inode.c
> +EXPORT_SYMBOL_GPL(relayfs_open);
> +EXPORT_SYMBOL_GPL(relayfs_poll);
> +EXPORT_SYMBOL_GPL(relayfs_mmap);
> +EXPORT_SYMBOL_GPL(relayfs_release);
> +EXPORT_SYMBOL_GPL(relayfs_file_operations);
> +EXPORT_SYMBOL_GPL(relayfs_create_dir);
> +EXPORT_SYMBOL_GPL(relayfs_remove_dir);

It's been there ever since you've asked for it earlier this year :)

Karim
--
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546
Re: Merging relayfs?
Greg KH writes:
> On Mon, Jul 11, 2005 at 11:03:59PM -0400, Karim Yaghmour wrote:
> >
> > Greg KH wrote:
> > > What ever happened to exporting the relayfs file ops, and just using
> > > debugfs as your controlling fs instead? As all of the possible
> > > users fall under the "debug" type of kernel feature, it makes more
> > > sense to confine users to that fs, right?
> >
> > Actually, like we discussed the last time this surfaced, there are far
> > more users for relayfs than just debugging.
>
> Based on the proposed users of this fs, I don't see any. What ones are
> you saying are not "debug" type operations? And yes, I consider LTT a
> "debug" type operation :)
>
> The best part of this, is it gives distros and users a consistent place
> to mount the fs, and to know where this kind of thing shows up in the fs
> namespace.

Makes sense, and I don't see a problem with getting rid of the fs part
of relayfs and letting debugfs take over that role, if debugfs were
there for all potential users. It doesn't sound like it would satisfy
users like LTT and systemtap though, who expect to be available at all
times even on production systems, which wouldn't be the case unless
the distros always shipped with debugfs enabled.

> > What we settled on was having relayfs export its file ops so that
> > indeed debugfs users could use it to log things in conjunction with
> > debugfs.
>
> Last I looked, this was not possible. Has this changed in the latest
> version?

The file operations are all exported, but I haven't actually tried to
use relayfs files in debugfs. Is there something more needed?

Tom
Re: Merging relayfs?
On Mon, Jul 11, 2005 at 11:03:59PM -0400, Karim Yaghmour wrote:
>
> Greg KH wrote:
> > What ever happened to exporting the relayfs file ops, and just using
> > debugfs as your controlling fs instead? As all of the possible users
> > fall under the "debug" type of kernel feature, it makes more sense to
> > confine users to that fs, right?
>
> Actually, like we discussed the last time this surfaced, there are far
> more users for relayfs than just debugging.

Based on the proposed users of this fs, I don't see any. What ones are
you saying are not "debug" type operations? And yes, I consider LTT a
"debug" type operation :)

The best part of this, is it gives distros and users a consistent place
to mount the fs, and to know where this kind of thing shows up in the fs
namespace.

> What we settled on was having relayfs export its file ops so that
> indeed debugfs users could use it to log things in conjunction with
> debugfs.

Last I looked, this was not possible. Has this changed in the latest
version?

thanks,

greg k-h
Re: Merging relayfs?
Greg KH wrote:
> What ever happened to exporting the relayfs file ops, and just using
> debugfs as your controlling fs instead? As all of the possible users
> fall under the "debug" type of kernel feature, it makes more sense to
> confine users to that fs, right?

Actually, like we discussed the last time this surfaced, there are far
more users for relayfs than just debugging. What we settled on was
having relayfs export its file ops so that indeed debugfs users could
use it to log things in conjunction with debugfs.

Karim
--
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546
Re: Merging relayfs?
Andrew Morton wrote:
> Still, first let us get a handle on who wants relayfs now and in the
> future and for what. Then we can better decide.

We used relayfs for our series of tests on PREEMPT_RT and I-Pipe.
Specifically, we used relayfs buffers to store the timestamps for our
interrupt latency measurements. This allowed us to easily have access
to very large buffering areas without having to worry about any form of
detailed resource allocation, or runtime overhead of logging. IOW, it
allowed us to concentrate on our main priority: log a very large number
of timestamps.

On the LTT side, relayfs is bound to be at the center of whatever
architecture we settle on for the ongoing rewrite. Having used it in
past releases of LTT, we know that it can handle very heavy data
throughput with little overhead using a relatively simple API.

Karim
--
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546
Re: Merging relayfs?
On Mon, Jul 11, 2005 at 08:10:42PM -0500, Tom Zanussi wrote:
>
> Hi Andrew, can you please merge relayfs? It provides a low-overhead
> logging and buffering capability, which does not currently exist in
> the kernel.
>
> relayfs key features:
>
> - Extremely efficient high-speed logging/buffering
> - Simple mechanism for user-space data retrieval
> - Very short write path
> - Can be used in any context, including interrupt context
> - No runtime resource allocation
> - Doesn't do a kmalloc for each "packet"
> - No need for end-recipient
> - Data may remain buffered whether it is consumed or not
> - Data committed to disk in bulk, not per "packet"
> - Can be used in circular-buffer mode for flight-recording

What ever happened to exporting the relayfs file ops, and just using
debugfs as your controlling fs instead? As all of the possible users
fall under the "debug" type of kernel feature, it makes more sense to
confine users to that fs, right?

thanks,

greg k-h
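For readers unfamiliar with the "circular-buffer mode for flight-recording" and "no runtime resource allocation" items in the feature list quoted above, the idea can be sketched in a few lines of userspace C. This is an illustration only, not the relayfs implementation; `flight_rec`, `fr_write`, `SUBBUF_SIZE` and `N_SUBBUFS` are hypothetical names. All memory is set aside up front, and when the current sub-buffer is full the writer moves on to the next one, overwriting the oldest data, so the most recent events are always available.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SUBBUF_SIZE 16u   /* hypothetical sub-buffer size in bytes */
#define N_SUBBUFS   4u    /* hypothetical number of sub-buffers */

struct flight_rec {
    char data[N_SUBBUFS][SUBBUF_SIZE];  /* storage fixed at creation */
    size_t used[N_SUBBUFS];             /* bytes written per sub-buffer */
    size_t cur;                         /* sub-buffer being written */
    unsigned long switches;             /* sub-buffer switches so far */
};

/* Append one record. No allocation happens here; when the current
 * sub-buffer cannot hold the record, wrap to the next sub-buffer and
 * overwrite whatever it held (the oldest data). */
static int fr_write(struct flight_rec *fr, const void *src, size_t len)
{
    assert(fr != NULL);
    if (len > SUBBUF_SIZE)
        return -1;                      /* record can never fit */
    if (fr->used[fr->cur] + len > SUBBUF_SIZE) {
        fr->cur = (fr->cur + 1) % N_SUBBUFS;
        fr->used[fr->cur] = 0;          /* discard the oldest sub-buffer */
        fr->switches++;
    }
    memcpy(fr->data[fr->cur] + fr->used[fr->cur], src, len);
    fr->used[fr->cur] += len;
    return 0;
}
```

A zero-initialised `struct flight_rec` is ready to use. After more than `N_SUBBUFS` sub-buffers' worth of records, only the most recent ones remain, which is the behaviour a crash-time dump would want.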
Re: Merging relayfs?
Christoph Hellwig <[EMAIL PROTECTED]> wrote:
>
> On Mon, Jul 11, 2005 at 08:10:42PM -0500, Tom Zanussi wrote:
> >
> > Hi Andrew, can you please merge relayfs? It provides a low-overhead
> > logging and buffering capability, which does not currently exist in
> > the kernel.
>
> While the code is pretty nicely in shape it seems rather pointless to
> merge until an actual user goes with it.

Ordinarily I'd agree. But this is a bit like kprobes - it's a funny
thing which other kernel features rely upon, but those features are
often ad-hoc and aren't intended for merging.

relayfs is more for in-kernel "applications" than for userspace ones, if
you like.

Still, first let us get a handle on who wants relayfs now and in the
future and for what. Then we can better decide.
Re: Merging relayfs?
On Mon, Jul 11, 2005 at 08:10:42PM -0500, Tom Zanussi wrote:
>
> Hi Andrew, can you please merge relayfs? It provides a low-overhead
> logging and buffering capability, which does not currently exist in
> the kernel.

While the code is pretty nicely in shape it seems rather pointless to
merge until an actual user goes with it.
Re: Merging relayfs?
Andrew Morton writes:
> Tom Zanussi <[EMAIL PROTECTED]> wrote:
> >
> > Hi Andrew, can you please merge relayfs?
>
> I guess so. Would you have time to prepare a list of existing and
> planned applications?

Sure. I know that systemtap (http://sourceware.org/systemtap/) is using
relayfs and that LTT (http://www.opersys.com/ltt/index.html) is also
currently being reworked to use it.

I've also added a couple of people to the cc: list that I've consulted
with in getting their applications to use relayfs, one of which is the
logdev debugging device recently posted to LKML.

I also know that there are still users of the old relayfs around; I
don't however know what their plans are regarding moving to the new
relayfs.

My own personal interest is to start playing around with creating some
visualization tools using data gathered from relayfs. Hopefully, I'll
have more time to do that if relayfs gets merged. ;-)

Hope that helps,

Tom
Re: Merging relayfs?
> > >
> > > Hi Andrew, can you please merge relayfs?
> >
> > I guess so. Would you have time to prepare a list of existing and
> > planned applications?

I have a plan to use it for something that no-one knows about yet..

I was going to use it for doing a DRM packet debug logger... to try and
trace hangs in the system. Using printk doesn't really help as, guess
what, it slows the machine down so much that your races don't happen...

I wrote some basic code for this already.. and I'm hoping to use some
work time to get it finished at some stage...

Dave.
Re: Merging relayfs?
Tom Zanussi <[EMAIL PROTECTED]> wrote:
>
> Hi Andrew, can you please merge relayfs?

I guess so. Would you have time to prepare a list of existing and
planned applications?