Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
On Sun 14 Oct 2018 at 19:39, Ole-Hjalmar Kristensen wrote:

> OK, that makes sense. So it would not stop a client from, for example,
> first reading an index block in a B-tree, waiting for the result, and
> then issuing read operations for all the data blocks in parallel.

If the client is the kernel, that's true. If the client speaks 9P
directly, that's true again. But if the client is a userspace program
using pread/pwrite, that wouldn't work unless it forks a new process for
each read, since the syscall blocks. Which is what fcp does, actually:

https://github.com/brho/plan9/blob/master/sys/src/cmd/fcp.c

Giacomo
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
also read what has been written before about fcp. and read the source of fcp.

On 10/14/18, Ole-Hjalmar Kristensen wrote:
> OK, that makes sense. So it would not stop a client from, for example,
> first reading an index block in a B-tree, waiting for the result, and
> then issuing read operations for all the data blocks in parallel. [...]
>
> On Sun, Oct 14, 2018 at 1:21 PM hiro <23h...@gmail.com> wrote:
>> there's no tyranny involved. [...]
>>
>> On 10/14/18, Ole-Hjalmar Kristensen wrote:
>>> I'm not going to argue with someone who has got his hands dirty by
>>> actually doing this, but I don't really get this about the tyranny
>>> of 9p. Isn't the point of the tag field to identify the request? [...]
>>>
>>> On Wed 10 Oct 2018 at 23:56, Steven Stallion wrote:
>>>> As the guy who wrote the majority of the code that pushed those 1M 4K
>>>> random IOPS erik mentioned, this thread annoys the shit out of me. You
>>>> don't get an award for writing a driver. In fact, it's probably better
>>>> not to be known at all considering the bloody murder one has to commit
>>>> to marry hardware and software together.
>>>>
>>>> Let's be frank, the I/O handling in the kernel is anachronistic. To
>>>> hit those rates, I had to add support for asynchronous and vectored
>>>> I/O, not to mention a sizable bit of work by a co-worker to properly
>>>> handle NUMA on our appliances to hit those speeds. As I recall, we had
>>>> to rewrite the scheduler and re-implement locking, which even Charles
>>>> Forsyth had a hand in. Had we the time and resources to implement
>>>> something like zero-copy, we'd have done it in a heartbeat.
>>>>
>>>> In the end, it doesn't matter how "fast" a storage driver is in Plan 9
>>>> - as soon as you put a 9P-based filesystem on it, it's going to be
>>>> limited to a single outstanding operation. This is the tyranny of 9P.
>>>> We (Coraid) got around this by avoiding filesystems altogether.
>>>>
>>>> Go solve that problem first.
>>>>
>>>> On Wed, Oct 10, 2018 at 12:36 PM wrote:
>>>>>> But the reason I want this is to reduce latency to the first
>>>>>> access, especially for very large files. With read() I have
>>>>>> to wait until the read completes. With mmap() processing can
>>>>>> start much earlier and can be interleaved with background
>>>>>> data fetch or prefetch. With read() a lot more resources
>>>>>> are tied down. If I need random access and don't need to
>>>>>> read all of the data, the application has to do pread(),
>>>>>> pwrite() a lot, thus complicating it. With mmap() I can just
>>>>>> map in the whole file and excess reading (beyond what the
>>>>>> app needs) will not be a large fraction.
>>>>>
>>>>> you think doing single 4K page sized reads in the pagefault
>>>>> handler is better than doing precise >4K reads from your
>>>>> application? possibly in a background thread so you can
>>>>> overlap processing with data fetching?
>>>>>
>>>>> the advantage of mmap is not prefetch. it's about not doing
>>>>> any I/O when data is already in the *SHARED* buffer cache!
>>>>> which plan9 does not have (except the mntcache, but that is
>>>>> optional and only works for the disk fileservers that maintain
>>>>> their file qid version info consistently). it *IS* really a linux
>>>>> thing where all block device i/o goes thru the buffer cache.
>>>>>
>>>>> --
>>>>> cinap
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
OK, that makes sense. So it would not stop a client from, for example,
first reading an index block in a B-tree, waiting for the result, and
then issuing read operations for all the data blocks in parallel. That's
exactly the same as any asynchronous disk subsystem I am acquainted
with. Reordering is the norm.

On Sun, Oct 14, 2018 at 1:21 PM hiro <23h...@gmail.com> wrote:
> there's no tyranny involved.
>
> a client that is fine with the *responses* coming in reordered could
> remember the tag obviously and do whatever you imagine.
>
> the problem is potential reordering of the messages in the kernel
> before responding, even if the 9p transport has guaranteed ordering.
> [...]
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
there's no tyranny involved.

a client that is fine with the *responses* coming in reordered could
remember the tag obviously and do whatever you imagine.

the problem is potential reordering of the messages in the kernel
before responding, even if the 9p transport has guaranteed ordering.

On 10/14/18, Ole-Hjalmar Kristensen wrote:
> I'm not going to argue with someone who has got his hands dirty by
> actually doing this, but I don't really get this about the tyranny of
> 9p. Isn't the point of the tag field to identify the request? What is
> stopping the client from issuing multiple requests and matching the
> replies based on the tag?
> [...]
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
I'm not going to argue with someone who has got his hands dirty by
actually doing this, but I don't really get this about the tyranny of
9p. Isn't the point of the tag field to identify the request? What is
stopping the client from issuing multiple requests and matching the
replies based on the tag? From the manual:

    Each T-message has a tag field, chosen and used by the
    client to identify the message. The reply to the message
    will have the same tag. Clients must arrange that no two
    outstanding messages on the same connection have the same
    tag. An exception is the tag NOTAG, defined as (ushort)~0
    in <fcall.h>: the client can use it, when establishing a
    connection, to override tag matching in version messages.

On Wed 10 Oct 2018 at 23:56, Steven Stallion wrote:
> As the guy who wrote the majority of the code that pushed those 1M 4K
> random IOPS erik mentioned, this thread annoys the shit out of me.
> [...]
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
Digby R.S. Tarvin writes:
> Oh yes, I read Eldon Hall's book on that quite a few years ago.
> Meetings held to discuss competing potential uses for a word of memory
> that had become free.
> That one would be a challenging Plan9 port..

And yet Plan9 was not there to save the day. Such a pity.
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
Oh yes, I read Eldon Hall's book on that quite a few years ago. Meetings
held to discuss competing potential uses for a word of memory that had
become free.

That one would be a challenging Plan9 port..

On Fri, 12 Oct 2018 at 05:13, Lyndon Nerenberg wrote:
> Digby R.S. Tarvin writes:
>> Agreed, but the PDP11/70 was not constrained to 64KB memory either.
>> [...]
>
> Coincidental to this conversation, I'm currently reading "The Apollo
> Guidance Computer: Architecture and Operation" by Frank O'Brien
> (ISBN 978-1-4419-0876-6). Very interesting to see what you can do with
> a 15 bit architecture when sufficiently motivated.
>
> --lyndon
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
hiro writes:
> don't you need sending ability, too for AIS?

No, a receive-only setup is very useful on a small boat. Where I would
like to go with this is to take the decoded AIS data as input for "ARPA"
style collision plots. I'm interested in the big boats sailing through
the strait. They can't turn fast, and rarely change course. If I can
derive their intentions, I can plot a path between them that requires
the least amount of tacking.

The big boats, in turn, have no interest in us little critters. They
actively filter out the "class B" (I think that's the term) noise that
are AIS transmissions from the small craft. Even if we hit them, we
can't sink them, so they don't care about us. Therefore there is no
incentive for small boats to transmit AIS. Unless you're trying to
locate your buddies for a tie-up somewhere. (That can be a very valid
reason for transmitting!)

--lyndon
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
>> I assumed you were using an RTL2832U (rtlsdr library).
>
> I'm pretty sure they all do, under the hood.

don't you need sending ability, too for AIS?
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
> need to prove it can be done with the usual
> suspects (GNU radio, on the Pi -- the native fft libraries seem fast
> enough to make this viable).

be assured i've demodulated 25khz signals in real-time and it's a walk
in the park, as long as your revision has the neon stuff i mentioned,
otherwise the fft becomes the bottleneck.
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
Skip Tavakkolian writes:
> I assumed you were using an RTL2832U (rtlsdr library).

I'm pretty sure they all do, under the hood.
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
I assumed you were using an RTL2832U (rtlsdr library).

On Thu, Oct 11, 2018, 12:40 PM Lyndon Nerenberg wrote:
>> I was able to use dump1090 (same author as redis) to get ADSB data
>> reliably on RPi/Linux a while back.
>
> I have a pair of Flightbox ADS-B receivers I am using as references.
> While mostly reliable, they can and do stutter along with the rest
> of the alternatives on occasion.
>
> --lyndon
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
> I was able to use dump1090 (same author as redis) to get ADSB data
> reliably on RPi/Linux a while back.

I have a pair of Flightbox ADS-B receivers I am using as references.
While mostly reliable, they can and do stutter along with the rest
of the alternatives on occasion.

--lyndon
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
hiro writes:
> But given the alternatives available back then, even the armv5 in the
> kirkwood, which was cheaper even before the rpi became popular, did
> the same job more stably, which is why i would never actually
> recommend the pi. And there are even more alternatives now.

I get that. But the actual hardware driving this conversation isn't
particularly relevant, and devolving to a hardware bikeshed isn't
helpful. (Not picking on you specifically.)

> Are you doing the AIS demodulation on plan9 on rpi? It would be a
> great showcase. Wish I had been given the opportunity to find an
> excuse to build something like that on plan9 instead :)

Not yet. First I need to prove it can be done with the usual suspects
(GNU radio, on the Pi -- the native fft libraries seem fast enough to
make this viable). If the pessimized case works, then porting the code
from the GNU radio python modules to C is a mechanical process for the
most part.

This week I am ENOTIME with getting the boat tarped up in preparation
for the winter monsoon season :-P.

--lyndon
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
I was able to use dump1090 (same author as redis) to get ADSB data
reliably on RPi/Linux a while back.

On Thu, Oct 11, 2018, 10:54 AM Lyndon Nerenberg wrote:
>> I have been able to copy 1 GiB/s to userspace from an nvme device. I
>> should think a radio should be no problem.
>
> The problem is when you have multiple decoder blocks implemented
> as individual processes (i.e. the GNU radio model). Once you have
> everything debugged, you can put it into a single threaded process
> and eliminate the copy overhead. But it's completely impractical
> to prototype or debug real applications this way. And it's the
> prototyping case I'm interested in here.
> [...]
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
We also have CPU extensions that can help make FFTs fast, because it's
such a generic problem, and in the worst case you can use FPGAs or
ASICs, in any case dedicated hardware.
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
i meant without having to resort to some soft fp.

On 10/11/18, hiro <23h...@gmail.com> wrote:
>> through the kernel. Not so much for the speed aspect, but to free
>> up CPU cycles that can be devoted to actual SDR work.
>
> those 2x25kHz channels would hardly need many cycles. rather it's just
> a matter of selecting the right CPU that can actually do the FFT with
> some software floating point implementation :)
> [...]
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
> through the kernel. Not so much for the speed aspect, but to free
> up CPU cycles that can be devoted to actual SDR work.

those 2x25kHz channels would hardly need many cycles. rather it's just
a matter of selecting the right CPU that can actually do the FFT with
some software floating point implementation :)

i don't see memory bandwidth or even random memory access latency
affecting this scenario in the slightest.
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
> One example is for an AIS transceiver on a boat. By putting the
> radio and decoder at the top of the mast, the backhaul can be a
> cat-3 twisted pair cable, rather than a much heavier coax run from
> the antenna at the top of the mast to the receiver below decks.

Yeah, I've been sending 3Mbit I/Q samples over ethernet to a more beefy
computer. For non-technical crowds I described the rpi as a passable
USB->ethernet gateway for SDR tasks in that bandwidth.

But given the alternatives available back then, even the armv5 in the
kirkwood, which was cheaper even before the rpi became popular, did the
same job more stably, which is why i would never actually recommend the
pi. And there are even more alternatives now. Even the rpi itself is
proof that better alternatives exist (as they did even back then when
the first one came out), because the newer rpi revision (i think) has
finally gained neon cpu extensions, which surprisingly have been
supported by gnuradio long before this, and a reason why my bachelor
thesis back then was an easy success :)

In general all limits that occurred to me on the rpi were due to
stability (usb power and compatibility issues), but more concretely for
our discussion: lack of cpu power, mainly for the FFT. There were no
throughput, delay or memory copy bottlenecks for me. This was using
linux, because my mouse didn't work on the old rpi plan9 image and
sadly there was a time-limit...

Are you doing the AIS demodulation on plan9 on rpi? It would be a great
showcase. Wish I had been given the opportunity to find an excuse to
build something like that on plan9 instead :)
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
Digby R.S. Tarvin writes:
> Agreed, but the PDP11/70 was not constrained to 64KB memory either.
>
> I do recall the MS-DOS small/large/medium etc models that used the
> segmentation in various ways to mitigate the limitations of being a
> 16 bit computer. Similar techniques were possible on the PDP11, for
> example

Coincidental to this conversation, I'm currently reading "The Apollo
Guidance Computer: Architecture and Operation" by Frank O'Brien
(ISBN 978-1-4419-0876-6). Very interesting to see what you can do with
a 15 bit architecture when sufficiently motivated.

--lyndon
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
On Thu, Oct 11, 2018 at 10:54:22AM -0700, Lyndon Nerenberg wrote:
> Since when did curiosity become a capital crime? Oh, wait, that
> was January 20, 2017. My bad.

Turns out it's not, so you can climb down off your cross. It's just
that it helps to be a little clearer about your meaning, that's all.
Otherwise you might do something embarrassing, like posting SAS
controller code into an NVMe discussion.

khm
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
> I have been able to copy 1 GiB/s to userspace from an nvme device. I
> should think a radio should be no problem.

The problem is when you have multiple decoder blocks implemented as
individual processes (i.e. the GNU radio model). Once you have
everything debugged, you can put it into a single threaded process and
eliminate the copy overhead. But it's completely impractical to
prototype or debug real applications this way. And it's the prototyping
case I'm interested in here.

So I'm *curious* to know if page flipping a 'protocol buffer' like
object between processes provides an optimization over copying through
the kernel. Not so much for the speed aspect, but to free up CPU cycles
that can be devoted to actual SDR work.

Since when did curiosity become a capital crime? Oh, wait, that was
January 20, 2017. My bad.

--lyndon
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
hiro writes:
> Does this include demodulation on the pi?

Yes. At least to a certain extent. The idea is to get from the
high-bitrate I/Q data to something more amenable to transmission over
an RS-422 (or -485) serial drop.

One example is for an AIS transceiver on a boat. By putting the radio
and decoder at the top of the mast, the backhaul can be a cat-3 twisted
pair cable, rather than a much heavier coax run from the antenna at the
top of the mast to the receiver below decks. Reducing the weight at the
top of the mast reduces the moment arm acting on the boat, significantly
enhancing the stability of a sailboat (which is how I got started down
this road to begin with).

--lyndon
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
> Posted August 15th, 2013:
> https://9p.io/sources/contrib/stallion/src/sdmpt2.c
> Corresponding announcement:
> https://groups.google.com/forum/#!topic/comp.os.plan9/134-YyYnfbQ

This is not an NVMe driver.

--
Aram Hăvărneanu
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
Interesting - was this ever generalized? It's been several years since I
last looked, but I seem to recall that unless you went out of your way
to write your own 9P implementation, you were limited to a single tag.

On Wed, Oct 10, 2018 at 7:51 PM Skip Tavakkolian wrote:
> For operations that matter in this context (read, write), there can be
> multiple outstanding tags. A while back rsc implemented fcp, partly to
> prove this point.
> [...]
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
For operations that matter in this context (read, write), there can be multiple outstanding tags. A while back rsc implemented fcp, partly to prove this point. On Wed, Oct 10, 2018 at 2:54 PM Steven Stallion wrote: > As the guy who wrote the majority of the code that pushed those 1M 4K > random IOPS erik mentioned, this thread annoys the shit out of me. You > don't get an award for writing a driver. In fact, it's probably better > not to be known at all considering the bloody murder one has to commit > to marry hardware and software together. > > Let's be frank, the I/O handling in the kernel is anachronistic. To > hit those rates, I had to add support for asynchronous and vectored > I/O not to mention a sizable bit of work by a co-worker to properly > handle NUMA on our appliances to hit those speeds. As I recall, we had > to rewrite the scheduler and re-implement locking, which even Charles > Forsyth had a hand in. Had we the time and resources to implement > something like zero-copy we'd have done it in a heartbeat. > > In the end, it doesn't matter how "fast" a storage driver is in Plan 9 > - as soon as you put a 9P-based filesystem on it, it's going to be > limited to a single outstanding operation. This is the tyranny of 9P. > We (Coraid) got around this by avoiding filesystems altogether. > > Go solve that problem first. > On Wed, Oct 10, 2018 at 12:36 PM wrote: > > > > > But the reason I want this is to reduce latency to the first > > > access, especially for very large files. With read() I have > > > to wait until the read completes. With mmap() processing can > > > start much earlier and can be interleaved with background > > > data fetch or prefetch. With read() a lot more resources > > > are tied down. If I need random access and don't need to > > > read all of the data, the application has to do pread(), > > > pwrite() a lot thus complicating it. 
With mmap() I can just > > > map in the whole file and excess reading (beyond what the > > > app needs) will not be a large fraction. > > > > you think doing single 4K page sized reads in the pagefault > > handler is better than doing precise >4K reads from your > > application? possibly in a background thread so you can > > overlap processing with data fetching? > > > > the advantage of mmap is not prefetch. it's about not doing > > any I/O when the data is already in the *SHARED* buffer cache! > > which plan9 does not have (except the mntcache, but that is > > optional and only works for the disk fileservers that maintain > > their file qid ver info consistently). it *IS* really a linux > > thing where all block device i/o goes thru the buffer cache. > > > > -- > > cinap > > > >
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
On Wed, 10 Oct 2018 at 21:40, Ethan Gardener wrote: > > > > Not sure I would agree with that. The 20 bit addressing of the 8086 and > 8088 did not change their 16 bit nature. They were still 16 bit program > counter, with segmentation to provide access to a larger memory - similar > in principle to the PDP11 with MMU. > > That's not at all the same as being constrained to 64KB memory. Are we > communicating at cross purposes here? If we're not, if I haven't > misunderstood you, you might want to read up on creating .exe files for > MS-DOS. Agreed, but the PDP11/70 was not constrained to 64KB memory either. I do recall the MS-DOS small/large/medium etc models that used the segmentation in various ways to mitigate the limitations of being a 16 bit computer. Similar techniques were possible on the PDP11, for example Modula-2/VRS under RT-11 used the MMU to transparently support 4MB programs back in 1984 (it used trap instructions to implement subroutine calls). It wasn't possible under Unix, of course, because there were no system calls for manipulating the MMU. Understandable, as it would have complicated the security model in a multi-tasking system. Something neither MS-DOS nor RT-11 had to deal with. Address space manipulation was more convenient with Intel segmentation because the instruction set included procedure call/return instructions that manipulated the segmentation registers, but the situation was not fundamentally different. They were both 16 bit machines with hacks to give access to a larger than 64K physical memory. The OS9 operating system allowed some control of application memory maps in a unix-like environment by supporting dynamic (but explicit) link and unlink of subroutine and data modules - which would be added and removed from your 64K address space as required. So it was more analogous to memory-based overlays. 
> > I went Commodore Amiga at about that time - because it at least > supported some form of multi-tasking out of the box, and I spent many > happy hours getting OS9 running on it. An interesting architecture, > capable of some impressive graphics, but subject to quite severe > limitations which made general purpose graphics difficult. (Commodore later > released SVR4 Unix for the A3000, but limited X11 to monochrome when using > the inbuilt graphics). > > It does sound like fun. :) I'm not surprised by the monochrome graphics > limitation after my calculations. Still, X11 or any other window system > which lacks a backing store may do better in low-memory environments than > Plan 9's present draw device. It's a shame, a backing store is a great > simplification for programmers. > X11 does, of course, support the concept of a backing store. It just doesn't mandate it. It was an expensive thing to provide back when X11 was young, so pretty rare. I remember finding the need to re-create windows on demand rather annoying when I first learned to program in Xlib, but once you get used to it, it can lead to benefits: you retain a knowledge of how an image is created, not just the end result. > > But being 32 bit didn't give it a huge advantage over the 16 bit x86 > systems for tinkering with operating systems, because the 68000 had no MMU. > It was easier to get a Unix like system going with 16 bit segmentation than > a 32 bit linear space and no hardware support for run time relocation. > > (OS9 used position independent code throughout to work without an MMU, > but didn't try to implement fork() semantics). > > I'm sometimes tempted to think that fork() is freakishly high-level crazy > stuff. :) Still, like backing store, it's very nice to have. > I agree. 
Very elegant when you compare it to the hoops you have to jump through to initialize the child process environment in systems with the more common combined 'forkexec' semantics, but a real sticking point for low end hardware. > > It wasn't till the 68030 based Amiga 3000 came out in 1990 that it > really did everything I wanted. The 68020 with an optional MMU was > equivalent, but not so common in consumer machines. > > > > Hardware progress seems to have been rather uninteresting since then. > Sure, hardware is *much* faster and *much* bigger, but fundamentally the > same architecture. Intel had a brief flirtation with a novel architecture > with the iAPX 432 in '81, but obviously found it more profitable to make > the familiar architecture bigger and faster. > > I rather agree. Multi-core and hyperthreading don't bring in much from an > operating system designer's perspective, and I think all the interesting > things about caches are means of working around their problems. I don't think anyone would bother with multiple cores or caches if that same performance could be achieved without them. They just buy a bit more performance at the cost of additional software complexity. I would very much like to get my hands on a ga144 to see what sort of operating system structure
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
Posted August 15th, 2013: https://9p.io/sources/contrib/stallion/src/sdmpt2.c Corresponding announcement: https://groups.google.com/forum/#!topic/comp.os.plan9/134-YyYnfbQ On Wed, Oct 10, 2018 at 5:31 PM Kurt H Maier wrote: > > On Wed, Oct 10, 2018 at 04:54:22PM -0500, Steven Stallion wrote: > > As the guy > > might be worth keeping in mind the current most common use case for nvme > is laptop storage and not building jet engines in coraid's basement > > so the nvme driver that cinap wrote works on my thinkpad today and is > about infinity times faster than the one you guys locked up in the > warehouse at the end of raiders of the lost ark, because my laptop can't > seem to boot off nostalgia. > > so no, nobody gets an award for writing a driver. but cinap won the > 9front Order of Valorous Service (with bronze oak leaf cluster, > signifying working code) for *releasing* one. I was there when field > marshal aiju presented the award; it was a very nice ceremony. > > anyway, someone once said communication is not a zero-sum game. the > hyperspecific use case you describe is fine but there are other reasons > to care about how well this stuff works, you know? > > khm >
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
On Wed, Oct 10, 2018 at 04:54:22PM -0500, Steven Stallion wrote: > As the guy might be worth keeping in mind the current most common use case for nvme is laptop storage and not building jet engines in coraid's basement so the nvme driver that cinap wrote works on my thinkpad today and is about infinity times faster than the one you guys locked up in the warehouse at the end of raiders of the lost ark, because my laptop can't seem to boot off nostalgia. so no, nobody gets an award for writing a driver. but cinap won the 9front Order of Valorous Service (with bronze oak leaf cluster, signifying working code) for *releasing* one. I was there when field marshal aiju presented the award; it was a very nice ceremony. anyway, someone once said communication is not a zero-sum game. the hyperspecific use case you describe is fine but there are other reasons to care about how well this stuff works, you know? khm
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
hahahahahahahaha -- cinap
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
As the guy who wrote the majority of the code that pushed those 1M 4K random IOPS erik mentioned, this thread annoys the shit out of me. You don't get an award for writing a driver. In fact, it's probably better not to be known at all considering the bloody murder one has to commit to marry hardware and software together. Let's be frank, the I/O handling in the kernel is anachronistic. To hit those rates, I had to add support for asynchronous and vectored I/O not to mention a sizable bit of work by a co-worker to properly handle NUMA on our appliances to hit those speeds. As I recall, we had to rewrite the scheduler and re-implement locking, which even Charles Forsyth had a hand in. Had we the time and resources to implement something like zero-copy we'd have done it in a heartbeat. In the end, it doesn't matter how "fast" a storage driver is in Plan 9 - as soon as you put a 9P-based filesystem on it, it's going to be limited to a single outstanding operation. This is the tyranny of 9P. We (Coraid) got around this by avoiding filesystems altogether. Go solve that problem first. On Wed, Oct 10, 2018 at 12:36 PM wrote: > > > But the reason I want this is to reduce latency to the first > > access, especially for very large files. With read() I have > > to wait until the read completes. With mmap() processing can > > start much earlier and can be interleaved with background > > data fetch or prefetch. With read() a lot more resources > > are tied down. If I need random access and don't need to > > read all of the data, the application has to do pread(), > > pwrite() a lot thus complicating it. With mmap() I can just > > map in the whole file and excess reading (beyond what the > > app needs) will not be a large fraction. > > you think doing single 4K page sized reads in the pagefault > handler is better than doing precise >4K reads from your > application? possibly in a background thread so you can > overlap processing with data fetching? 
> > the advantage of mmap is not prefetch. it's about not doing > any I/O when the data is already in the *SHARED* buffer cache! > which plan9 does not have (except the mntcache, but that is > optional and only works for the disk fileservers that maintain > their file qid ver info consistently). it *IS* really a linux > thing where all block device i/o goes thru the buffer cache. > > -- > cinap >
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
Well, I think 'avoid at all costs' is a bit strong. The Raspberry Pi is a good little platform for the right applications, so long as you are aware of its limitations. I use one as my 'always on' home server to give me access to files when travelling (the networking is slow by LAN standards, but ok for WAN), and another for my energy monitoring system. It is good for experimenting with OS's, especially networking OS's like Plan9 where price is important if you want to try a large number of hosts. It's good for teaching/learning. Or for running/trying different operating systems without having to spend time and resources setting up VMs (downloading and flashing an SD card image is quick and takes up no space on my main systems). Just don't plan on deploying RPi's for mission critical applications that have demanding I/O or processing requirements. It was never intended to compete in that market. On Wed, 10 Oct 2018 at 20:54, hiro <23h...@gmail.com> wrote: > I agree, if you have a choice avoid rpi at all costs. > Even if the software side of that other board was less pleasant at least > it worked with my mouse and keyboard!! :) > > As I said I was looking at 2Mbit/s stuff, which is nothing, even over USB. > But my point is that even though this number is low, the rpi is too limited > to do any meaningful processing anyway (ignoring the usb troubles and lack > of ethernet). It's a mobile phone soc after all, where the modulation is > done by dedicated chips, not on cpu! :) > > On Wednesday, October 10, 2018, Digby R.S. Tarvin > wrote: > > I don't know which other ARM board you tried, but I have always found > terrible I/O performance of the Pi to be a bigger problem than the ARM > speed. The USB2 interface is really slow, and there aren't really many > other (documented) alternative options. The Ethernet goes through the same > slow USB interface, and there is only so much that you can do bit bashing > data with GPIO's. 
The SD card interface seems to be the only non-usb > filesystem I/O available. And that in turn limits the viability of > relieving the RAM constraints with virtual memory. So the ARM processor > itself is not usually the problem for me. > > In general I find the pi a nice little device for quite a few things - > like low power, low bandwidth, low cost servers or displays with plenty of > open source compatibility. Or hacking/prototyping where I don't want to > have to worry too much about blowing things up. But it's not good for high > throughput I/O, memory intensive applications, or anything requiring a lot > of processing power. > > The validity of your conclusion regarding low power ARM in general > probably depends on what the other board you tried was. > > DigbyT > > On Wed, 10 Oct 2018 at 17:51, hiro <23h...@gmail.com> wrote: >> >> > Eliminating as much of the copy in/out WRT the kernel cannot but >> > help, especially when you're doing SDR decoding near the radios >> > using low-powered compute hardware (think Pies and the like). >> >> Does this include demodulation on the pi? cause even when i dumped the >> pi i was given for that purpose (with a <2Mbit I/Q stream) and >> replaced it with some similar ARM platform that at least had neon cpu >> instruction extensions for faster floating point operations, I was >> barely able to run a small FFT. >> >> My conclusion was that these low-powered ARM systems are just good >> enough for gathering low-bandwidth, non-critical USB traffic, like >> those raw I/Q samples from a dongle, but unfit for anything else. >> >
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
> But the reason I want this is to reduce latency to the first > access, especially for very large files. With read() I have > to wait until the read completes. With mmap() processing can > start much earlier and can be interleaved with background > data fetch or prefetch. With read() a lot more resources > are tied down. If I need random access and don't need to > read all of the data, the application has to do pread(), > pwrite() a lot thus complicating it. With mmap() I can just > map in the whole file and excess reading (beyond what the > app needs) will not be a large fraction. you think doing single 4K page sized reads in the pagefault handler is better than doing precise >4K reads from your application? possibly in a background thread so you can overlap processing with data fetching? the advantage of mmap is not prefetch. it's about not doing any I/O when the data is already in the *SHARED* buffer cache! which plan9 does not have (except the mntcache, but that is optional and only works for the disk fileservers that maintain their file qid ver info consistently). it *IS* really a linux thing where all block device i/o goes thru the buffer cache. -- cinap
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
oh! you wrote a nvme driver TOO? where can i find it? maybe we can share some knowledge. especially regarding some quirks. i dont own hardware myself, so i wrote it using an emulator over a weekend and tested it on a work machine afterwork. http://code.9front.org/hg/plan9front/log/9df9ef969856/sys/src/9/pc/sdnvme.c -- cinap
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
> > with meltdown/Spectre mitigations in place, I would like to see evidence > > that flip is faster than copy. > > If your system is well balanced, you should be able to > stream data as fast as memory allows[1]. In such a system > copying things N times will reduce throughput by a similar > factor. It may be that plan9 underperforms so much this > doesn't matter normally. sure. but flipping page tables is also not free. there is a huge cost in processor stalls, etc. spectre and meltdown mitigations make this worse as each page flip has to be accompanied by a complete pipeline flush or other costly mitigation. (not that this was cheap to begin with) it's also not an object to move data as fast as possible. the object is to do work as fast as possible. > [1] See: https://code.kx.com/q/cloud/aws/benchmarking/ > A single q process can ingest data at 1.9GB/s from a > single drive. 16 can achieve 2.7GB/s, with theoretical > max being 2.8GB/s. with my same crappy un-optimized nvme driver, i was able to hit 2.5-2.6 GiB/s with two very crappy nvme drives. (are your numbers really GB rather than GiB?) i am sure i could scale that linearly. there's plenty of memory bandwidth left, but i haven't got any more nvme. :-) similarly coraid built an appliance that did copying (due to cache) and hit 1 million 4k iops. this was in 2011 or so. but, so what. all this proves is that with copying or without, we can ingest enough data for even the most hungry programs. unless you have data that shows otherwise. :-) - erik
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
people come down very hard on the pi. here are my times for building the pi kernel. i rebuilt it a few times to push data into any caches available.

pi3+ with a high-ish spec SD card: 23 secs
dual intel atom 1.8GHz with an SSD: 9 secs

the pi is slower, but not 10 times slower. However it does cost a 10th of the price and consumes a 10th of the electricity. i use the order of magnitude test as that is (in my experience) what you need to make a really noticeable difference (to stuff in general). i use one daily as a plan9 terminal, for which i feel it's ideal. -Steve
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
On Tue, Oct 9, 2018, at 8:14 PM, Lyndon Nerenberg wrote: > hiro writes: > > > Huh? What exactly do you mean? Can you describe the scenario and the > > measurements you made? > > The big one is USB. disk/radio->kernel->user-space-usbd->kernel->application. > Four copies. > > I would like to start playing with software defined radio on Plan > 9, but that amount of data copying is going to put a lot of pressure > on the kernel to keep up. UNIX/Linux suffers the same copy bloat, > and it's having trouble keeping up, too. References, please. Programmers are notoriously bad at determining the cause of performance problems. Examining the source will help to see if "copy bloat" is the actual problem. > > --lyndon > -- Progress might have been all right once, but it has gone on too long -- Ogden Nash
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
On Tue, Oct 9, 2018, at 11:22 PM, Digby R.S. Tarvin wrote: > > > On Tue, 9 Oct 2018 at 23:00, Ethan Gardener wrote: >> >> Fascinating thread, but I think you're off by a decade with the 16-bit >> address bus comment, unless you're not actually talking about Plan 9. The >> 8086 and 8088 were introduced with 20-bit addressing in 1978 and 1979 >> respectively. The IBM PC, launched in 1982, had its ROM at the top of that >> 1MByte space, so it couldn't have been constrained in that way. By the end >> of the 80s, all my schoolmates had 68k-powered computers from Commodore and >> Atari, showing hardware with a 24-bit address space was very much affordable >> and ubiquitous at the time Plan 9 development started. Almost all of them >> had 512KB at the time. A few flashy gits had 1MB machines. :) > > Not sure I would agree with that. The 20 bit addressing of the 8086 and 8088 > did not change their 16 bit nature. They were still 16 bit program counter, > with segmentation to provide access to a larger memory - similar in principle > to the PDP11 with MMU. That's not at all the same as being constrained to 64KB memory. Are we communicating at cross purposes here? If we're not, if I haven't misunderstood you, you might want to read up on creating .exe files for MS-DOS. > The first 32 bit x86 processor was the 386, which I think came out in 1985, > very close to when work on Plan9 was rumored to have started. So it seemed > not impossible that work might have started on an older 16 bit machine, but > at Bell Labs probably a long shot. Mmh, rumors. I read they were starting to think about Plan 9 in 1985, but I haven't read anything about it being up and running until '89 or '90. There's not much to go on. >> I still wish I'd kept the better of the Atari STs which made their way down >> to me -- a "1040 STE" -- 1MB with a better keyboard and ROM than the earlier >> "STFM" models. I remember wanting to try to run Plan 9 on it. Let's >> estimate how tight it would be... 
>> >> I think it would be terrible, because I got frustrated enough trying to run >> a 4e CPU server with graphics on a 2GB x86. I kept running out of image >> memory! The trouble was the draw device in 4th edition stores images in the >> same "image memory" the kernel loads programs into, and the 386 CPU kernel >> 'only' allocates 64MB of that. :) >> >> 1 bit per pixel would obviously improve matters by a factor of 16 compared >> to my setup, and 640x400 (Atari ST high resolution) would be another 5 times >> smaller than my screen. Putting these numbers together with my experience, >> you'd have to be careful to use images sparingly on a machine with 800KB >> free RAM after the kernel is loaded. That's better than I thought, probably >> achievable on that Atari I had, but it couldn't be used as intensively as I >> used Plan 9 back then. >> >> How could it be used? I think it would be a good idea to push the draw >> device back to user space and make very sure to have it check for failing >> malloc! I certainly wouldn't want a terminal with a filesystem and graphics >> all on a single 1MByte 68000-powered computer, because a filesystem on a >> terminal runs in user space, and thus requires some free memory to run the >> programs to shut it down. Actually, Plan 9's separation of terminal from >> filesystem seems quite the obvious choice when I look at it like this. :) > > I went Commodore Amiga at about that time - because it at least supported > some form of multi-tasking out of the box, and I spent many happy hours > getting OS9 running on it. An interesting architecture, capable of some > impressive graphics, but subject to quite severe limitations which made > general purpose graphics difficult. (Commodore later released SVR4 Unix for > the A3000, but limited X11 to monochrome when using the inbuilt graphics). It does sound like fun. :) I'm not surprised by the monochrome graphics limitation after my calculations. 
Still, X11 or any other window system which lacks a backing store may do better in low-memory environments than Plan 9's present draw device. It's a shame, a backing store is a great simplification for programmers. > But being 32 bit didn't give it a huge advantage over the 16 bit x86 systems > for tinkering with operating system, because the 68000 had no MMU. It was > easier to get a Unix like system going with 16 bit segmentation than a 32 bit > linear space and no hardware support for run time relocation. > (OS9 used position independent code throughout to work without an MMU, but > didn't try to implement fork() semantics). I'm sometimes tempted to think that fork() is freakishly high-level crazy stuff. :) Still, like backing store, it's very nice to have. > It wasn't till the 68030 based Amiga 3000 came out in 1990 that it really did > everything I wanted. The 68020 with an optional MMU was equivalent, but not > so common in
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
I agree, if you have a choice avoid rpi at all costs. Even if the software side of that other board was less pleasant at least it worked with my mouse and keyboard!! :) As I said I was looking at 2Mbit/s stuff, which is nothing, even over USB. But my point is that even though this number is low, the rpi is too limited to do any meaningful processing anyway (ignoring the usb troubles and lack of ethernet). It's a mobile phone soc after all, where the modulation is done by dedicated chips, not on cpu! :) On Wednesday, October 10, 2018, Digby R.S. Tarvin wrote: > I don't know which other ARM board you tried, but I have always found terrible I/O performance of the Pi to be a bigger problem than the ARM speed. The USB2 interface is really slow, and there aren't really many other (documented) alternative options. The Ethernet goes through the same slow USB interface, and there is only so much that you can do bit bashing data with GPIO's. The SD card interface seems to be the only non-usb filesystem I/O available. And that in turn limits the viability of relieving the RAM constraints with virtual memory. So the ARM processor itself is not usually the problem for me. > In general I find the pi a nice little device for quite a few things - like low power, low bandwidth, low cost servers or displays with plenty of open source compatibility. Or hacking/prototyping where I don't want to have to worry too much about blowing things up. But it's not good for high throughput I/O, memory intensive applications, or anything requiring a lot of processing power. > The validity of your conclusion regarding low power ARM in general probably depends on what the other board you tried was. > DigbyT > On Wed, 10 Oct 2018 at 17:51, hiro <23h...@gmail.com> wrote: >> >> > Eliminating as much of the copy in/out WRT the kernel cannot but >> > help, especially when you're doing SDR decoding near the radios >> > using low-powered compute hardware (think Pies and the like). 
>> >> Does this include demodulation on the pi? cause even when i dumped the >> pi i was given for that purpose (with a <2Mbit I/Q stream) and >> replaced it with some similar ARM platform that at least had neon cpu >> instruction extensions for faster floating point operations, I was >> barely able to run a small FFT. >> >> My conclusion was that these low-powered ARM systems are just good >> enough for gathering low-bandwidth, non-critical USB traffic, like >> those raw I/Q samples from a dongle, but unfit for anything else. >> >
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
I don't know which other ARM board you tried, but I have always found terrible I/O performance of the Pi to be a bigger problem than the ARM speed. The USB2 interface is really slow, and there aren't really many other (documented) alternative options. The Ethernet goes through the same slow USB interface, and there is only so much that you can do bit bashing data with GPIO's. The SD card interface seems to be the only non-usb filesystem I/O available. And that in turn limits the viability of relieving the RAM constraints with virtual memory. So the ARM processor itself is not usually the problem for me. In general I find the pi a nice little device for quite a few things - like low power, low bandwidth, low cost servers or displays with plenty of open source compatibility. Or hacking/prototyping where I don't want to have to worry too much about blowing things up. But it's not good for high throughput I/O, memory intensive applications, or anything requiring a lot of processing power. The validity of your conclusion regarding low power ARM in general probably depends on what the other board you tried was. DigbyT On Wed, 10 Oct 2018 at 17:51, hiro <23h...@gmail.com> wrote: > > Eliminating as much of the copy in/out WRT the kernel cannot but > > help, especially when you're doing SDR decoding near the radios > > using low-powered compute hardware (think Pies and the like). > > Does this include demodulation on the pi? cause even when i dumped the > pi i was given for that purpose (with a <2Mbit I/Q stream) and > replaced it with some similar ARM platform that at least had neon cpu > instruction extensions for faster floating point operations, I was > barely able to run a small FFT. > > My conclusion was that these low-powered ARM systems are just good > enough for gathering low-bandwidth, non-critical USB traffic, like > those raw I/Q samples from a dongle, but unfit for anything else. > >
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
On Tue, Oct 9, 2018 at 05:33 Lucio De Re wrote: > > On 10/9/18, Bakul Shah wrote: > > > > One thing I have mused about is recasting plan9 as a > > microkernel and pushing out a lot of its kernel code into user > > mode code. It is already half way there -- it is basically a > > mux for 9p calls, low level device drivers, > > > There are religious reasons not to go there Indeed, as a heretic, one of the first things I did with Jehanne was to move the console filesystem out of the kernel. Then I moved several syscalls into userspace. Or turned them into files, or into operations on existing files. More syscall/kernel services will move to user space as I find time to hack on it again. You know... heretics ruin everything! I'm not going to turn Jehanne into a microkernel, but I'm looking for the simplest possible set of kernel abstractions that can support a distributed operating system able to replace the mainstream Web+OS mess. You know... heretics are crazy, too! Giacomo
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
zero copy is also the source of the dreaded 'D' (uninterruptible sleep) state. - Erik
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
On Oct 9, 2018, at 3:06 PM, erik quanstrom wrote: > > with meltdown/Spectre mitigations in place, I would like to see evidence that > flip is faster than copy. If your system is well balanced, you should be able to stream data as fast as memory allows[1]. In such a system copying things N times will reduce throughput by a similar factor. It may be that plan9 underperforms so much this doesn't matter normally. But the reason I want this is to reduce latency to the first access, especially for very large files. With read() I have to wait until the read completes. With mmap() processing can start much earlier and can be interleaved with background data fetch or prefetch. With read() a lot more resources are tied down. If I need random access and don't need to read all of the data, the application has to do pread(), pwrite() a lot thus complicating it. With mmap() I can just map in the whole file and excess reading (beyond what the app needs) will not be a large fraction. The default assumption here seems to be that doing this will be very complicated and be as bad as on Linux. But Linux is not a good model of what to do and examples of what not to do are not useful guides in system design. There are other OSes such as the old Apollo Aegis (AKA Apollo/Domain), KeyKOS & seL4 that avoid copying[2]. Though none of this matters right now as we don't even have a paper design so please put down your clubs and swords :-) [1] See: https://code.kx.com/q/cloud/aws/benchmarking/ A single q process can ingest data at 1.9GB/s from a single drive. 16 can achieve 2.7GB/s, with theoretical max being 2.8GB/s. [2] Liedtke's original L4 evolved into a provably secure seL4 and in the process it became very much like KeyKOS. Capability systems do pass around pages as protected objects and avoid copying. Sort of like how in a program you'd pass a huge array by reference and not by value to a function.
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
I have been able to copy 1 GiB/s to userspace from an nvme device. I should think a radio should be no problem. - Erik
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
> via USB and see how it stands up. But the real question is what > kind of delay, latency, and jitter will there be, getting that raw > I/Q data from the USB interface up to the consuming application? How is your proposal of zero-copy going to help latency? IIRC we have some real-time thingy, might be able to reduce jitter... But then I might also ask why you're not doing the most critical path on an fpga anyway? Start with identifying your worst bottleneck. > Eliminating as much of the copy in/out WRT the kernel cannot but > help wrong, this design change requires resources, too, and might just gain you higher complexity. measure first.
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
> Eliminating as much of the copy in/out WRT the kernel cannot but > help, especially when you're doing SDR decoding near the radios > using low-powered compute hardware (think Pies and the like). Does this include demodulation on the pi? cause even when i dumped the pi i was given for that purpose (with a <2Mbit I/Q stream) and replaced it with some similar ARM platform that at least had neon cpu instruction extensions for faster floating point operations, I was barely able to run a small FFT. My conclusion was that these low-powered ARM systems are just good enough for gathering low-bandwidth, non-critical USB traffic, like those raw I/Q samples from a dongle, but unfit for anything else.
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
I was responding to lyndon's comment on certain "experiments" that should have to be done here, 2 messages up. But what he described sounded exactly like the zero-copying stuff that linux is trying to shove into everything. I have not made any statement about non-linux systems, and I'm not even saying these experiments couldn't be done on plan9, it's just that the linux people are way busier going down that path. On 10/10/18, Dan Cross wrote: > On Tue, Oct 9, 2018 at 7:24 PM hiro <23h...@gmail.com> wrote: > >> from what i see in linux people have been more than just exploring it, >> they've gone absolutely nuts. it makes everything complex, not just >> the fast path. >> > > To whom are you responding? Your email is devoid of context, so it is not > clear. > > However your statement appears to be based on an unstated assumption that > there is a plan9 school of thought, and a Linux school of thought, and no > other school of thought. If so, that is incorrect. > > - Dan C. >
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
On Tue, Oct 9, 2018 at 7:24 PM hiro <23h...@gmail.com> wrote: > from what i see in linux people have been more than just exploring it, > they've gone absolutely nuts. it makes everything complex, not just > the fast path. > To whom are you responding? Your email is devoid of context, so it is not clear. However your statement appears to be based on an unstated assumption that there is a plan9 school of thought, and a Linux school of thought, and no other school of thought. If so, that is incorrect. - Dan C.
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
cinap_len...@felloff.net writes: > why? the *HOST CONTROLLER* schedules the data transfers. I *DON'T KNOW*. It's just observed behaviour. > a! we'r talking about some crappy raspi here... probably with all > caches disabled... never mind. Hah. An Rpi tips over with 1200 baud USB serial. I was talking about "real" (Intel :-P) hardware for the other tippy-over behaviour. --lyndon
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
> To address Hiro's comments, I have no benchmarks on Plan 9, because > the SDR code I run does not exist there. But I do have experience > with running SDR on Linux and FreeBSD with hardware like the HackRF > One. That hardware can easily saturate a USB2 interface/driver on > both of those operating systems. Given my experience with USB on > Plan 9 to date, it's a safe bet that all the variants would die > when presented with that amount of traffic.

why? the *HOST CONTROLLER* schedules the data transfers. if the program doesn't do a read() there's nothing to schedule... (unless it's an isochronous endpoint, in which case the controller dma's for you in the background at the specified sampling rate).

> (I can knock down a Plan9 system with 56 Kb/s USB serial traffic.)

that sounds seriously screwed up. i have no issues here reading a usb stick on my x230 with xhci at 32MB/s, not using any fancy streaming optimization. no load at all. and this is just some garbage from the supermarket.

> I can see about > twisting up some code that would read the raw I/Q data from the SDR > via USB and see how it stands up. But the real question is what > kind of delay, latency, and jitter will there be, getting that raw > I/Q data from the USB interface up to the consuming application?

is this an isochronous endpoint? in that case you would not have to worry much as the controller does all the timing for you in hardware.

> Eliminating as much of the copy in/out WRT the kernel cannot but > help, especially when you're doing SDR decoding near the radios > using low-powered compute hardware (think Pies and the like).

a! we're talking about some crappy raspi here... probably with all caches disabled... never mind.

> --lyndon

-- cinap
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
cinap_len...@felloff.net writes: > > The big one is USB. disk/radio->kernel->user-space-usbd->kernel->applicati > on. > > Four copies. > > that sounds wrong. > > usbd is not involved in the data transfer. You're right, I was wrong about 'usbd'. In the bits of testing I've done with this, 'usbd' is replaced by a user space file server that abstracts the hardware and presents a useful file system interface. (E.g. along the lines of the gps filesystem interface.) To address Hiro's comments, I have no benchmarks on Plan 9, because the SDR code I run does not exist there. But I do have experience with running SDR on Linux and FreeBSD with hardware like the HackRF One. That hardware can easily saturate a USB2 interface/driver on both of those operating systems. Given my experience with USB on Plan 9 to date, it's a safe bet that all the variants would die when presented with that amount of traffic. (I can knock down a Plan9 system with 56 Kb/s USB serial traffic.) I can see about twisting up some code that would read the raw I/Q data from the SDR via USB and see how it stands up. But the real question is what kind of delay, latency, and jitter will there be, getting that raw I/Q data from the USB interface up to the consuming application? Eliminating as much of the copy in/out WRT the kernel cannot but help, especially when you're doing SDR decoding near the radios using low-powered compute hardware (think Pies and the like). --lyndon
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
On Tue, 9 Oct 2018 at 23:00, Ethan Gardener wrote: > > Fascinating thread, but I think you're off by a decade with the 16-bit > address bus comment, unless you're not actually talking about Plan 9. The > 8086 and 8088 were introduced with 20-bit addressing in 1978 and 1979 > respectively. The IBM PC, launched in 1982, had its ROM at the top of that > 1MByte space, so it couldn't have been constrained in that way. By the end > of the 80s, all my schoolmates had 68k-powered computers from Commodore and > Atari, showing hardware with a 24-bit address space was very much > affordable and ubiquitous at the time Plan 9 development started. Almost > all of them had 512KB at the time. A few flashy gits had 1MB machines. :) > Not sure I would agree with that. The 20 bit addressing of the 8086 and 8088 did not change their 16 bit nature. They were still 16 bit program counter, with segmentation to provide access to a larger memory - similar in principle to the PDP11 with MMU. The first 32 bit x86 processor was the 386, which I think came out in 1985, very close to when work on Plan9 was rumored to have started. So it seemed not impossible that work might have started on an older 16 bit machine, but at Bell Labs probably a long shot. > I still wish I'd kept the better of the Atari STs which made their way > down to me -- a "1040 STE" -- 1MB with a better keyboard and ROM than the > earlier "STFM" models. I remember wanting to try to run Plan 9 on it. > Let's estimate how tight it would be... > > I think it would be terrible, because I got frustrated enough trying to > run a 4e CPU server with graphics on a 2GB x86. I kept running out of > image memory! The trouble was the draw device in 4th edition stores images > in the same "image memory" the kernel loads programs into, and the 386 CPU > kernel 'only' allocates 64MB of that. 
:) > > 1 bit per pixel would obviously improve matters by a factor of 16 compared > to my setup, and 640x400 (Atari ST high resolution) would be another 5 > times smaller than my screen. Putting these numbers together with my > experience, you'd have to be careful to use images sparingly on a machine > with 800KB free RAM after the kernel is loaded. That's better than I > thought, probably achievable on that Atari I had, but it couldn't be used > as intensively as I used Plan 9 back then. > > How could it be used? I think it would be a good idea to push the draw > device back to user space and make very sure to have it check for failing > malloc! I certainly wouldn't want a terminal with a filesystem and > graphics all on a single 1MByte 68000-powered computer, because a > filesystem on a terminal runs in user space, and thus requires some free > memory to run the programs to shut it down. Actually, Plan 9's separation > of terminal from filesystem seems quite the obvious choice when I look at > it like this. :) > I went Commodore Amiga at about that time - because it at least supported some form of multi-tasking out of the box, and I spent many happy hours getting OS9 running on it. An interesting architecture, capable of some impressive graphics, but subject to quite severe limitations which made general purpose graphics difficult. (Commodore later released SVR4 Unix for the A3000, but limited X11 to monochrome when using the inbuilt graphics). But being 32 bit didn't give it a huge advantage over the 16 bit x86 systems for tinkering with operating systems, because the 68000 had no MMU. It was easier to get a Unix like system going with 16 bit segmentation than a 32 bit linear space and no hardware support for run time relocation. (OS9 used position independent code throughout to work without an MMU, but didn't try to implement fork() semantics). It wasn't till the 68030 based Amiga 3000 came out in 1990 that it really did everything I wanted.
The 68020 with an optional MMU was equivalent, but not so common in consumer machines. Hardware progress seems to have been rather uninteresting since then. Sure, hardware is *much* faster and *much* bigger, but fundamentally the same architecture. Intel had a brief flirtation with a novel architecture with the iAPX 432 in '81, but obviously found it was more profitable making the familiar architecture bigger and faster.
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
he has ignored my questions about measurement, so i'm sure he hasn't On 10/9/18, cinap_len...@felloff.net wrote: > also, i wonder how much is the actual copy overhead you claim is the issue. > maybe the impact for copying is more dominated by the memory allocator used > for allocb(). have you measured? > > -- > cinap > >
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
also, i wonder how much is the actual copy overhead you claim is the issue. maybe the impact for copying is more dominated by the memory allocator used for allocb(). have you measured? -- cinap
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
> The big one is USB. disk/radio->kernel->user-space-usbd->kernel->application. > Four copies.

that sounds wrong. usbd is not involved in the data transfer. it is mainly responsible for enumerating devices, instantiating drivers, and registering the endpoints in devusb. after that you access the endpoint files from devusb, which goes directly to the kernel. devusb also allows you to create an alias for an endpoint file, which then appears directly under /dev. usb audio uses this mechanism. the usb driver just activates the device and provides the ctl/volume files, while audio data is handled by the kernel's devusb.

on another remark regarding zero copy: the reason plan9 drivers are small comes from NOT doing these "optimizations". identity mapping the low part of memory in the kernel avoids a lot of trouble and allows you to get DMA-capable memory by just wrapping a pointer in PADDR(va). no page lists needed. no MMU tricks needed in the drivers. you can use any kernel memory va for DMA... even your kernel stack! it's never paged out. you can be sure it is not changed while the device looks at it, etc. do not underestimate the impact of this "simplification".

linux's block layer is broken in that regard, btw. it just hands user pages into the drivers without making sure they do not change while the i/o is in flight, which results in all kinds of false negatives when you actually start verifying your raid arrays, as different snapshots in time got written out to the raid members. they know about this and ignore it because benchmarks are more important.

-- cinap
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
hiro writes: > from what i see in linux people have been more than just exploring it, > they've gone absolutely nuts. it makes everything complex, not just > the fast path. And those are the Linux folks doing their thing. The reading I'm doing right now is related to the pessimizations page flipping throws at the CPU caches. It looks scary ... --lyndon
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
also, if all you care about is throughput, i don't see how those 4 copies you identified makes a difference. especially with something slow like USB.
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
from what i see in linux people have been more than just exploring it, they've gone absolutely nuts. it makes everything complex, not just the fast path.
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
Bakul Shah writes: And funny you should mention this! > Some of this process/memory management can be delegated to > user code as well. At $DAYJOB we would really like to have application process control over the kernel scheduler, as this seems to be the only realistic way to avoid the (kernel) resource starvation issues we run into. Our back end servers don't go down often. But when they do, it's for reasons entirely out of our control. Because those resource allocation policies have been pushed into the kernel, and beyond our control. --lyndon
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
hiro writes: > > Dealing with the security issues isn't trivial > what security issues? Passing protocol buffer like objects around user space, that might affect how the kernel talks to hardware. E.g. IPsec offload into hardware. You don't want user-space messing with that sort of context, but you want to tag it with the data buffer as it gets passed up and down through the user/kernel gate. Practical page flipping needs a kernel-read-only context attached to the non-kernel user data part of the page. A quick solution is to pair pages, one half of which the kernel owns, the other being the data payload. But that's just a start. And that's all I'm saying: this might be an approach to a better/faster I/O paradigm, but it needs interested people to explore it ... --lyndon
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
hiro writes: > Huh? What exactly do you mean? Can you describe the scenario and the > measurements you made? The big one is USB. disk/radio->kernel->user-space-usbd->kernel->application. Four copies. I would like to start playing with software defined radio on Plan 9, but that amount of data copying is going to put a lot of pressure on the kernel to keep up. UNIX/Linux suffers the same copy bloat, and it's having trouble keeping up, too. --lyndon
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
On Tue, 9 Oct 2018 10:50:08 -0700 Bakul Shah wrote: > Exactly! No point in being scared by labels! I am really > only talking about distilling plan9 further. At least as a > thought experiment. > > Isn’t it more fun to discuss this than all the “heavy > negativity”? :-) It's much better with patches. -- Ori Bernstein
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
> E.g. right now Plan 9 suffers from a *lot* of data copying between > the kernel and processes, and between processes themselves. Huh? What exactly do you mean? Can you describe the scenario and the measurements you made? > If we could eliminate most of that copying, things would get a lot faster. Which things would get faster? > Dealing with the security issues isn't trivial what security issues?
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
> On Oct 9, 2018, at 2:45 AM, Ethan Gardener wrote: > > One day, Uriel met a man who explained very > convincingly that the Plan 9 kernel is a microkernel. > On another day, Uriel met a man who explained very > convincingly that the Plan 9 kernel is a macrokernel. > Uriel was enlightened. Exactly! No point in being scared by labels! I am really only talking about distilling plan9 further. At least as a thought experiment. Isn’t it more fun to discuss this than all the “heavy negativity”? :-)
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
Bakul Shah writes: > One thing I have mused about is recasting plan9 as a > microkernel and pushing out a lot of its kernel code into user > mode code. It is already half way there -- it is basically a > mux for 9p calls, low level device drivers, VM support & some > process related code. Somewhat related to this ... after reading some papers on TCP-in-user-space implementations, I've been thinking about how an interface that supported fast/secure page flipping between the kernel and process address space would change how we do things. E.g. right now Plan 9 suffers from a *lot* of data copying between the kernel and processes, and between processes themselves. If we could eliminate most of that copying, things would get a lot faster. Dealing with the security issues isn't trivial, but the programmer time going into eking out the last bit of I/O throughput of the current scheme could be redirected. If it works, this would reduce the kernel back to handling process/memory management, and talking to the hardware. Not a micro-kernel, but just as good from a practical standpoint. And no, this wouldn't get us to running on the 11/70. But by taking advantage of modern large virtual memory spaces by using page flipping, we could cut down on physical memory usage in the kernel. --lyndon
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
> From what I recall, PDP11 hardware memory management was based on > segmentation rather than paging (64K divided into 16 variable sized > segments), and Unix did swapping rather than paging (a process is either > completely in memory or completely on disk). It does relocation, and a process is either completely in memory /and running/, or swapped out. - erik
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
> I think it would be terrible, because I got frustrated enough trying to run a > 4e CPU server with graphics on a 2GB x86. I kept running out of image > memory! The trouble was the draw device in 4th edition stores images in the > same "image memory" the kernel loads programs into, and the 386 CPU kernel > 'only' allocates 64MB of that. :) this was changed long ago. image memory can now be much bigger. i never had a problem when a 4e terminal was my daily driver. - erik
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
On Tue, Oct 9, 2018, at 4:08 AM, Digby R.S. Tarvin wrote: > I thought there might have been a chance of an early attempt to target the > x86 because of its ubiquity and low cost - which could be useful for a > networked operating system. And those were 16 bit address constrained in the > early days. But its probably not an architecture you would choose to work > with if you had a choice.. 68K is what I would have gone for.. Fascinating thread, but I think you're off by a decade with the 16-bit address bus comment, unless you're not actually talking about Plan 9. The 8086 and 8088 were introduced with 20-bit addressing in 1978 and 1979 respectively. The IBM PC, launched in 1982, had its ROM at the top of that 1MByte space, so it couldn't have been constrained in that way. By the end of the 80s, all my schoolmates had 68k-powered computers from Commodore and Atari, showing hardware with a 24-bit address space was very much affordable and ubiquitous at the time Plan 9 development started. Almost all of them had 512KB at the time. A few flashy gits had 1MB machines. :) I still wish I'd kept the better of the Atari STs which made their way down to me -- a "1040 STE" -- 1MB with a better keyboard and ROM than the earlier "STFM" models. I remember wanting to try to run Plan 9 on it. Let's estimate how tight it would be... I think it would be terrible, because I got frustrated enough trying to run a 4e CPU server with graphics on a 2GB x86. I kept running out of image memory! The trouble was the draw device in 4th edition stores images in the same "image memory" the kernel loads programs into, and the 386 CPU kernel 'only' allocates 64MB of that. :) 1 bit per pixel would obviously improve matters by a factor of 16 compared to my setup, and 640x400 (Atari ST high resolution) would be another 5 times smaller than my screen. 
Putting these numbers together with my experience, you'd have to be careful to use images sparingly on a machine with 800KB free RAM after the kernel is loaded. That's better than I thought, probably achievable on that Atari I had, but it couldn't be used as intensively as I used Plan 9 back then. How could it be used? I think it would be a good idea to push the draw device back to user space and make very sure to have it check for failing malloc! I certainly wouldn't want a terminal with a filesystem and graphics all on a single 1MByte 68000-powered computer, because a filesystem on a terminal runs in user space, and thus requires some free memory to run the programs to shut it down. Actually, Plan 9's separation of terminal from filesystem seems quite the obvious choice when I look at it like this. :)
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
On Tue, Oct 9, 2018, at 4:28 AM, Lucio De Re wrote: > On 10/9/18, Bakul Shah wrote: > > One thing I have mused about is recasting plan9 as a > > microkernel and pushing out a lot of its kernel code into user > > mode code. > > > There are religious reasons not to go there I'm trying to forget all the religious beliefs I once held with regard to computers, but I've had these lines in my head for a long time, and probably won't get a better opportunity to post them: One day, Uriel met a man who explained very convincingly that the Plan 9 kernel is a microkernel. On another day, Uriel met a man who explained very convincingly that the Plan 9 kernel is a macrokernel. Uriel was enlightened. Based on a true story. ;) > You won't believe what kind of madnesses I need to deal with to > consume my few and short remaining years - I'm with Dan in cursing the > modern technological trends, but one of these days I'm going to lock > myself in someone's attic or basement (or a prison cell, if that's > what it takes, a monastery, whatever...) with my Galaxy S4 and a dated > Riff-box - is that really what this black object is called? - and > build an OS from the accumulated wisdom of the last forty years. It > will probably look more like MS-DOS, though! :-( I've started already, but I keep getting sidetracked by my need for entertainment, which often comes down to spending my energies on things which don't require such deep design work. I'm hoping it'll get easier as my health improves; I'm still too stressed too often. The trouble with this stress is I forget my goals, which are things I've learned from Plan 9 and other conclusions I've come to. -- Progress might have been all right once, but it has gone on too long -- Ogden Nash
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
we already have a lot of user filesystems. feel free to add other useful ones.
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
On 10/9/18, Bakul Shah wrote: > > One thing I have mused about is recasting plan9 as a > microkernel and pushing out a lot of its kernel code into user > mode code. It is already half way there -- it is basically a > mux for 9p calls, low level device drivers, VM support & some > process related code. Such a redesign can be made more secure > and more resilient. The kind of problems you mention are > easier to fix in user code. Different application domains may > have different needs which are better handled as optional user > mode components. > There are religious reasons not to go there and, perhaps not very widely advertised, Minix-3 already does that, although I confess that all my best efforts have not yet created the space for my own experimentation with it. You won't believe what kind of madnesses I need to deal with to consume my few and short remaining years - I'm with Dan in cursing the modern technological trends, but one of these days I'm going to lock myself in someone's attic or basement (or a prison cell, if that's what it takes, a monastery, whatever...) with my Galaxy S4 and a dated Riff-box - is that really what this black object is called? - and build an OS from the accumulated wisdom of the last forty years. It will probably look more like MS-DOS, though! :-( > Said another way, keep the good parts of the plan9 design and > reachitect/reimplement the kernel + essential drivers/usermode > daemons. This is unlikely to happen (without some serious > funding) but still fun to think about! If done, this would be > a more radical departure than Oberon-7 compared to Oberon but > in the same spirit. > Surely, the targets for experimentation should be the ubiquitous smart-mobile and the insane arithmetic power of GPUs? All neatly networked over SDLC (or HDLC: AoH, anyone, for persistent storage?). Lucio.
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
On Tue, 9 Oct 2018 at 10:07, Dan Cross wrote: > My guess is that there is no reason in principle that it could not fit >> comfortably into the constraints of a PDP11/70, but if the initial >> implementation was done targeting a machine with significantly more >> resources, it would be easy to make design decisions that would be entirely >> incompatible. >> > > I find this unlikely. > > The PDP-11, while a respectable machine for its day, required too many > tradeoffs to make it attractive as a development platform for a > next-generation research operating system in the late 1980s: be it > electrical power consumption vs computational oomph or dollar cost vs > available memory, the -11 had fallen from the attractive position it held a > decade prior. Perhaps slimming a plan9 kernel down sufficiently so that it > COULD run on a PDP-11 was possible in the early days, but I can't see any > reason one would have WANTED to do so: particularly as part of the impetus > behind plan9 was to exploit advances in contemporary hardware: lower-cost, > higher-performance, RISC-based multiprocessors; ubiquitous networking; > common high-resolution bitmapped graphical displays; even magneto-optical > storage (one bet that didn't pan out); etc. > If you mean that you find it unlikely that that development would have been done on a PDP11, then I agree, for the reasons you mentioned. Not sure that I can see why it wouldn't have been feasible, but I can see why it wouldn't have been desirable. I thought there might have been a chance of an early attempt to target the x86 because of its ubiquity and low cost - which could be useful for a networked operating system. And those were 16 bit address constrained in the early days. But its probably not an architecture you would choose to work with if you had a choice.. 68K is what I would have gone for.. > Certainly Richard Millar's comment suggests that might be the case. 
If it >> is heavily dependent on VM, then the necessary rewrite is likely to be >> substantial. >> > > As a demonstration project, getting a slimmed-down plan9 kernel to boot on > a PDP-11/70-class machine would be a nifty hack, but it would be quite a > tour de force and most likely the result would not be generally useful. I > think that, as has been suggested, the conceptual simplicity of plan9 > paradoxically means that resource utilization is higher than it might > otherwise be on either a more elaborate OR more constrained system (such as > one targeting e.g. the PDP-11). When you can afford not to care about a few > bytes here or a couple of cycles there and you're not obsessed with > scraping out the very last drop of performance, you can employ a simpler > (some might say 'naive') algorithm or data structure. > > I'm not sure how the kernel design has changed since the first release. >> The earliest version I have is the release I bought through Harcourt Brace >> back in 1995. But I won't be home till December so it will be a while >> before I can look at it, and probably won't have time to experiment before >> then in any case. >> > > The kernel evolved substantially over its life; something like doubling in > size. I remember vaguely having a discussion with Sape where he said he > felt it had grown bloated. That was probably close to 20 years ago now. > I guess kernel size wasn't a priority. I did a bit of searching back through the old papers, and whilst there is a lot of talk about lines of code and numbers of system calls, I didn't find any reference to kernel size or memory requirements. > For what it is worth, I don't think the embarrassment of riches presented >> to programmers by current hardware has tended to produce more elegant >> designs. If more resources resulted in elegance, Windows would be a thing >> of beauty. Perhaps Plan9 is an exception. 
It certainly excels in elegance >> and design simplicity, even if it does turn out to be more resource hungry >> than I imagined. I will admit that the evils of excessively constrained >> environments are generally worse in terms of coding elegance - especially >> when it leads to overlays and self modifying code. >> > > plan9 is breathtakingly elegant, but this is in no small part because as a > research system it had the luxury of simply ignoring many thorny problems > that would have marred that beauty but that the developers chose not to > tackle. Some of these problems have non-trivial domain complexity and, > while "modern" systems are far too complex by far, that doesn't mean that > all solutions can be recast as elegantly simple pearls in the plan9 style. > Whether we like those problems or not, they exist and real-world solutions > have to at least attempt to deal with them (I'm looking at you, web x.0 for > x >= 2...but curse you you aren't alone). > > PDP11's don't support virtual memory, so there doesn't seem any elegant >> way to overcome that fundamental limitation on size of a singe executable. >> > > No, they do: there is
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
On Mon, Oct 8, 2018, 17:15 Bakul Shah wrote: > On Mon, 08 Oct 2018 19:03:49 -0400 Dan Cross wrote: > > > > plan9 is breathtakingly elegant, but this is in no small part because as > a > > research system it had the luxury of simply ignoring many thorny problems > > that would have marred that beauty but that the developers chose not to > > tackle. Some of these problems have non-trivial domain complexity and, > > while "modern" systems are far too complex by far, that doesn't mean that > > all solutions can be recast as elegantly simple pearls in the plan9 > style. > > One thing I have mused about is recasting plan9 as a > microkernel and pushing out a lot of its kernel code into user > mode code. It is already half way there -- it is basically a > mux for 9p calls, low level device drivers, VM support & some > process related code. Such a redesign can be made more secure > and more resilient. The kind of problems you mention are > easier to fix in user code. Different application domains may > have different needs which are better handled as optional user > mode components. > > Said another way, keep the good parts of the plan9 design and > reachitect/reimplement the kernel + essential drivers/usermode > daemons. This is unlikely to happen (without some serious > funding) but still fun to think about! If done, this would be > a more radical departure than Oberon-7 compared to Oberon but > in the same spirit. > I've mused about that also. My problem has been finding the time. I think it would be a worthwhile project. Not entirely unrelated, I've been tinkering with seL4. >
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
On Mon, 08 Oct 2018 19:03:49 -0400 Dan Cross wrote: > > plan9 is breathtakingly elegant, but this is in no small part because as a > research system it had the luxury of simply ignoring many thorny problems > that would have marred that beauty but that the developers chose not to > tackle. Some of these problems have non-trivial domain complexity and, > while "modern" systems are far too complex by far, that doesn't mean that > all solutions can be recast as elegantly simple pearls in the plan9 style. One thing I have mused about is recasting plan9 as a microkernel and pushing out a lot of its kernel code into user mode code. It is already half way there -- it is basically a mux for 9p calls, low level device drivers, VM support & some process related code. Such a redesign can be made more secure and more resilient. The kind of problems you mention are easier to fix in user code. Different application domains may have different needs which are better handled as optional user mode components. Said another way, keep the good parts of the plan9 design and rearchitect/reimplement the kernel + essential drivers/usermode daemons. This is unlikely to happen (without some serious funding) but still fun to think about! If done, this would be a more radical departure than Oberon-7 compared to Oberon but in the same spirit.
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
On Mon, Oct 8, 2018 at 6:25 PM Digby R.S. Tarvin wrote: > Does anyone know what platform Plan9 was initially implemented on? > My understanding is that the earliest experiments involved a VAX, but development quickly shifted to MIPS and 68020-based machines (the "gnot" was, IIRC, a 68020-based computer). My guess is that there is no reason in principle that it could not fit > comfortably into the constraints of a PDP11/70, but if the initial > implementation was done targeting a machine with significantly more > resources, it would be easy to make design decisions that would be entirely > incompatible. > I find this unlikely. The PDP-11, while a respectable machine for its day, required too many tradeoffs to make it attractive as a development platform for a next-generation research operating system in the late 1980s: be it electrical power consumption vs computational oomph or dollar cost vs available memory, the -11 had fallen from the attractive position it held a decade prior. Perhaps slimming a plan9 kernel down sufficiently so that it COULD run on a PDP-11 was possible in the early days, but I can't see any reason one would have WANTED to do so: particularly as part of the impetus behind plan9 was to exploit advances in contemporary hardware: lower-cost, higher-performance, RISC-based multiprocessors; ubiquitous networking; common high-resolution bitmapped graphical displays; even magneto-optical storage (one bet that didn't pan out); etc. Certainly Richard Millar's comment suggests that might be the case. If it > is heavily dependent on VM, then the necessary rewrite is likely to be > substantial. > As a demonstration project, getting a slimmed-down plan9 kernel to boot on a PDP-11/70-class machine would be a nifty hack, but it would be quite a tour de force and most likely the result would not be generally useful. 
I think that, as has been suggested, the conceptual simplicity of plan9 paradoxically means that resource utilization is higher than it might otherwise be on either a more elaborate OR more constrained system (such as one targeting e.g. the PDP-11). When you can afford not to care about a few bytes here or a couple of cycles there and you're not obsessed with scraping out the very last drop of performance, you can employ a simpler (some might say 'naive') algorithm or data structure. I'm not sure how the kernel design has changed since the first release. The > earliest version I have is the release I bought through Harcourt Brace back > in 1995. But I won't be home till December so it will be a while before I > can look at it, and probably won't have time to experiment before then in > any case. > The kernel evolved substantially over its life; something like doubling in size. I remember vaguely having a discussion with Sape where he said he felt it had grown bloated. That was probably close to 20 years ago now. For what it is worth, I don't think the embarrassment of riches presented > to programmers by current hardware has tended to produce more elegant > designs. If more resources resulted in elegance, Windows would be a thing > of beauty. Perhaps Plan9 is an exception. It certainly excels in elegance > and design simplicity, even if it does turn out to be more resource hungry > than I imagined. I will admit that the evils of excessively constrained > environments are generally worse in terms of coding elegance - especially > when it leads to overlays and self modifying code. > plan9 is breathtakingly elegant, but this is in no small part because as a research system it had the luxury of simply ignoring many thorny problems that would have marred that beauty but that the developers chose not to tackle. 
Some of these problems have non-trivial domain complexity and, while "modern" systems are far too complex by far, that doesn't mean that all solutions can be recast as elegantly simple pearls in the plan9 style. Whether we like those problems or not, they exist and real-world solutions have to at least attempt to deal with them (I'm looking at you, web x.0 for x >= 2...but curse you, you aren't alone). PDP11's don't support virtual memory, so there doesn't seem to be any elegant way > to overcome that fundamental limitation on the size of a single executable. > No, they do: there is paging hardware on the PDP-11 that's used for address translation and memory protection (recall that the PDP-11 kept the kernel at the top of the address space, the per-process "user" structure is at a fixed virtual address, and the system could trap a bus error and kill a misbehaving user-space process). What they may not support is the sort of trap handling that would let them recover from a page fault (though I haven't looked) and in any case, the address space is too small to make demand-paging with reclamation cost-effective. > So I don't think it would be worth a substantial rewrite to get it > going. It is a shame that there don't seem to have been any more powerful > machines with a comparably
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
Does anyone know what platform Plan9 was initially implemented on? My guess is that there is no reason in principle that it could not fit comfortably into the constraints of a PDP11/70, but if the initial implementation was done targeting a machine with significantly more resources, it would be easy to make design decisions that would be entirely incompatible. Certainly Richard Millar's comment suggests that might be the case. If it is heavily dependent on VM, then the necessary rewrite is likely to be substantial. I'm not sure how the kernel design has changed since the first release. The earliest version I have is the release I bought through Harcourt Brace back in 1995. But I won't be home till December so it will be a while before I can look at it, and probably won't have time to experiment before then in any case. For what it is worth, I don't think the embarrassment of riches presented to programmers by current hardware has tended to produce more elegant designs. If more resources resulted in elegance, Windows would be a thing of beauty. Perhaps Plan9 is an exception. It certainly excels in elegance and design simplicity, even if it does turn out to be more resource hungry than I imagined. I will admit that the evils of excessively constrained environments are generally worse in terms of coding elegance - especially when it leads to overlays and self modifying code. PDP11's don't support virtual memory, so there doesn't seem to be any elegant way to overcome that fundamental limitation on the size of a single executable. So I don't think it would be worth a substantial rewrite to get it going. It is a shame that there don't seem to have been any more powerful machines with a comparably elegant architecture and attractive front panel :) It is sounding like Inferno is going to be the more practical option. I believe gcc can still generate PDP-11 code, so it shouldn't be too hard to try. 
DigbyT On Tue, 9 Oct 2018 at 04:53, hiro <23h...@gmail.com> wrote: > i should have said could, not can :) > >
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
i should have said could, not can :)
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
Ideally, anyway. On Mon, 8 Oct 2018 at 11:20, hiro <23h...@gmail.com> wrote: > saving every bit of memory has costs in coding, the pressure wasn't as > strong any more. > the earned flexibility can be used for more elegant design. > >
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
I quite agree - the PDP 11/70 was quite a high end 16 bit machine, but it was the machine that I was talking about and the one I would most like to revisit (although I wouldn't turn down an 11/40 if somebody offered me a working one). I don't think I would contemplate putting Plan9 on a machine with no MMU or a 64K physical memory limit. My first reasonable multi-user, multi-tasking computer system (back in the early 80s) was a home made 6809 machine with a 6829 MMU and eventually 1MB of ram, running OS-9/6809. It initially ran with 64K for programs and the rest of memory was a big ram disk - because what else could you do with such a ridiculous amount of memory. It did pretty well at providing a personal Unix like environment, although it couldn't reproduce the fork() semantics and there was no memory protection, and the memory constraints meant always running the C compiler one pass at a time. But we eventually ported 'Level 2' OS-9 which could use a mapping ram/MMU, and with that I had a quite robust multi-user system, with up to 64K available per process, and 64K available for the kernel. I was able to get most Unix programs running on it (except for a few with big tables that compiled to larger than 64K) and no longer had to worry about exiting the editor before doing a compile. Most of the core system utilities were written in assembly language - so the equivalent of 'ls', for example, required no more than a 256 byte memory allocation. And all executables were loaded read-only and re-entrant (shared text), which helped. The only real Achilles heel was that the 6809 had no illegal instruction trapping, so executing data could occasionally result in an unrecoverable freeze. I never liked the 68K version of OS-9 quite as much. Because of the larger address space it used the MMU for protection only, with no address translation - so the kernel was mapped into the same address space as the user programs but just not accessible in user mode. It just didn't seem as elegant. 
Anyway, that's why I don't see 64K per process as necessarily being inadequate for a lean operating system, although it would be easy enough to write extravagant code that would not run in 64K, or a design that relied on a large virtual address space - especially if you were used to relying on virtual memory. I just don't know how small Plan9 can go, and unless someone has already explored those limits, I suppose rather than speculating I'll just have to plan on a little experimentation when I get a bit of spare time. Regards, Digby On Mon, 8 Oct 2018 at 19:13, Nils M Holm wrote: > On 2018-10-08T15:29:02+1100, Digby R.S. Tarvin wrote: > > A native Inferno port would certainly be a lot easier, but I think you > > might be a bit pessimistic about what can fit into a 64K address space > > machine. The 11/70 certainly managed to run a very respectable V7 Unix > > supporting 20-30 simultaneous active users in its day, [...] > > The 11/70 was a completely different beast than, say, an 11/03. > The 70 had a backplane with 22 address lines, an MMU, and up to > 4M bytes of memory. So while its processes were limited to > 64K+64K bytes, I would not consider it to be a typical 16-bit > machine. > > -- > Nils M Holm < n m h @ t 3 x . o r g > www.t3x.org > >
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
On 2018-10-08T15:29:02+1100, Digby R.S. Tarvin wrote: > A native Inferno port would certainly be a lot easier, but I think you > might be a bit pessimistic about what can fit into a 64K address space > machine. The 11/70 certainly managed to run a very respectable V7 Unix > supporting 20-30 simultaneous active users in its day, [...] The 11/70 was a completely different beast than, say, an 11/03. The 70 had a backplane with 22 address lines, an MMU, and up to 4M bytes of memory. So while its processes were limited to 64K+64K bytes, I would not consider it to be a typical 16-bit machine. -- Nils M Holm < n m h @ t 3 x . o r g > www.t3x.org
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
On 2018-10-08T05:38:07+0200, Lucio De Re wrote: > You really must be thinking of Inferno, native, running in a host with > 1MiB of memory. 64KiB isn't enough for anything other than maybe CP/M. > Even MP/M won't cut it, I don't think. There were several UNIX 6th Edition-based "Mini Unix" variants for the PDP-11/03 and other 16-bit systems. Then there is UZI, the Unix Z80 Implementation, which can run multiple processes (with swapping) in 64K bytes of RAM. CP/M ran in much less than 64KB. -- Nils M Holm < n m h @ t 3 x . o r g > www.t3x.org
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
saving every bit of memory has costs in coding, the pressure wasn't as strong any more. the earned flexibility can be used for more elegant design.
Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
A native Inferno port would certainly be a lot easier, but I think you might be a bit pessimistic about what can fit into a 64K address space machine. The 11/70 certainly managed to run a very respectable V7 Unix supporting 20-30 simultaneous active users in its day, and I wouldn't have thought plan 9, arriving about a decade later, would have been hugely bigger than V7 Unix. I recall a demo of Plan9 (I think it also included the source) being given by Rob Pike at UNSW which he carried on a 1.44Mb floppy disc. By its open source release in 2002 the distribution was 65MB. The smallest Linux system I have used recently had 256K RAM and 512K flash. A rather stripped down busybox based system, but it did include a full TCP/IP stack and a web server. That's comparable to a PDP11 except for the limitation on the largest individual process. Bear in mind that 16 bit executables are smaller, and whilst the 11/70 had a 64Kb address space, physical memory could be somewhat larger, and an individual process could have 128K of memory if using separate instruction and data space. I am used to thinking of Plan9 as very compact, but I haven't really looked to see if it has grown much since the 80s, and perhaps it is only next to the astronomical expansion of other systems that it still looks small. It would be an interesting exercise to try, if only to get a better feel for how compact Plan9 actually is ... DigbyT On Mon, 8 Oct 2018 at 14:38, Lucio De Re wrote: > On 10/8/18, Digby R.S. Tarvin wrote: > > > > So the question is... is plan9 still lean and mean enough to fit onto a > > machine with a 64K address space? Doing a port would certainly provide > > plenty of opportunity to tinker with the lights and switches on the front > > panel, and if the port was initially limited to being a CPU server, > > there would be no need to worry about displays and mass storage, just > > the compiler back end and low level kernel support. 
> > > You really must be thinking of Inferno, native, running in a host with > 1MiB of memory. 64KiB isn't enough for anything other than maybe CP/M. > Even MP/M won't cut it, I don't think. > > Lucio. >