Re: [9fans] PDP11 (Was: Re: what heavy negativity!)

2018-10-14 Thread hiro
also read what has been written before about fcp. and read the source of fcp.

On 10/14/18, Ole-Hjalmar Kristensen  wrote:
> OK, that makes sense. So it would not stop a client from, for example, first
> reading an index block in a B-tree, waiting for the result, and then issuing
> read operations for all the data blocks in parallel. That's exactly the same
> as any asynchronous disk subsystem I am acquainted with. Reordering is the
> norm.
>
> On Sun, Oct 14, 2018 at 1:21 PM hiro <23h...@gmail.com> wrote:
>
>> there's no tyranny involved.
>>
>> a client that is fine with the *responses* coming in reordered could
>> remember the tag obviously and do whatever you imagine.
>>
>> the problem is potential reordering of the messages in the kernel
>> before responding, even if the 9p transport has guaranteed ordering.
>>
>> On 10/14/18, Ole-Hjalmar Kristensen  wrote:
>> > I'm not going to argue with someone who has got his hands dirty by
>> > actually doing this, but I don't really get this about the tyranny of
>> > 9p. Isn't the point of the tag field to identify the request? What is
>> > stopping the client from issuing multiple requests and matching the
>> > replies based on the tag? From the manual:
>> >
>> > Each T-message has a tag field, chosen and used by the
>> >   client to identify the message.  The reply to the message
>> >   will have the same tag.  Clients must arrange that no two
>> >   outstanding messages on the same connection have the same
>> >   tag.  An exception is the tag NOTAG, defined as (ushort)~0
>> >   in <fcall.h>: the client can use it, when establishing a
>> >   connection, to override tag matching in version messages.
>> >
>> >
>> >
>> > On Wed, Oct 10, 2018, 23:56 Steven Stallion wrote:
>> >
>> >> As the guy who wrote the majority of the code that pushed those 1M 4K
>> >> random IOPS erik mentioned, this thread annoys the shit out of me. You
>> >> don't get an award for writing a driver. In fact, it's probably better
>> >> not to be known at all considering the bloody murder one has to commit
>> >> to marry hardware and software together.
>> >>
>> >> Let's be frank, the I/O handling in the kernel is anachronistic. To
>> >> hit those rates, I had to add support for asynchronous and vectored
>> >> I/O not to mention a sizable bit of work by a co-worker to properly
>> >> handle NUMA on our appliances to hit those speeds. As I recall, we had
>> >> to rewrite the scheduler and re-implement locking, which even Charles
>> >> Forsyth had a hand in. Had we the time and resources to implement
>> >> something like zero-copy we'd have done it in a heartbeat.
>> >>
>> >> In the end, it doesn't matter how "fast" a storage driver is in Plan 9
>> >> - as soon as you put a 9P-based filesystem on it, it's going to be
>> >> limited to a single outstanding operation. This is the tyranny of 9P.
>> >> We (Coraid) got around this by avoiding filesystems altogether.
>> >>
>> >> Go solve that problem first.
>> >> On Wed, Oct 10, 2018 at 12:36 PM  wrote:
>> >> >
>> >> > > But the reason I want this is to reduce latency to the first
>> >> > > access, especially for very large files. With read() I have
>> >> > > to wait until the read completes. With mmap() processing can
>> >> > > start much earlier and can be interleaved with background
>> >> > > data fetch or prefetch. With read() a lot more resources
>> >> > > are tied down. If I need random access and don't need to
>> >> > > read all of the data, the application has to do pread(),
>> >> > > pwrite() a lot thus complicating it. With mmap() I can just
>> >> > > map in the whole file and excess reading (beyond what the
>> >> > > app needs) will not be a large fraction.
>> >> >
>> >> > you think doing single 4K page sized reads in the pagefault
>> >> > handler is better than doing precise >4K reads from your
>> >> > application? possibly in a background thread so you can
>> >> > overlap processing with data fetching?
>> >> >
>> >> > the advantage of mmap is not prefetch. it's about not doing
>> >> > any I/O when the data is already in the *SHARED* buffer cache!
>> >> > which plan9 does not have (except the mntcache, but that is
>> >> > optional and only works for the disk fileservers that maintain
>> >> > their file qid version info consistently). it *IS* really a linux
>> >> > thing where all block device i/o goes through the buffer cache.
>> >> >
>> >> > --
>> >> > cinap
>> >> >
>> >>
>> >>
>> >
>>
>>
>



Re: [9fans] PDP11 (Was: Re: what heavy negativity!)

2018-10-14 Thread Ole-Hjalmar Kristensen
OK, that makes sense. So it would not stop a client from, for example, first
reading an index block in a B-tree, waiting for the result, and then issuing
read operations for all the data blocks in parallel. That's exactly the same
as any asynchronous disk subsystem I am acquainted with. Reordering is the
norm.
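
To make that access pattern concrete, here is a minimal Go sketch: read the
index block first, then issue all the data-block reads concurrently and gather
the results in whatever order they complete. The readBlock and parseIndex
helpers are placeholders standing in for real requests on the wire, not an
actual API.

package main

import (
	"fmt"
	"sync"
)

const blockSize = 8192 // assumed block size

// readBlock stands in for one outstanding read request on the wire
// (e.g. a 9P Tread carrying its own tag). Here it just fabricates data.
func readBlock(offset int64) ([]byte, error) {
	return make([]byte, blockSize), nil
}

// parseIndex is a placeholder that pretends the index block names
// three data blocks.
func parseIndex(index []byte) []int64 {
	return []int64{1 * blockSize, 2 * blockSize, 3 * blockSize}
}

func main() {
	// Read the index block and wait for it: its contents tell us
	// which data blocks to ask for.
	index, err := readBlock(0)
	if err != nil {
		panic(err)
	}
	offsets := parseIndex(index)

	// Issue all data-block reads at once. Completions may arrive in
	// any order; each result is keyed by its slot, much as 9P replies
	// are matched to requests by tag.
	results := make([][]byte, len(offsets))
	var wg sync.WaitGroup
	for i, off := range offsets {
		wg.Add(1)
		go func(i int, off int64) {
			defer wg.Done()
			if buf, err := readBlock(off); err == nil {
				results[i] = buf
			}
		}(i, off)
	}
	wg.Wait()
	fmt.Println("fetched", len(results), "blocks of", blockSize, "bytes")
}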

On Sun, Oct 14, 2018 at 1:21 PM hiro <23h...@gmail.com> wrote:

> there's no tyranny involved.
>
> a client that is fine with the *responses* coming in reordered could
> remember the tag obviously and do whatever you imagine.
>
> the problem is potential reordering of the messages in the kernel
> before responding, even if the 9p transport has guaranteed ordering.
>
> On 10/14/18, Ole-Hjalmar Kristensen  wrote:
> > I'm not going to argue with someone who has got his hands dirty by actually
> > doing this, but I don't really get this about the tyranny of 9p. Isn't the
> > point of the tag field to identify the request? What is stopping the client
> > from issuing multiple requests and matching the replies based on the tag?
> > From the manual:
> >
> > Each T-message has a tag field, chosen and used by the
> >   client to identify the message.  The reply to the message
> >   will have the same tag.  Clients must arrange that no two
> >   outstanding messages on the same connection have the same
> >   tag.  An exception is the tag NOTAG, defined as (ushort)~0
> >   in <fcall.h>: the client can use it, when establishing a
> >   connection, to override tag matching in version messages.
> >
> >
> >
> > On Wed, Oct 10, 2018, 23:56 Steven Stallion wrote:
> >
> >> As the guy who wrote the majority of the code that pushed those 1M 4K
> >> random IOPS erik mentioned, this thread annoys the shit out of me. You
> >> don't get an award for writing a driver. In fact, it's probably better
> >> not to be known at all considering the bloody murder one has to commit
> >> to marry hardware and software together.
> >>
> >> Let's be frank, the I/O handling in the kernel is anachronistic. To
> >> hit those rates, I had to add support for asynchronous and vectored
> >> I/O not to mention a sizable bit of work by a co-worker to properly
> >> handle NUMA on our appliances to hit those speeds. As I recall, we had
> >> to rewrite the scheduler and re-implement locking, which even Charles
> >> Forsyth had a hand in. Had we the time and resources to implement
> >> something like zero-copy we'd have done it in a heartbeat.
> >>
> >> In the end, it doesn't matter how "fast" a storage driver is in Plan 9
> >> - as soon as you put a 9P-based filesystem on it, it's going to be
> >> limited to a single outstanding operation. This is the tyranny of 9P.
> >> We (Coraid) got around this by avoiding filesystems altogether.
> >>
> >> Go solve that problem first.
> >> On Wed, Oct 10, 2018 at 12:36 PM  wrote:
> >> >
> >> > > But the reason I want this is to reduce latency to the first
> >> > > access, especially for very large files. With read() I have
> >> > > to wait until the read completes. With mmap() processing can
> >> > > start much earlier and can be interleaved with background
> >> > > data fetch or prefetch. With read() a lot more resources
> >> > > are tied down. If I need random access and don't need to
> >> > > read all of the data, the application has to do pread(),
> >> > > pwrite() a lot thus complicating it. With mmap() I can just
> >> > > map in the whole file and excess reading (beyond what the
> >> > > app needs) will not be a large fraction.
> >> >
> >> > you think doing single 4K page sized reads in the pagefault
> >> > handler is better than doing precise >4K reads from your
> >> > application? possibly in a background thread so you can
> >> > overlap processing with data fetching?
> >> >
> >> > the advantage of mmap is not prefetch. it's about not doing
> >> > any I/O when the data is already in the *SHARED* buffer cache!
> >> > which plan9 does not have (except the mntcache, but that is
> >> > optional and only works for the disk fileservers that maintain
> >> > their file qid version info consistently). it *IS* really a linux
> >> > thing where all block device i/o goes through the buffer cache.
> >> >
> >> > --
> >> > cinap
> >> >
> >>
> >>
> >
>
>


Re: [9fans] PDP11 (Was: Re: what heavy negativity!)

2018-10-14 Thread hiro
there's no tyranny involved.

a client that is fine with the *responses* coming in reordered could
remember the tag obviously and do whatever you imagine.

the problem is potential reordering of the messages in the kernel
before responding, even if the 9p transport has guaranteed ordering.

On 10/14/18, Ole-Hjalmar Kristensen  wrote:
> I'm not going to argue with someone who has got his hands dirty by actually
> doing this, but I don't really get this about the tyranny of 9p. Isn't the
> point of the tag field to identify the request? What is stopping the client
> from issuing multiple requests and matching the replies based on the tag? From
> the manual:
>
> Each T-message has a tag field, chosen and used by the
>   client to identify the message.  The reply to the message
>   will have the same tag.  Clients must arrange that no two
>   outstanding messages on the same connection have the same
>   tag.  An exception is the tag NOTAG, defined as (ushort)~0
>   in <fcall.h>: the client can use it, when establishing a
>   connection, to override tag matching in version messages.
>
>
>
> On Wed, Oct 10, 2018, 23:56 Steven Stallion wrote:
>
>> As the guy who wrote the majority of the code that pushed those 1M 4K
>> random IOPS erik mentioned, this thread annoys the shit out of me. You
>> don't get an award for writing a driver. In fact, it's probably better
>> not to be known at all considering the bloody murder one has to commit
>> to marry hardware and software together.
>>
>> Let's be frank, the I/O handling in the kernel is anachronistic. To
>> hit those rates, I had to add support for asynchronous and vectored
>> I/O not to mention a sizable bit of work by a co-worker to properly
>> handle NUMA on our appliances to hit those speeds. As I recall, we had
>> to rewrite the scheduler and re-implement locking, which even Charles
>> Forsyth had a hand in. Had we the time and resources to implement
>> something like zero-copy we'd have done it in a heartbeat.
>>
>> In the end, it doesn't matter how "fast" a storage driver is in Plan 9
>> - as soon as you put a 9P-based filesystem on it, it's going to be
>> limited to a single outstanding operation. This is the tyranny of 9P.
>> We (Coraid) got around this by avoiding filesystems altogether.
>>
>> Go solve that problem first.
>> On Wed, Oct 10, 2018 at 12:36 PM  wrote:
>> >
>> > > But the reason I want this is to reduce latency to the first
>> > > access, especially for very large files. With read() I have
>> > > to wait until the read completes. With mmap() processing can
>> > > start much earlier and can be interleaved with background
>> > > data fetch or prefetch. With read() a lot more resources
>> > > are tied down. If I need random access and don't need to
>> > > read all of the data, the application has to do pread(),
>> > > pwrite() a lot thus complicating it. With mmap() I can just
>> > > map in the whole file and excess reading (beyond what the
>> > > app needs) will not be a large fraction.
>> >
>> > you think doing single 4K page sized reads in the pagefault
>> > handler is better than doing precise >4K reads from your
>> > application? possibly in a background thread so you can
>> > overlap processing with data fetching?
>> >
>> > the advantage of mmap is not prefetch. it's about not doing
>> > any I/O when the data is already in the *SHARED* buffer cache!
>> > which plan9 does not have (except the mntcache, but that is
>> > optional and only works for the disk fileservers that maintain
>> > their file qid version info consistently). it *IS* really a linux
>> > thing where all block device i/o goes through the buffer cache.
>> >
>> > --
>> > cinap
>> >
>>
>>
>
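
For anyone unfamiliar with the term used in the quoted message above,
"vectored I/O" means handing the kernel several non-contiguous buffers in one
call instead of copying them into one flat buffer first. Below is a small,
self-contained Go illustration of the idea: net.Buffers batches the buffers
into a single vectored write (writev) where the platform supports it. This is
only a sketch of the concept, not the Plan 9 kernel work described above.

package main

import (
	"fmt"
	"io"
	"net"
)

func main() {
	// Loopback listener so the example is self-contained.
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		panic(err)
	}
	defer ln.Close()

	done := make(chan struct{})
	go func() {
		defer close(done)
		c, err := ln.Accept()
		if err != nil {
			return
		}
		defer c.Close()
		data, _ := io.ReadAll(c)
		fmt.Printf("server received %q\n", data)
	}()

	conn, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		panic(err)
	}

	// Header and payload live in separate buffers; WriteTo hands both
	// to the kernel in one vectored write instead of concatenating
	// them into a contiguous buffer first.
	msg := net.Buffers{
		[]byte("header:"),
		[]byte("payload"),
	}
	if _, err := msg.WriteTo(conn); err != nil {
		panic(err)
	}
	conn.Close() // lets the server's ReadAll finish
	<-done
}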



Re: [9fans] PDP11 (Was: Re: what heavy negativity!)

2018-10-14 Thread Ole-Hjalmar Kristensen
I'm not going to argue with someone who has got his hands dirty by actually
doing this, but I don't really get this about the tyranny of 9p. Isn't the
point of the tag field to identify the request? What is stopping the client
from issuing multiple requests and matching the replies based on the tag? From
the manual:

Each T-message has a tag field, chosen and used by the
  client to identify the message.  The reply to the message
  will have the same tag.  Clients must arrange that no two
  outstanding messages on the same connection have the same
  tag.  An exception is the tag NOTAG, defined as (ushort)~0
  in <fcall.h>: the client can use it, when establishing a
  connection, to override tag matching in version messages.
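
As a rough illustration of how a client can use those tags to keep several
requests in flight on one connection, here is a minimal Go sketch of tag-based
reply matching. The Tmsg/Rmsg types, the Client structure, and the loopback
transport are stand-ins invented for the example, not real 9P marshalling or
an existing library.

package main

import (
	"fmt"
	"sync"
)

// Tmsg and Rmsg stand in for 9P T- and R-messages; only the tag matters here.
type Tmsg struct{ Tag uint16 }
type Rmsg struct{ Tag uint16 }

type Client struct {
	mu      sync.Mutex
	nextTag uint16
	pending map[uint16]chan Rmsg
	send    func(Tmsg) // assumed transport write
}

// Call sends one T-message and blocks until the R-message with the same
// tag comes back; other Calls can be outstanding on the same connection.
func (c *Client) Call(t Tmsg) Rmsg {
	ch := make(chan Rmsg, 1)

	c.mu.Lock()
	tag := c.nextTag
	c.nextTag++ // a real client would skip NOTAG and recycle freed tags
	c.pending[tag] = ch
	c.mu.Unlock()

	t.Tag = tag
	c.send(t)
	return <-ch
}

// dispatch plays the part of a reader loop: whatever order replies
// arrive in, the tag says which caller each one belongs to.
func (c *Client) dispatch(r Rmsg) {
	c.mu.Lock()
	ch := c.pending[r.Tag]
	delete(c.pending, r.Tag)
	c.mu.Unlock()
	if ch != nil {
		ch <- r
	}
}

func main() {
	c := &Client{pending: make(map[uint16]chan Rmsg)}
	// Loopback "server": echo every request back as a reply carrying
	// the same tag.
	c.send = func(t Tmsg) { go c.dispatch(Rmsg{Tag: t.Tag}) }

	var wg sync.WaitGroup
	for i := 0; i < 3; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			r := c.Call(Tmsg{})
			fmt.Println("matched reply for tag", r.Tag)
		}()
	}
	wg.Wait()
}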



On Wed, Oct 10, 2018, 23:56 Steven Stallion wrote:

> As the guy who wrote the majority of the code that pushed those 1M 4K
> random IOPS erik mentioned, this thread annoys the shit out of me. You
> don't get an award for writing a driver. In fact, it's probably better
> not to be known at all considering the bloody murder one has to commit
> to marry hardware and software together.
>
> Let's be frank, the I/O handling in the kernel is anachronistic. To
> hit those rates, I had to add support for asynchronous and vectored
> I/O not to mention a sizable bit of work by a co-worker to properly
> handle NUMA on our appliances to hit those speeds. As I recall, we had
> to rewrite the scheduler and re-implement locking, which even Charles
> Forsyth had a hand in. Had we the time and resources to implement
> something like zero-copy we'd have done it in a heartbeat.
>
> In the end, it doesn't matter how "fast" a storage driver is in Plan 9
> - as soon as you put a 9P-based filesystem on it, it's going to be
> limited to a single outstanding operation. This is the tyranny of 9P.
> We (Coraid) got around this by avoiding filesystems altogether.
>
> Go solve that problem first.
> On Wed, Oct 10, 2018 at 12:36 PM  wrote:
> >
> > > But the reason I want this is to reduce latency to the first
> > > access, especially for very large files. With read() I have
> > > to wait until the read completes. With mmap() processing can
> > > start much earlier and can be interleaved with background
> > > data fetch or prefetch. With read() a lot more resources
> > > are tied down. If I need random access and don't need to
> > > read all of the data, the application has to do pread(),
> > > pwrite() a lot thus complicating it. With mmap() I can just
> > > map in the whole file and excess reading (beyond what the
> > > app needs) will not be a large fraction.
> >
> > you think doing single 4K page sized reads in the pagefault
> > handler is better than doing precise >4K reads from your
> > application? possibly in a background thread so you can
> > overlap processing with data fetching?
> >
> > the advantage of mmap is not prefetch. it's about not doing
> > any I/O when the data is already in the *SHARED* buffer cache!
> > which plan9 does not have (except the mntcache, but that is
> > optional and only works for the disk fileservers that maintain
> > their file qid version info consistently). it *IS* really a linux
> > thing where all block device i/o goes through the buffer cache.
> >
> > --
> > cinap
> >
>
>
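
As a concrete version of the alternative cinap describes in the quote above,
here is a minimal Go sketch: large explicit reads issued from a background
goroutine, so fetching the next chunk overlaps with processing the current one
instead of paying a page fault per 4K. The file name and chunk size are
arbitrary choices for the example.

package main

import (
	"fmt"
	"os"
)

const chunkSize = 1 << 20 // 1 MiB reads instead of 4K page faults

func main() {
	f, err := os.Open("/tmp/bigfile") // assumed input file
	if err != nil {
		panic(err)
	}
	defer f.Close()

	chunks := make(chan []byte, 2) // small prefetch window

	// Background fetcher: stays a chunk or two ahead of the consumer,
	// so the next read is in flight while the current chunk is processed.
	go func() {
		defer close(chunks)
		var off int64
		for {
			buf := make([]byte, chunkSize)
			n, err := f.ReadAt(buf, off)
			if n > 0 {
				chunks <- buf[:n]
				off += int64(n)
			}
			if err != nil { // io.EOF or a real error ends the stream
				return
			}
		}
	}()

	// Consumer: whatever "processing" means, it overlaps with the
	// fetcher's outstanding read.
	var total int
	for c := range chunks {
		total += len(c) // stand-in for real work
	}
	fmt.Println("processed", total, "bytes")
}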


Re: [9fans] zero copy & 9p (was Re: PDP11 (Was: Re: what heavy negativity!)

2018-10-14 Thread hiro
thanks, this will allow us to know where to look more closely.

On 10/14/18, Francisco J Ballesteros  wrote:
> Pure "producer/cosumer" stuff, like sending things through a pipe as long as
> the source didn't need to touch the data ever more.
> Regarding bugs, I meant "producing bugs" not "fixing bugs", btw.
>
>> On 14 Oct 2018, at 09:34, hiro <23h...@gmail.com> wrote:
>>
>> well, finding bugs is always good :)
>> but since i got curious could you also tell which things exactly got
>> much faster, so that we know what might be possible?
>>
>> On 10/14/18, FJ Ballesteros  wrote:
>>> yes. bugs, on my side at least.
>>> The copy isolates from others.
>>> But some experiments in nix and in a thing I wrote for leanxcale show
>>> that
>>> some things can be much faster.
>>> It’s fun either way.
>>>
 On 13 Oct 2018, at 23:11, hiro <23h...@gmail.com> wrote:

 and, did it improve anything noticeably?

> On 10/13/18, Charles Forsyth  wrote:
> I did several versions of one part of zero copy, inspired by several things
> in x-kernel, replacing Blocks by another structure throughout the network
> stacks and kernel, then made messages visible to user level. Nemo did
> another part, on his way to Clive.
>
>> On Fri, 12 Oct 2018, 07:05 Ori Bernstein,  wrote:
>>
>> On Thu, 11 Oct 2018 13:43:00 -0700, Lyndon Nerenberg wrote:
>>
>>> Another case to ponder ...   We're handling the incoming I/Q data
>>> stream, but need to fan that out to many downstream consumers.  If
>>> we already read the data into a page, then flip it to the first
>>> consumer, is there a benefit to adding a reference counter to that
>>> read-only page and leaving the page live until the counter expires?
>>>
>>> Hiro clamours for benchmarks.  I agree.  Some basic searches I've
>>> done don't show anyone trying this out with P9 (and publishing
>>> their results).  Anybody have hints/references to prior work?
>>>
>>> --lyndon
>>>
>>
>> I don't believe anyone has done the work yet. I'd be interested
>> to see what you come up with.
>>
>>
>> --
>>   Ori Bernstein
>>
>>
>

>>>
>>>
>>>
>>
>
>
>
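
Since nobody seems to have published numbers, here is at least a toy Go
version of the reference-counted read-only page Lyndon asks about in the quote
above: one buffer handed to several consumers, each dropping its reference
when done, and the buffer released only when the count reaches zero. The Page
type and its hold/release helpers are hypothetical, not existing kernel code.

package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Page is a read-only buffer shared by several consumers and freed only
// when the last reference is dropped.
type Page struct {
	data []byte
	refs int32
}

// hold takes an extra reference before the page is handed to a consumer.
func (p *Page) hold() *Page {
	atomic.AddInt32(&p.refs, 1)
	return p
}

// release drops one reference; the last release "frees" the page.
func (p *Page) release() {
	if atomic.AddInt32(&p.refs, -1) == 0 {
		fmt.Println("page freed")
		p.data = nil // a kernel would return the page to its allocator
	}
}

func main() {
	// One reference belongs to the producer that filled the page.
	p := &Page{data: []byte("one block of I/Q samples"), refs: 1}

	var wg sync.WaitGroup
	for i := 0; i < 3; i++ {
		q := p.hold() // flip the same read-only page to each consumer
		wg.Add(1)
		go func(id int, q *Page) {
			defer wg.Done()
			fmt.Printf("consumer %d sees %d bytes\n", id, len(q.data))
			q.release()
		}(i, q)
	}
	p.release() // the producer is done with it
	wg.Wait()
}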



Re: [9fans] zero copy & 9p (was Re: PDP11 (Was: Re: what heavy negativity!)

2018-10-14 Thread Francisco J Ballesteros
Pure "producer/cosumer" stuff, like sending things through a pipe as long as 
the source didn't need to touch the data ever more.
Regarding bugs, I meant "producing bugs" not "fixing bugs", btw.

> On 14 Oct 2018, at 09:34, hiro <23h...@gmail.com> wrote:
> 
> well, finding bugs is always good :)
> but since i got curious could you also tell which things exactly got
> much faster, so that we know what might be possible?
> 
> On 10/14/18, FJ Ballesteros  wrote:
>> yes. bugs, on my side at least.
>> The copy isolates from others.
>> But some experiments in nix and in a thing I wrote for leanxcale show that
>> some things can be much faster.
>> It’s fun either way.
>> 
>>> On 13 Oct 2018, at 23:11, hiro <23h...@gmail.com> wrote:
>>> 
>>> and, did it improve anything noticeably?
>>> 
 On 10/13/18, Charles Forsyth  wrote:
 I did several versions of one part of zero copy, inspired by several things
 in x-kernel, replacing Blocks by another structure throughout the network
 stacks and kernel, then made messages visible to user level. Nemo did
 another part, on his way to Clive.
 
> On Fri, 12 Oct 2018, 07:05 Ori Bernstein,  wrote:
> 
> On Thu, 11 Oct 2018 13:43:00 -0700, Lyndon Nerenberg wrote:
> 
>> Another case to ponder ...   We're handling the incoming I/Q data
>> stream, but need to fan that out to many downstream consumers.  If
>> we already read the data into a page, then flip it to the first
>> consumer, is there a benefit to adding a reference counter to that
>> read-only page and leaving the page live until the counter expires?
>> 
>> Hiro clamours for benchmarks.  I agree.  Some basic searches I've
>> done don't show anyone trying this out with P9 (and publishing
>> their results).  Anybody have hints/references to prior work?
>> 
>> --lyndon
>> 
> 
> I don't believe anyone has done the work yet. I'd be interested
> to see what you come up with.
> 
> 
> --
>   Ori Bernstein
> 
> 
 
>>> 
>> 
>> 
>> 
> 
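
As a concrete picture of that producer/consumer case, here is a small Go
sketch: the producer hands the buffer itself over a channel and never touches
it again, so nothing needs to be copied. The channel stands in for a pipe; in
a kernel the handoff would be a page or a Block changing owner rather than a
Go slice.

package main

import "fmt"

func main() {
	pipe := make(chan []byte, 1)

	// Producer: allocate, fill, send, forget. Not touching buf after
	// the send is what makes the copy-free handoff safe.
	go func() {
		for i := 0; i < 3; i++ {
			buf := []byte(fmt.Sprintf("message %d", i))
			pipe <- buf
		}
		close(pipe)
	}()

	// Consumer: owns each buffer outright once it arrives.
	for buf := range pipe {
		fmt.Println(string(buf))
	}
}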




Re: [9fans] zero copy & 9p (was Re: PDP11 (Was: Re: what heavy negativity!)

2018-10-14 Thread hiro
well, finding bugs is always good :)
but since i got curious could you also tell which things exactly got
much faster, so that we know what might be possible?

On 10/14/18, FJ Ballesteros  wrote:
> yes. bugs, on my side at least.
> The copy isolates from others.
> But some experiments in nix and in a thing I wrote for leanxcale show that
> some things can be much faster.
> It’s fun either way.
>
>> On 13 Oct 2018, at 23:11, hiro <23h...@gmail.com> wrote:
>>
>> and, did it improve anything noticeably?
>>
>>> On 10/13/18, Charles Forsyth  wrote:
>>> I did several versions of one part of zero copy, inspired by several things
>>> in x-kernel, replacing Blocks by another structure throughout the network
>>> stacks and kernel, then made messages visible to user level. Nemo did
>>> another part, on his way to Clive.
>>>
 On Fri, 12 Oct 2018, 07:05 Ori Bernstein,  wrote:

 On Thu, 11 Oct 2018 13:43:00 -0700, Lyndon Nerenberg wrote:

> Another case to ponder ...   We're handling the incoming I/Q data
> stream, but need to fan that out to many downstream consumers.  If
> we already read the data into a page, then flip it to the first
> consumer, is there a benefit to adding a reference counter to that
> read-only page and leaving the page live until the counter expires?
>
> Hiro clamours for benchmarks.  I agree.  Some basic searches I've
> done don't show anyone trying this out with P9 (and publishing
> their results).  Anybody have hints/references to prior work?
>
> --lyndon
>

 I don't believe anyone has done the work yet. I'd be interested
 to see what you come up with.


 --
Ori Bernstein


>>>
>>
>
>
>