Re: RFC: New userspace fetch/store API

2019-02-25 Thread Jason Thorpe



> On Feb 24, 2019, at 3:48 PM, Mouse  wrote:
> 
> C does not have a `pointer' type; what it has is a `pointer to
> OTHER-TYPE' for each type OTHER-TYPE.
> 
> On all NetBSD ports I'm aware of, this is a distinction without a
> difference, since all memory is the same and all pointers are just
> memory addresses.  Whether this is something to care about depends on
> how much you are willing to depend on the compiler letting you get away
> with things that C does not, strictly, promise.

There are a lot of places in the NetBSD code base that make the assumption that 
“void *” is a substitute for a pointer to any other type, and that any pointer 
type can be converted to any number of integer types, including, but not 
limited to, “intptr_t”, “uintptr_t”, and “vaddr_t”.

So, when I say “MI”, I’m referring to the NetBSD notion of “MI”, which includes 
a number of baked-in assumptions about implementation.

> [still thorpej, but another message]
>> If weâ??re concerned about portability of the things using this API,
>> then we simply specify the alignment to be sizeof(type).  That check
>> is straight-forward in MI C.
> 
> Not really.  There is no MI way to check the alignment of a pointer.
> Pointers cannot be arguments to & or %, and there is nothing MI about
> converting a pointer to an integer.
> 
> Again, on all NetBSD ports I'm aware of, the situation is nicer; it
> would be reasonable to say that it's straightforward in MI NetBSD code,
> but it's not really MI C.

See above.

-- thorpej



Re: RFC: New userspace fetch/store API

2019-02-25 Thread Jason Thorpe



> On Feb 25, 2019, at 9:18 AM, Andrew Cagney  wrote:
> 
> So the out-of-band error check becomes the slow memory write?
> I don't see it as a problem as the data would presumably be written to
> the write-back cached's stack (besides, if the function is short, LTO
> will eliminate it).
> 
> FWIW, I suspect the most "efficient" way to return the value+error as
> a struct - ABIs would return the pair in registers or, failing that,
> pass a single stack address - but I don't think this interface should
> go there.

Agreed, this is way off in the weeds of “premature optimization”.  Besides, why 
make this function “weird” compared to everything else that just returns an 
error code in a normal way?

-- thorpej



Re: RFC: New userspace fetch/store API

2019-02-25 Thread Eduardo Horvath
On Mon, 25 Feb 2019, Andrew Cagney wrote:

> On Mon, 25 Feb 2019 at 11:35, Eduardo Horvath  wrote:
> >
> > On Sat, 23 Feb 2019, Jason Thorpe wrote:
> >
> > > int ufetch_8(const uint8_t *uaddr, uint8_t *valp);
> > > int ufetch_16(const uint16_t *uaddr, uint16_t *valp);
> > > int ufetch_32(const uint32_t *uaddr, uint32_t *valp);
> > > #ifdef _LP64
> > > int ufetch_64(const uint64_t *uaddr, uint64_t *valp);
> > > #endif
> >
> > etc.
> >
> > I'd prefer to return the fetched value and have an out of band error
> > check.
> >
> > With this API the routine does a userland load, then a store into kernel
> > memory.  Then the kernel needs to reload that value.  On modern CPUs
> > memory accesses tend to be sloow.
> 
> So the out-of-band error check becomes the slow memory write?

Even cached accesses are slower than register accesses.  And the compiler 
is limited in what it can do to reorder the instruction stream while 
maintaining C language semantics.

> I don't see it as a problem as the data would presumably be written to
> the write-back cached's stack (besides, if the function is short, LTO
> will eliminate it).

The compiler can eliminate the function but need to keep the defined C 
language synchronization points so the memory accesses limit the ability 
of the compiler to properly optimize the code.  

Plus, there are a number of applications where you may want to do a 
series of userland fetches in a row and operate on them.  In this case it 
would be much easier to do the series and only check for success when 
you're done.

And all the existing code expects the value to be returned by the function 
not in one of the parameters, so existing code would require less 
rewriting.

I'd do something like:

uint64_t ufetch_64(const uint64_t *uaddr, int *errp);

where *errp needs to be initialized to zero and is set on fault so you can 
do:

int err = 0;
long hisflags = ufetch_64(flag1p, ) | ufetch_64(flag2p, );

if (err) return EFAULT;

do_something(hisflags);

Eduardo



Re: RFC: New userspace fetch/store API

2019-02-25 Thread Andrew Cagney
On Mon, 25 Feb 2019 at 11:35, Eduardo Horvath  wrote:
>
> On Sat, 23 Feb 2019, Jason Thorpe wrote:
>
> > int ufetch_8(const uint8_t *uaddr, uint8_t *valp);
> > int ufetch_16(const uint16_t *uaddr, uint16_t *valp);
> > int ufetch_32(const uint32_t *uaddr, uint32_t *valp);
> > #ifdef _LP64
> > int ufetch_64(const uint64_t *uaddr, uint64_t *valp);
> > #endif
>
> etc.
>
> I'd prefer to return the fetched value and have an out of band error
> check.
>
> With this API the routine does a userland load, then a store into kernel
> memory.  Then the kernel needs to reload that value.  On modern CPUs
> memory accesses tend to be sloow.

So the out-of-band error check becomes the slow memory write?
I don't see it as a problem as the data would presumably be written to
the write-back cached's stack (besides, if the function is short, LTO
will eliminate it).

FWIW, I suspect the most "efficient" way to return the value+error as
a struct - ABIs would return the pair in registers or, failing that,
pass a single stack address - but I don't think this interface should
go there.


Re: RFC: New userspace fetch/store API

2019-02-25 Thread Eduardo Horvath
On Sat, 23 Feb 2019, Jason Thorpe wrote:

> int ufetch_8(const uint8_t *uaddr, uint8_t *valp);
> int ufetch_16(const uint16_t *uaddr, uint16_t *valp);
> int ufetch_32(const uint32_t *uaddr, uint32_t *valp);
> #ifdef _LP64
> int ufetch_64(const uint64_t *uaddr, uint64_t *valp);
> #endif

etc.

I'd prefer to return the fetched value and have an out of band error 
check.  

With this API the routine does a userland load, then a store into kernel 
memory.  Then the kernel needs to reload that value.  On modern CPUs 
memory accesses tend to be sloow.

Eduardo