Re: svn commit: r341682 - head/sys/sys

2018-12-11 Thread Konstantin Belousov
On Mon, Dec 10, 2018 at 09:57:08PM -0700, Scott Long wrote:
> 
> 
> > On Dec 10, 2018, at 4:47 PM, Konstantin Belousov wrote:
> > 
> > On Mon, Dec 10, 2018 at 02:15:20PM -0800, John Baldwin wrote:
> >> On 12/8/18 7:43 PM, Warner Losh wrote:
> >>> 
> >>> 
> >>> On Sat, Dec 8, 2018, 8:36 PM Kevin Bowling wrote:
> >>> 
> >>>On Sat, Dec 8, 2018 at 12:09 AM Mateusz Guzik wrote:
> >>> 
> >>>> 
> >>>> Fully satisfying solution would be that all architectures get 64-bit
> >>>> ops, even if in the worst case they end up taking a lock. Then
> >>>> subsystems would not have to ifdef on anything. However, there
> >>>> was some opposition to this proposal and I don't think this is
> >>>> important enough to push.
> >>> 
> >>>Mateusz,
> >>> 
> >>>Who is opposing this particular polyfill solution?  Scott Long brought
> >>>up a situation in driver development where this would be useful as
> >>>well.  The polyfills lower the cognitive load and #ifdef soup which
> >>>are the right call here regardless of performance on toy ports.
> >>> 
> >>> 
> >>> I don't recall seeing the opposition either. It would have to be a global 
> >>> lock for all 64bit atomics but I think it would only be 2 atomics on 
> >>> those architectures. 
> >> 
> >> It would have to be a spin lock, so in the case of unrl you would be 
> >> trading
> >> an operation on one of N regular mutexes for a single spin lock that was
> >> also contested by other things.  This would be pretty crappy.  For drivers
> >> that aren't actually used on platforms without 32-bit atomics we can simply
> >> not build them in sys/modules/Makefile or not put them in GENERIC.  For
> >> something in the core kernel like unrl I think we will have to do what
> >> Mateusz has done here.
> > 
> > It is worse. All atomics that access the same location must use the same
> > lock. Otherwise, you could observe torn writes and out of thin air
> > values. Since you cannot know in advance which locations are accessed
> > by the locked variant, all FreeBSD atomic ops have to be switched to
> > the locked variant on that architecture.
> 
> 64bit atomics on I486 already suffer the risk of torn reads; the 
> implementation
> merely does a CLI to protect against local preemption (though you could still
> get unlucky with an NMI).  I suppose you could argue that SMP isn’t really
> viable on I486 and therefore this fact is irrelevant, but it does illustrate
> precedent for having API completeness in a platform.
64bit atomics on 486 are fine, because we only support SMP on machines
which have cmpxchg8b.  Even then, I am not sure that we really support
the kind of SMP configurations from the Pentium times; at least I am certain
that this has not been exercised for quite a long time.

> 
> Really, this isn’t that hard.  Part of the existing contract of using atomics 
> is
> that you carefully evaluate all uses of the variable and decide when to use
> an atomic instruction.  Arguing that we can’t make this process automatic
> and foolproof for 64bit quantities, especially for a subset of subset of
> platforms/architectures, and therefore we should be even more of a difficult
> landmine, is not…. I don’t know what to say… sensical?

> 
> 64bit operations are a reality for MI code in a modern OS, and I’m tired of
> having to tip-toe around them due to incomplete MD implementations.  The
> instructions have been available on Intel CPUs for 25 years!  My
> very strong preference is to have a complete and functional implementation
> of atomic.h for any architecture that is hooked up to the build.  We can then
> tackle the details of optimization and edge case refinement, just like we do
> with every other API and service that we work on.  It doesn’t have to be
> perfect to be useful, and at this point we’re providing neither perfection nor
> utility, just “buts” and “what ifs”.
I do not understand this rant. Provide a working implementation of 64bit
atomics on the arches which lack them, and everybody will be happy.

My point is that implementing e.g. only atomic_add_64() using a lock
is not a solution, exactly because it makes an inconsistent KPI which
does not satisfy the basic guarantees which are provided on other arches,
heavily relied upon in FreeBSD code, and documented in atomic(9) in
free prose, with further references to C11. I.e. instead of a
cross-arch KPI, such an implementation would require consumers to know arch
peculiarities.  Isn't this a complete failure of the goals?

BTW, this is why the C11 standard provides the lock-free predicate for atomic
types and not for atomic ops.  If one atomic op is locked, all of them
must be.
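
For illustration, the C11 predicate mentioned above can be queried as in the
sketch below. It is userland C11, not kernel code; the two helper names are
made up for this example, and some 32-bit toolchains may need libatomic at
link time.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: asks whether 64-bit atomic objects are lock-free
 * on this target.  The question is put to the object (in effect, to its
 * type), not to any single operation on it.
 */
static bool
u64_atomics_are_lock_free(void)
{
	_Atomic uint64_t probe = 0;

	return (atomic_is_lock_free(&probe));
}

/*
 * Hypothetical helper: compile-time view of the same property.
 * ATOMIC_LLONG_LOCK_FREE is 0 (never), 1 (sometimes) or 2 (always).
 */
static int
llong_lock_free_level(void)
{
	return (ATOMIC_LLONG_LOCK_FREE);
}
```

There is no per-operation variant of either query, which matches the point
that one locked operation forces every operation on that type to be locked.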

> 
> Going forward, I’m going to start using 64bit atomics where they’re prudent,
> instead of avoiding them due to this niche 32bit argument.  If that means
> more and more of what I do no longer compiles on a mips or a ppc32, then
> that’s a sacrifice that is fine with me.  It still creates ext

Re: svn commit: r341682 - head/sys/sys

2018-12-11 Thread Poul-Henning Kamp

Warner Losh writes:

>We haven't ever supported SMP on i486, to my knowledge.

There was never any usable i486 SMP hardware.

The i486 CPU was not designed to do SMP so getting two CPUs to talk
together (IPIs and all that) required a lot of glue-logic, which
never got chip-ified.

A few prototypes were built, but nothing ever reached production,
least of all HP's 1000xi486 chip "mainframe" project.

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
p...@freebsd.org | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
___
svn-src-all@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"


Re: svn commit: r341682 - head/sys/sys

2018-12-11 Thread Warner Losh
On Mon, Dec 10, 2018 at 9:57 PM Scott Long  wrote:

>
>
> > On Dec 10, 2018, at 4:47 PM, Konstantin Belousov wrote:
> >
> > On Mon, Dec 10, 2018 at 02:15:20PM -0800, John Baldwin wrote:
> >> On 12/8/18 7:43 PM, Warner Losh wrote:
> >>>
> >>>
> >>> On Sat, Dec 8, 2018, 8:36 PM Kevin Bowling wrote:
> >>>
> >>>On Sat, Dec 8, 2018 at 12:09 AM Mateusz Guzik wrote:
> >>>
> >>>>
> >>>> Fully satisfying solution would be that all architectures get 64-bit
> >>>> ops, even if in the worst case they end up taking a lock. Then
> >>>> subsystems would not have to ifdef on anything. However, there
> >>>> was some opposition to this proposal and I don't think this is
> >>>> important enough to push.
> >>>
> >>>Mateusz,
> >>>
> >>>Who is opposing this particular polyfill solution?  Scott Long
> brought
> >>>up a situation in driver development where this would be useful as
> >>>well.  The polyfills lower the cognitive load and #ifdef soup which
> >>>are the right call here regardless of performance on toy ports.
> >>>
> >>>
> >>> I don't recall seeing the opposition either. It would have to be a
> global lock for all 64bit atomics but I think it would only be 2
> atomics on those architectures.
> >>
> >> It would have to be a spin lock, so in the case of unrl you would be
> trading
> >> an operation on one of N regular mutexes for a single spin lock that was
> >> also contested by other things.  This would be pretty crappy.  For
> drivers
> >> that aren't actually used on platforms without 32-bit atomics we can
> simply
> >> not build them in sys/modules/Makefile or not put them in GENERIC.  For
> >> something in the core kernel like unrl I think we will have to do what
> >> Mateusz has done here.
> >
> > It is worse. All atomics that access the same location must use the same
> > lock. Otherwise, you could observe torn writes and out of thin air
> > values. Since you cannot know in advance which locations are accessed
> > by the locked variant, all FreeBSD atomic ops have to be switched to
> > the locked variant on that architecture.
>
> 64bit atomics on I486 already suffer the risk of torn reads; the
> implementation
> merely does a CLI to protect against local preemption (though you could
> still
> get unlucky with an NMI).  I suppose you could argue that SMP isn’t really
> viable on I486 and therefore this fact is irrelevant, but it does
> illustrate
> precedent for having API completeness in a platform.
>

We haven't ever supported SMP on i486, to my knowledge. Certainly by the
5.x time frame with SMPng it wasn't there. The 64-bit ops that are there
are mostly to smoothly support some of the (now older) embedded boards. I
haven't looked at the SMP work smp did for 4.x, but IIRC, it wasn't even
supported there.


> Really, this isn’t that hard.  Part of the existing contract of using
> atomics is
> that you carefully evaluate all uses of the variable and decide when to use
> an atomic instruction.  Arguing that we can’t make this process automatic
> and foolproof for 64bit quantities, especially for a subset of subset of
> platforms/architectures, and therefore we should be even more of a
> difficult
> landmine, is not…. I don’t know what to say… sensical?
>

I think it's fine to say that 64-bit atomics need to be efficient on the
supported platforms (more on that below).


> 64bit operations are a reality for MI code in a modern OS, and I’m tired of
> having to tip-toe around them due to incomplete MD implementations.  The
> instructions have been available on Intel CPUs for 25 years!  My
> very strong preference is to have a complete and functional implementation
> of atomic.h for any architecture that is hooked up to the build.  We can
> then
> tackle the details of optimization and edge case refinement, just like we
> do
> with every other API and service that we work on.  It doesn’t have to be
> perfect to be useful, and at this point we’re providing neither perfection
> nor
> utility, just “buts” and “what ifs”.
>

I think you miss the point of discussion, at least on my part. I'm looking
at the MIPS side and asking whether whatever 32-bit SMP support we may
have had in the past should die. We only ever supported it on one
not-so-common board that's aged out of relevance (JZ4780 is the only one I
found that needs 32-bit MIPS SMP, and it's a 4 year old embedded board
that's not that relevant today and there are no successor products in the
market or, as far as I can tell, planned). We also have kernels that run in
32-bit mode on 64-bit hardware, but those provide little value and we
already transitioned our largest 64-bit mips platform away from that
support, so we can deorbit those as well, I think. They give little value to
the project. Some housekeeping here is likely in order.

If that just leaves an odd PPC thing, then I think it's perfectly fine to
start conversations there as well about trimming that support with the
power

Re: svn commit: r341682 - head/sys/sys

2018-12-10 Thread Scott Long


> On Dec 10, 2018, at 4:47 PM, Konstantin Belousov  wrote:
> 
> On Mon, Dec 10, 2018 at 02:15:20PM -0800, John Baldwin wrote:
>> On 12/8/18 7:43 PM, Warner Losh wrote:
>>> 
>>> 
>>> On Sat, Dec 8, 2018, 8:36 PM Kevin Bowling wrote:
>>> 
>>>On Sat, Dec 8, 2018 at 12:09 AM Mateusz Guzik wrote:
>>> 
>>>> 
>>>> Fully satisfying solution would be that all architectures get 64-bit
>>>> ops, even if in the worst case they end up taking a lock. Then
>>>> subsystems would not have to ifdef on anything. However, there
>>>> was some opposition to this proposal and I don't think this is
>>>> important enough to push.
>>> 
>>>Mateusz,
>>> 
>>>Who is opposing this particular polyfill solution?  Scott Long brought
>>>up a situation in driver development where this would be useful as
>>>well.  The polyfills lower the cognitive load and #ifdef soup which
>>>are the right call here regardless of performance on toy ports.
>>> 
>>> 
>>> I don't recall seeing the opposition either. It would have to be a global 
>>> lock for all 64bit atomics but I think it would only be 2 atomics on 
>>> those architectures. 
>> 
>> It would have to be a spin lock, so in the case of unrl you would be trading
>> an operation on one of N regular mutexes for a single spin lock that was
>> also contested by other things.  This would be pretty crappy.  For drivers
>> that aren't actually used on platforms without 32-bit atomics we can simply
>> not build them in sys/modules/Makefile or not put them in GENERIC.  For
>> something in the core kernel like unrl I think we will have to do what
>> Mateusz has done here.
> 
> It is worse. All atomics that access the same location must use the same
> lock. Otherwise, you could observe torn writes and out of thin air
> values. Since you cannot know in advance which locations are accessed
> by the locked variant, all FreeBSD atomic ops have to be switched to
> the locked variant on that architecture.

64bit atomics on I486 already suffer the risk of torn reads; the implementation
merely does a CLI to protect against local preemption (though you could still
get unlucky with an NMI).  I suppose you could argue that SMP isn’t really
viable on I486 and therefore this fact is irrelevant, but it does illustrate
precedent for having API completeness in a platform.

Really, this isn’t that hard.  Part of the existing contract of using atomics is
that you carefully evaluate all uses of the variable and decide when to use
an atomic instruction.  Arguing that we can’t make this process automatic
and foolproof for 64bit quantities, especially for a subset of subset of
platforms/architectures, and therefore we should be even more of a difficult
landmine, is not…. I don’t know what to say… sensical?

64bit operations are a reality for MI code in a modern OS, and I’m tired of
having to tip-toe around them due to incomplete MD implementations.  The
instructions have been available on Intel CPUs for 25 years!  My
very strong preference is to have a complete and functional implementation
of atomic.h for any architecture that is hooked up to the build.  We can then
tackle the details of optimization and edge case refinement, just like we do
with every other API and service that we work on.  It doesn’t have to be
perfect to be useful, and at this point we’re providing neither perfection nor
utility, just “buts” and “what ifs”.

Going forward, I’m going to start using 64bit atomics where they’re prudent,
instead of avoiding them due to this niche 32bit argument.  If that means
more and more of what I do no longer compiles on a mips or a ppc32, then
that’s a sacrifice that is fine with me.  It still creates extra development 
work,
and having a uniformly available implementation would be much nicer.

Scott



Re: svn commit: r341682 - head/sys/sys

2018-12-10 Thread Kevin Bowling
Humor me with a kernel feature that will use 64b atomics while both
instruction streams are ping-ponging on the hypothetical lock, because
this thread is getting pretty far out there.
On Mon, Dec 10, 2018 at 5:27 PM Justin Hibbits  wrote:
>
>
>
> On Mon, Dec 10, 2018, 17:57 Ian Lepore wrote:
>> On Mon, 2018-12-10 at 14:15 -0800, John Baldwin wrote:
>> > On 12/8/18 7:43 PM, Warner Losh wrote:
>> > >
>> > >
>> > >
>> > > On Sat, Dec 8, 2018, 8:36 PM Kevin Bowling wrote:
>> > >
>> > > On Sat, Dec 8, 2018 at 12:09 AM Mateusz Guzik wrote:
>> > >
>> > > >
>> > > > Fully satisfying solution would be that all architectures get
>> > > 64-bit
>> > > > ops, even if in the worst case they end up taking a lock.
>> > > Then
>> > > > subsystems would not have to ifdef on anything. However,
>> > > there
>> > > > was some opposition to this proposal and I don't think this
>> > > is
>> > > > important enough to push.
>> > >
>> > > Mateusz,
>> > >
>> > > Who is opposing this particular polyfill solution?  Scott Long
>> > > brought
>> > > up a situation in driver development where this would be useful
>> > > as
>> > > well.  The polyfills lower the cognitive load and #ifdef soup
>> > > which
>> > > are the right call here regardless of performance on toy ports.
>> > >
>> > >
>> > > I don't recall seeing the opposition either. It would have to be a
>> > > global lock for all 64bit atomics but I think it would only be
>> > > 2 atomics on those architectures.
>> > It would have to be a spin lock, so in the case of unrl you would be
>> > trading
>> > an operation on one of N regular mutexes for a single spin lock that
>> > was
>> > also contested by other things.  This would be pretty crappy.  For
>> > drivers
>> > that aren't actually used on platforms without 32-bit atomics we can
>> > simply
>> > not build them in sys/modules/Makefile or not put them in
>> > GENERIC.  For
>> > something in the core kernel like unrl I think we will have to do
>> > what
>> > Mateusz has done here.
>> >
>>
>> On a single-core system all you need to implement 64-bit atomics in the
>> kernel is to disable interrupts around using normal load/store
>> operations on the values. Do we have any platforms that are SMP but
>> don't have hardware primitives for 64-bit atomics?
>>
>> -- Ian
>
>
> There were some dual processor G4 machines. I have one.  It doesn't have 64 
> bit atomics.
>
> - Justin


Re: svn commit: r341682 - head/sys/sys

2018-12-10 Thread Warner Losh
On Mon, Dec 10, 2018, 5:27 PM Justin Hibbits wrote:
>
> On Mon, Dec 10, 2018, 17:57 Ian Lepore wrote:
>> On Mon, 2018-12-10 at 14:15 -0800, John Baldwin wrote:
>> > On 12/8/18 7:43 PM, Warner Losh wrote:
>> > >
>> > >
>> > >
>> > > On Sat, Dec 8, 2018, 8:36 PM Kevin Bowling wrote:
>> > >
>> > > On Sat, Dec 8, 2018 at 12:09 AM Mateusz Guzik wrote:
>> > >
>> > > >
>> > > > Fully satisfying solution would be that all architectures get
>> > > 64-bit
>> > > > ops, even if in the worst case they end up taking a lock.
>> > > Then
>> > > > subsystems would not have to ifdef on anything. However,
>> > > there
>> > > > was some opposition to this proposal and I don't think this
>> > > is
>> > > > important enough to push.
>> > >
>> > > Mateusz,
>> > >
>> > > Who is opposing this particular polyfill solution?  Scott Long
>> > > brought
>> > > up a situation in driver development where this would be useful
>> > > as
>> > > well.  The polyfills lower the cognitive load and #ifdef soup
>> > > which
>> > > are the right call here regardless of performance on toy ports.
>> > >
>> > >
>> > > I don't recall seeing the opposition either. It would have to be a
>> > > global lock for all 64bit atomics but I think it would only be
>> > > 2 atomics on those architectures.
>> > It would have to be a spin lock, so in the case of unrl you would be
>> > trading
>> > an operation on one of N regular mutexes for a single spin lock that
>> > was
>> > also contested by other things.  This would be pretty crappy.  For
>> > drivers
>> > that aren't actually used on platforms without 32-bit atomics we can
>> > simply
>> > not build them in sys/modules/Makefile or not put them in
>> > GENERIC.  For
>> > something in the core kernel like unrl I think we will have to do
>> > what
>> > Mateusz has done here.
>> >
>>
>> On a single-core system all you need to implement 64-bit atomics in the
>> kernel is to disable interrupts around using normal load/store
>> operations on the values. Do we have any platforms that are SMP but
>> don't have hardware primitives for 64-bit atomics?
>>
>> -- Ian
>>
>
> There were some dual processor G4 machines. I have one.  It doesn't have
> 64 bit atomics.
>

There is a 32 bit mips machine like this as well. For drivers it's not too
bad, but for core functions in the MI part of the kernel, all known
implementations super duper suck.

Warner

>


Re: svn commit: r341682 - head/sys/sys

2018-12-10 Thread Justin Hibbits
On Mon, Dec 10, 2018, 17:57 Ian Lepore wrote:

> On Mon, 2018-12-10 at 14:15 -0800, John Baldwin wrote:
> > On 12/8/18 7:43 PM, Warner Losh wrote:
> > >
> > >
> > >
> > > > On Sat, Dec 8, 2018, 8:36 PM Kevin Bowling wrote:
> > > >
> > > > On Sat, Dec 8, 2018 at 12:09 AM Mateusz Guzik wrote:
> > >
> > > >
> > > > Fully satisfying solution would be that all architectures get
> > > 64-bit
> > > > ops, even if in the worst case they end up taking a lock.
> > > Then
> > > > subsystems would not have to ifdef on anything. However,
> > > there
> > > > was some opposition to this proposal and I don't think this
> > > is
> > > > important enough to push.
> > >
> > > Mateusz,
> > >
> > > Who is opposing this particular polyfill solution?  Scott Long
> > > brought
> > > up a situation in driver development where this would be useful
> > > as
> > > well.  The polyfills lower the cognitive load and #ifdef soup
> > > which
> > > are the right call here regardless of performance on toy ports.
> > >
> > >
> > > I don't recall seeing the opposition either. It would have to be a
> > > global lock for all 64bit atomics but I think it would only be
> > > 2 atomics on those architectures.
> > It would have to be a spin lock, so in the case of unrl you would be
> > trading
> > an operation on one of N regular mutexes for a single spin lock that
> > was
> > also contested by other things.  This would be pretty crappy.  For
> > drivers
> > that aren't actually used on platforms without 32-bit atomics we can
> > simply
> > not build them in sys/modules/Makefile or not put them in
> > GENERIC.  For
> > something in the core kernel like unrl I think we will have to do
> > what
> > Mateusz has done here.
> >
>
> On a single-core system all you need to implement 64-bit atomics in the
> kernel is to disable interrupts around using normal load/store
> operations on the values. Do we have any platforms that are SMP but
> don't have hardware primitives for 64-bit atomics?
>
> -- Ian
>

There were some dual processor G4 machines. I have one.  It doesn't have 64
bit atomics.

- Justin

>


Re: svn commit: r341682 - head/sys/sys

2018-12-10 Thread Kevin Bowling
Right, we are talking about a polyfill for systems that have 1-2 cores
in practice.  You're not going to crank high parallelism on these
global locks, and the common lock may even help performance due
to cache residence for all we know.  This is a lot of ballyhoo for a
decision that should favor the reduction of complexity, clean KPIs,
and the overwhelming majority of machines that have 64b atomics, not
scattering ifdefs in the code for niche performance.

Regards,
Kevin
On Mon, Dec 10, 2018 at 4:57 PM Ian Lepore  wrote:
>
> On Mon, 2018-12-10 at 14:15 -0800, John Baldwin wrote:
> > On 12/8/18 7:43 PM, Warner Losh wrote:
> > >
> > >
> > >
> > > > On Sat, Dec 8, 2018, 8:36 PM Kevin Bowling wrote:
> > > >
> > > > On Sat, Dec 8, 2018 at 12:09 AM Mateusz Guzik wrote:
> > >
> > > >
> > > > Fully satisfying solution would be that all architectures get
> > > 64-bit
> > > > ops, even if in the worst case they end up taking a lock.
> > > Then
> > > > subsystems would not have to ifdef on anything. However,
> > > there
> > > > was some opposition to this proposal and I don't think this
> > > is
> > > > important enough to push.
> > >
> > > Mateusz,
> > >
> > > Who is opposing this particular polyfill solution?  Scott Long
> > > brought
> > > up a situation in driver development where this would be useful
> > > as
> > > well.  The polyfills lower the cognitive load and #ifdef soup
> > > which
> > > are the right call here regardless of performance on toy ports.
> > >
> > >
> > > I don't recall seeing the opposition either. It would have to be a
> > > global lock for all 64bit atomics but I think it would only be
> > > 2 atomics on those architectures.
> > It would have to be a spin lock, so in the case of unrl you would be
> > trading
> > an operation on one of N regular mutexes for a single spin lock that
> > was
> > also contested by other things.  This would be pretty crappy.  For
> > drivers
> > that aren't actually used on platforms without 32-bit atomics we can
> > simply
> > not build them in sys/modules/Makefile or not put them in
> > GENERIC.  For
> > something in the core kernel like unrl I think we will have to do
> > what
> > Mateusz has done here.
> >
>
> On a single-core system all you need to implement 64-bit atomics in the
> kernel is to disable interrupts around using normal load/store
> operations on the values. Do we have any platforms that are SMP but
> don't have hardware primitives for 64-bit atomics?
>
> -- Ian


Re: svn commit: r341682 - head/sys/sys

2018-12-10 Thread Ian Lepore
On Mon, 2018-12-10 at 14:15 -0800, John Baldwin wrote:
> On 12/8/18 7:43 PM, Warner Losh wrote:
> > 
> > 
> > 
> > On Sat, Dec 8, 2018, 8:36 PM Kevin Bowling wrote:
> > 
> > On Sat, Dec 8, 2018 at 12:09 AM Mateusz Guzik wrote:
> > 
> > >
> > > Fully satisfying solution would be that all architectures get
> > 64-bit
> > > ops, even if in the worst case they end up taking a lock.
> > Then
> > > subsystems would not have to ifdef on anything. However,
> > there
> > > was some opposition to this proposal and I don't think this
> > is
> > > important enough to push.
> > 
> > Mateusz,
> > 
> > Who is opposing this particular polyfill solution?  Scott Long
> > brought
> > up a situation in driver development where this would be useful
> > as
> > well.  The polyfills lower the cognitive load and #ifdef soup
> > which
> > are the right call here regardless of performance on toy ports.
> > 
> > 
> > I don't recall seeing the opposition either. It would have to be a
> > global lock for all 64bit atomics but I think it would only be
> > 2 atomics on those architectures. 
> It would have to be a spin lock, so in the case of unrl you would be
> trading
> an operation on one of N regular mutexes for a single spin lock that
> was
> also contested by other things.  This would be pretty crappy.  For
> drivers
> that aren't actually used on platforms without 32-bit atomics we can
> simply
> not build them in sys/modules/Makefile or not put them in
> GENERIC.  For
> something in the core kernel like unrl I think we will have to do
> what
> Mateusz has done here.
> 

On a single-core system all you need to implement 64-bit atomics in the
kernel is to disable interrupts around using normal load/store
operations on the values. Do we have any platforms that are SMP but
don't have hardware primitives for 64-bit atomics?
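
The single-core approach described above can be sketched in plain C. This is
a userland illustration only: sim_intr_disable() and sim_intr_restore() are
made-up stand-ins for whatever a port uses to mask interrupts, and
up_fetchadd_64() is a hypothetical name, not an existing kernel function. As
noted elsewhere in the thread, masking interrupts does not stop an NMI.

```c
#include <stdint.h>

/* Simulated interrupt-enable state (hypothetical userland stand-in). */
static int sim_intr_enabled = 1;

/* Mask "interrupts" and return the previous state. */
static int
sim_intr_disable(void)
{
	int was = sim_intr_enabled;

	sim_intr_enabled = 0;
	return (was);
}

/* Put the interrupt state back to what sim_intr_disable() returned. */
static void
sim_intr_restore(int was)
{
	sim_intr_enabled = was;
}

/*
 * Uniprocessor 64-bit fetch-and-add: with interrupts masked nothing else
 * can run on the only CPU, so ordinary loads and stores are enough even
 * though a 32-bit CPU needs two of each.
 */
static uint64_t
up_fetchadd_64(volatile uint64_t *p, uint64_t v)
{
	uint64_t old;
	int saved;

	saved = sim_intr_disable();
	old = *p;
	*p = old + v;
	sim_intr_restore(saved);
	return (old);
}
```

On an SMP machine this gives no protection against the other CPU, which is
why the question about SMP platforms without 64-bit primitives matters.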

-- Ian


Re: svn commit: r341682 - head/sys/sys

2018-12-10 Thread Konstantin Belousov
On Mon, Dec 10, 2018 at 02:15:20PM -0800, John Baldwin wrote:
> On 12/8/18 7:43 PM, Warner Losh wrote:
> > 
> > 
> > On Sat, Dec 8, 2018, 8:36 PM Kevin Bowling wrote:
> > 
> > On Sat, Dec 8, 2018 at 12:09 AM Mateusz Guzik wrote:
> > 
> > >
> > > Fully satisfying solution would be that all architectures get 64-bit
> > > ops, even if in the worst case they end up taking a lock. Then
> > > subsystems would not have to ifdef on anything. However, there
> > > was some opposition to this proposal and I don't think this is
> > > important enough to push.
> > 
> > Mateusz,
> > 
> > Who is opposing this particular polyfill solution?  Scott Long brought
> > up a situation in driver development where this would be useful as
> > well.  The polyfills lower the cognitive load and #ifdef soup which
> > are the right call here regardless of performance on toy ports.
> > 
> > 
> > I don't recall seeing the opposition either. It would have to be a global 
> > lock for all 64bit atomics but I think it would only be 2 atomics on 
> > those architectures. 
> 
> It would have to be a spin lock, so in the case of unrl you would be trading
> an operation on one of N regular mutexes for a single spin lock that was
> also contested by other things.  This would be pretty crappy.  For drivers
> that aren't actually used on platforms without 32-bit atomics we can simply
> not build them in sys/modules/Makefile or not put them in GENERIC.  For
> something in the core kernel like unrl I think we will have to do what
> Mateusz has done here.

It is worse. All atomics that access the same location must use the same
lock. Otherwise, you could observe torn writes and out of thin air
values. Since you cannot know in advance which locations are accessed
by the locked variant, all FreeBSD atomic ops have to be switched to
the locked variant on that architecture.
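
A minimal sketch of what a lock-based emulation would have to look like to
respect that rule, written as userland C11 rather than kernel code. Every
name here (poly64_lock, poly_fetchadd_64() and friends) is hypothetical. The
point it shows is that the load and the store take the same lock as the
read-modify-write operations; a plain 64-bit load next to a locked add could
observe half of an update on a 32-bit CPU.

```c
#include <stdatomic.h>
#include <stdint.h>

/* One global spin lock shared by every emulated 64-bit op (hypothetical). */
static atomic_flag poly64_lock = ATOMIC_FLAG_INIT;

static void
poly64_enter(void)
{
	while (atomic_flag_test_and_set_explicit(&poly64_lock,
	    memory_order_acquire))
		;	/* spin until the flag was previously clear */
}

static void
poly64_exit(void)
{
	atomic_flag_clear_explicit(&poly64_lock, memory_order_release);
}

/* Add v to *p under the lock and return the previous value. */
static uint64_t
poly_fetchadd_64(volatile uint64_t *p, uint64_t v)
{
	uint64_t old;

	poly64_enter();
	old = *p;
	*p = old + v;
	poly64_exit();
	return (old);
}

/* Loads take the same lock so they never see a half-written value. */
static uint64_t
poly_load_64(volatile uint64_t *p)
{
	uint64_t v;

	poly64_enter();
	v = *p;
	poly64_exit();
	return (v);
}

/* Stores take the same lock for the same reason. */
static void
poly_store_64(volatile uint64_t *p, uint64_t v)
{
	poly64_enter();
	*p = v;
	poly64_exit();
}

/* Compare-and-set: returns 1 if *p matched expect and was replaced. */
static int
poly_cmpset_64(volatile uint64_t *p, uint64_t expect, uint64_t newval)
{
	int ok;

	poly64_enter();
	ok = (*p == expect);
	if (ok)
		*p = newval;
	poly64_exit();
	return (ok);
}
```

The sketch only covers accesses that go through these four functions. Any
other atomic that touches the same memory without taking poly64_lock would
break the guarantee, which is the objection raised above.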


Re: svn commit: r341682 - head/sys/sys

2018-12-10 Thread John Baldwin
On 12/8/18 7:43 PM, Warner Losh wrote:
> 
> 
> On Sat, Dec 8, 2018, 8:36 PM Kevin Bowling wrote:
> 
> On Sat, Dec 8, 2018 at 12:09 AM Mateusz Guzik wrote:
> 
> >
> > Fully satisfying solution would be that all architectures get 64-bit
> > ops, even if in the worst case they end up taking a lock. Then
> > subsystems would not have to ifdef on anything. However, there
> > was some opposition to this proposal and I don't think this is
> > important enough to push.
> 
> Mateusz,
> 
> Who is opposing this particular polyfill solution?  Scott Long brought
> up a situation in driver development where this would be useful as
> well.  The polyfills lower the cognitive load and #ifdef soup which
> are the right call here regardless of performance on toy ports.
> 
> 
> I don't recall seeing the opposition either. It would have to be a global 
> lock for all 64bit atomics but I think it would only be 2 atomics on 
> those architectures. 

It would have to be a spin lock, so in the case of unrl you would be trading
an operation on one of N regular mutexes for a single spin lock that was
also contended by other things.  This would be pretty crappy.  For drivers
that aren't actually used on platforms without 64-bit atomics we can simply
not build them in sys/modules/Makefile or not put them in GENERIC.  For
something in the core kernel like unrl I think we will have to do what
Mateusz has done here.

-- 
John Baldwin




Re: svn commit: r341682 - head/sys/sys

2018-12-08 Thread Warner Losh
On Sat, Dec 8, 2018, 8:36 PM Kevin Bowling wrote:

> On Sat, Dec 8, 2018 at 12:09 AM Mateusz Guzik wrote:
>
> >
> > Fully satisfying solution would be that all architectures get 64-bit
> > ops, even if in the worst case they end up taking a lock. Then
> > subsystems would not have to ifdef on anything. However, there
> > was some opposition to this proposal and I don't think this is
> > important enough to push.
>
> Mateusz,
>
> Who is opposing this particular polyfill solution?  Scott Long brought
> up a situation in driver development where this would be useful as
> well.  The polyfills lower the cognitive load and #ifdef soup which
> are the right call here regardless of performance on toy ports.
>

I don't recall seeing the opposition either. It would have to be a global
lock for all 64bit atomics but I think it would only be 2 atomics on
those architectures.

Warner

>


Re: svn commit: r341682 - head/sys/sys

2018-12-08 Thread Kevin Bowling
On Sat, Dec 8, 2018 at 12:09 AM Mateusz Guzik  wrote:

>
> Fully satisfying solution would be that all architectures get 64-bit
> ops, even if in the worst case they end up taking a lock. Then
> subsystems would not have to ifdef on anything. However, there
> was some opposition to this proposal and I don't think this is
> important enough to push.

Mateusz,

Who is opposing this particular polyfill solution?  Scott Long brought
up a situation in driver development where this would be useful as
well.  The polyfills lower the cognitive load and #ifdef soup which
are the right call here regardless of performance on toy ports.

Regards,
Kevin


Re: svn commit: r341682 - head/sys/sys

2018-12-08 Thread Rodney W. Grimes
> On 12/7/18, Ian Lepore  wrote:
> > On Fri, 2018-12-07 at 12:05 +, Mateusz Guzik wrote:
> >> Author: mjg
> >> Date: Fri Dec  7 12:05:11 2018
> >> New Revision: 341682
> >> URL: https://svnweb.freebsd.org/changeset/base/341682
> >>
> >> Log:
> >>   unr64: use locked variant if not __LP64__
> >>
> >>   The current ifdefs are not sufficient to distinguish 32- and 64-bit
> >>   variants, which results e.g. in powerpc64 not using atomics.
> >>
> >>   While some 32-bit archs provide 64-bit atomics, there is no huge advantage
> >>   of using them on these platforms.
> >>
> [..]
> > This seems like a wholly unsatisfying solution compared to how trivial
> > it would be to do something like have each arch's atomic.h set a symbol
> > to indicate whether 64-bit atomics are available. Dismissing 32-bit
> > arches because you don't perceive performance to be important there
> > doesn't seem like a valid argument.
> >
> 
> But performance *is* improved on 32-bit architectures as well.
> 
> Bitmap handling would try very hard to reduce memory usage, which
> had a lot of single-threaded overhead (e.g. it allocates memory just in
> case and then frees it). Since 64-bit inode numbers can simply grow
> there is no need for any of it and memory use is 64 bit to store the
> variable. And that's what unr64 is doing.
> 
> The main difference here is in scalability - taking a lock, bumping a
> variable and releasing the lock scales much worse than an atomic
> (which still scales poorly if heavily used). 32-bit arches don't really
> have enough concurrency to see a difference with this code.

All your high-thread-count Intel and AMD CPUs can
still run in 32-bit mode with all those threads active,
so you can get high concurrency on 32-bit arches.

> 
> single-threaded this is indeed a little bit slower, but this is not
> running in any hot path.
> 
> Fully satisfying solution would be that all architectures get 64-bit
> ops, even if in the worst case they end up taking a lock. Then
> subsystems would not have to ifdef on anything. However, there
> was some opposition to this proposal and I don't think this is
> important enough to push.

-- 
Rod Grimes rgri...@freebsd.org


Re: svn commit: r341682 - head/sys/sys

2018-12-07 Thread Mateusz Guzik
On 12/7/18, Ian Lepore  wrote:
> On Fri, 2018-12-07 at 12:05 +, Mateusz Guzik wrote:
>> Author: mjg
>> Date: Fri Dec  7 12:05:11 2018
>> New Revision: 341682
>> URL: https://svnweb.freebsd.org/changeset/base/341682
>>
>> Log:
>>   unr64: use locked variant if not __LP64__
>>
>>   The current ifdefs are not sufficient to distinguish 32- and 64-bit
>>   variants, which results e.g. in powerpc64 not using atomics.
>>
>>   While some 32-bit archs provide 64-bit atomics, there is no huge advantage
>>   of using them on these platforms.
>>
[..]
> This seems like a wholly unsatisfying solution compared to how trivial
> it would be to do something like have each arch's atomic.h set a symbol
> to indicate whether 64-bit atomics are available. Dismissing 32-bit
> arches because you don't perceive performance to be important there
> doesn't seem like a valid argument.
>

But performance *is* improved on 32-bit architectures as well.

Bitmap handling would try very hard to reduce memory usage, which
had a lot of single-threaded overhead (e.g. it allocates memory just in
case and then frees it). Since 64-bit inode numbers can simply grow
there is no need for any of it, and the only memory used is the 64 bits
that store the variable. And that's what unr64 is doing.

The main difference here is in scalability - taking a lock, bumping a
variable and releasing the lock scales much worse than an atomic
(which still scales poorly if heavily used). 32-bit arches don't really
have enough concurrency to see a difference with this code.
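
The two allocation strategies being compared can be sketched as follows.
This is a userland illustration only, assuming POSIX threads and C11
atomics; the counter_* types and functions are hypothetical stand-ins and
are not the unr64 code from the tree. The locked variant serializes every
caller on a mutex, while the atomic variant does the bump in one
read-modify-write operation.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical locked counter: a mutex protects a plain 64-bit value. */
struct counter_locked {
	pthread_mutex_t	mtx;
	uint64_t	next;
};

/* Hypothetical atomic counter: the 64-bit value is itself atomic. */
struct counter_atomic {
	_Atomic uint64_t next;
};

/* Take the lock, bump the variable, release the lock. */
static uint64_t
counter_locked_alloc(struct counter_locked *c)
{
	uint64_t id;

	pthread_mutex_lock(&c->mtx);
	id = c->next++;
	pthread_mutex_unlock(&c->mtx);
	return (id);
}

/* A single atomic fetch-and-add returns the old value and bumps it. */
static uint64_t
counter_atomic_alloc(struct counter_atomic *c)
{
	return (atomic_fetch_add_explicit(&c->next, 1,
	    memory_order_relaxed));
}
```

Both functions hand out the same sequence of identifiers; the difference
is only in how callers on different CPUs are ordered against each other.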

Single-threaded, this is indeed a little bit slower, but this is not
running in any hot path.

Fully satisfying solution would be that all architectures get 64-bit
ops, even if in the worst case they end up taking a lock. Then
subsystems would not have to ifdef on anything. However, there
was some opposition to this proposal and I don't think this is
important enough to push.

-- 
Mateusz Guzik 


Re: svn commit: r341682 - head/sys/sys

2018-12-07 Thread John Baldwin
On 12/7/18 10:10 AM, Ian Lepore wrote:
> On Fri, 2018-12-07 at 12:05 +, Mateusz Guzik wrote:
>> Author: mjg
>> Date: Fri Dec  7 12:05:11 2018
>> New Revision: 341682
>> URL: https://svnweb.freebsd.org/changeset/base/341682
>>
>> Log:
>>   unr64: use locked variant if not __LP64__
>>   
>>   The current ifdefs are not sufficient to distinguish 32- and 64-bit
>>   variants, which results e.g. in powerpc64 not using atomics.
>>   
>>   While some 32-bit archs provide 64-bit atomics, there is no huge advantage
>>   of using them on these platforms.
>>   
>>   Reported by:   many
>>   Suggested by:  jhb
>>   Sponsored by:  The FreeBSD Foundation
>>
>> Modified:
>>   head/sys/sys/systm.h
>>
>> Modified: head/sys/sys/systm.h
>> ======================================================================
>> --- head/sys/sys/systm.h	Fri Dec  7 12:02:31 2018	(r341681)
>> +++ head/sys/sys/systm.h	Fri Dec  7 12:05:11 2018	(r341682)
>> @@ -523,7 +523,7 @@ int alloc_unr_specific(struct unrhdr *uh, u_int item);
>>  int alloc_unrl(struct unrhdr *uh);
>>  void free_unr(struct unrhdr *uh, u_int item);
>>  
>> -#if defined(__mips__) || defined(__powerpc__)
>> +#ifndef __LP64__
>>  #define UNR64_LOCKED
>>  #endif
>>  
>>
> 
> This seems like a wholly unsatisfying solution compared to how trivial
> it would be to do something like have each arch's atomic.h set a symbol
> to indicate whether 64-bit atomics are available. Dismissing 32-bit
> arches because you don't perceive performance to be important there
> doesn't seem like a valid argument.

I think you are free to adjust the #ifdef if you find that it actually
makes a difference.  unr lists have been using a mutex on 32-bit architectures
the entire time the API has existed, and I'm not sure you are running -j128
poudriere package builds on a raspberry pi to get the kind of workload where
this lock contention matters.

-- 
John Baldwin




Re: svn commit: r341682 - head/sys/sys

2018-12-07 Thread Ian Lepore
On Fri, 2018-12-07 at 12:05 +, Mateusz Guzik wrote:
> Author: mjg
> Date: Fri Dec  7 12:05:11 2018
> New Revision: 341682
> URL: https://svnweb.freebsd.org/changeset/base/341682
> 
> Log:
>   unr64: use locked variant if not __LP64__
>   
>   The current ifdefs are not sufficient to distinguish 32- and 64-bit
>   variants, which results e.g. in powerpc64 not using atomics.
>   
>   While some 32-bit archs provide 64-bit atomics, there is no huge advantage
>   of using them on these platforms.
>   
>   Reported by:   many
>   Suggested by:  jhb
>   Sponsored by:  The FreeBSD Foundation
> 
> Modified:
>   head/sys/sys/systm.h
> 
> Modified: head/sys/sys/systm.h
> ======================================================================
> --- head/sys/sys/systm.h	Fri Dec  7 12:02:31 2018	(r341681)
> +++ head/sys/sys/systm.h	Fri Dec  7 12:05:11 2018	(r341682)
> @@ -523,7 +523,7 @@ int alloc_unr_specific(struct unrhdr *uh, u_int item);
>  int alloc_unrl(struct unrhdr *uh);
>  void free_unr(struct unrhdr *uh, u_int item);
>  
> -#if defined(__mips__) || defined(__powerpc__)
> +#ifndef __LP64__
>  #define UNR64_LOCKED
>  #endif
>  
> 

This seems like a wholly unsatisfying solution compared to how trivial
it would be to do something like have each arch's atomic.h set a symbol
to indicate whether 64-bit atomics are available. Dismissing 32-bit
arches because you don't perceive performance to be important there
doesn't seem like a valid argument.
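
A rough sketch of the feature-symbol idea, for illustration only. The
symbol name HYP_HAVE_ATOMIC_64 is hypothetical and does not exist in any
atomic.h; as a stand-in for a per-arch definition, this userland version
derives it from the standard C11 macro ATOMIC_LLONG_LOCK_FREE, which is 2
when long long atomics are always lock-free. The consumer then keys
UNR64_LOCKED off that symbol instead of off the pointer width.

```c
#include <stdatomic.h>

/*
 * Stand-in for what each arch's atomic.h could export.  The name is
 * hypothetical; here it is derived from a standard C11 macro.
 */
#if ATOMIC_LLONG_LOCK_FREE == 2
#define HYP_HAVE_ATOMIC_64 1
#endif

/* Consumer side: fall back to the locked variant only when needed. */
#ifndef HYP_HAVE_ATOMIC_64
#define UNR64_LOCKED
#endif

/* Report which variant this build selected: 1 = locked, 0 = atomic. */
static int
unr64_uses_lock(void)
{
#ifdef UNR64_LOCKED
	return (1);
#else
	return (0);
#endif
}
```

With this shape, a 32-bit platform that does provide 64-bit atomics would
pick the atomic variant without any change to the consumer.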

-- Ian



svn commit: r341682 - head/sys/sys

2018-12-07 Thread Mateusz Guzik
Author: mjg
Date: Fri Dec  7 12:05:11 2018
New Revision: 341682
URL: https://svnweb.freebsd.org/changeset/base/341682

Log:
  unr64: use locked variant if not __LP64__
  
  The current ifdefs are not sufficient to distinguish 32- and 64-bit
  variants, which results e.g. in powerpc64 not using atomics.
  
  While some 32-bit archs provide 64-bit atomics, there is no huge advantage
  of using them on these platforms.
  
  Reported by:  many
  Suggested by: jhb
  Sponsored by: The FreeBSD Foundation

Modified:
  head/sys/sys/systm.h

Modified: head/sys/sys/systm.h
======================================================================
--- head/sys/sys/systm.h	Fri Dec  7 12:02:31 2018	(r341681)
+++ head/sys/sys/systm.h	Fri Dec  7 12:05:11 2018	(r341682)
@@ -523,7 +523,7 @@ int alloc_unr_specific(struct unrhdr *uh, u_int item);
 int alloc_unrl(struct unrhdr *uh);
 void free_unr(struct unrhdr *uh, u_int item);
 
-#if defined(__mips__) || defined(__powerpc__)
+#ifndef __LP64__
 #define UNR64_LOCKED
 #endif
 