Re: [RFC PATCH 06/13] m68k: Drop ARCH_USES_GETTIMEOFFSET

2018-11-14 Thread Michael Schmitz

Hi Finn

Am 15.11.2018 um 12:54 schrieb Michael Schmitz:

That one does appear to work - different versions of ARAnyM, and
different userland versions though. I'll test that again with the setup
that saw c606b5cf902 fail.


Still fails on that emulator / userland.


Must be a quirk of ARAnyM 1.0.2 (or powerpc). With 0.9.15 on x86_64,
it's fine.

I'm sufficiently convinced to try this on actual hardware now.


Well, it sort of works - I've seen one login timeout on the console 
under load (less than 10 seconds after typing in the password), but most 
attempts went OK. Couldn't log in through SSH without increasing fatal: 
Timeout before authenticationthe LoginGraceTime parameter though.


I usually get reliable login using ssh key files with the default 
setting of 120 seconds (takes around 90 to 100 seconds to complete). 
With your patch, even increasing this to 4800 doesn't result in reliable 
login.


The error I see in the logs is 'fatal: Timeout before authentication'.

Cheers,

Michael



Re: [RFC PATCH 06/13] m68k: Drop ARCH_USES_GETTIMEOFFSET

2018-11-14 Thread Finn Thain
On Thu, 15 Nov 2018, Michael Schmitz wrote:

> Hi Finn,
> 
> On 14/11/18 3:58 PM, Michael Schmitz wrote:
> > Hi Finn,
> > 
> > Am 14.11.2018 um 14:08 schrieb Michael Schmitz:
> > > > Can you also test tree fbf8405cd982 please?
> > > > 
> > > My tests were on c606b5cf902 in case it wasn't clear. I've now seen
> > > fbf8405cd982, one moment please ...
> > > 
> > > That one does appear to work - different versions of ARAnyM, and
> > > different userland versions though. I'll test that again with the setup
> > > that saw c606b5cf902 fail.
> > 
> > Still fails on that emulator / userland.
> > 
> Must be a quirk of ARAnyM 1.0.2 (or powerpc). With 0.9.15 on x86_64, it's
> fine.
> 

Could be a regression in aranym?

Maybe it's worth trying 0.9.15 on the powerpc host?

> I'm sufficiently convinced to try this on actual hardware now.
> 

Thanks!

-- 

> Cheers,
> 
> ??? Michael
> 
> 
> 


Re: [RFC PATCH 01/13] arm: Fix mutual exclusion in arch_gettimeoffset

2018-11-14 Thread Finn Thain
On Wed, 14 Nov 2018, Russell King - ARM Linux wrote:

> However, I now see (having searched mailing lists) what you are trying 
> to do - you have _not_ copied me or the mailing lists I'm on with your 
> cover message, so I'm *totally* lacking in the context of your patch 
> series, particularly where you are converting m68k to use clocksources 
> without needing the gettimeoffset() stuff.
> 

True. I should have included all interested parties in the recipients of 
the cover letter. My bad.

> You have failed to explain that in this thread - probably assuming that 
> I've read your cover message.

I offered to write a patch to add a clocksource to replace the 
arch_gettimeoffset functionality for RPC and EBSA110.

You even replied to that offer.

I did not propose degrading functionality while there is someone able to 
test modernization patches (assuming there is...).

Would you consider merging untested modernization patches for the 
arch_gettimeoffset API?

> I haven't until now, because you never sent it to me or the 
> linux-arm-kernel mailing list.
> 
> I have found this thread _very_ frustrating, and frankly a waste of my 
> time discussing the finer points because of this lack of context.

Sorry for any frustration I've caused you.

The thread went way off-topic when Christoph took the opportunity to 
suggest the removal of RPC and EBSA110. That doesn't interest me.

My interest remains API modernization. The actual patches I've sent are 
intended to modernize the clock API *without* degrading any functionality.

> Please ensure that if you're going to be sending a patch series, that 
> the cover message at least finds its way to the intended audience of 
> your patches, so that everyone has the context they need when looking at 
> (eg) the single patch they may receive.
> 

OK. I'll have to improve my patch submission scripts.

-- 

> Alternatively, if someone raises a problem with the patch, and you 
> _know_ you haven't done that, then please consider informing them where 
> they can get more context, eg, by providing a link to your archived 
> cover message.  It would help avoid misunderstandings.
> 
> Thanks.
> 
> 


Re: [RFC PATCH 01/13] arm: Fix mutual exclusion in arch_gettimeoffset

2018-11-14 Thread Michael Schmitz



On 14/11/18 8:58 PM, Russell King - ARM Linux wrote:



Are you saying that's not possible on arm, because the current timer rundown
counter can't be read while the timer is running?

If I were to run a second timer at higher rate for clocksource, but keeping
the 10 ms timer as clock event (could easily do that by using timer D on
Atari Falcon) - how would that improve my timekeeping? Clock events still
only happen 10 ms apart ...

Ah, I think you're talking about something else.

You seem to be talking about what happens when time keeping interrupts
happen.

That's what I understood your comment was about.

I'm talking about the resolution of gettimeofday() and the other
interfaces that are used (eg) for packet capture timestamping and
the like - the _user_ visible effects of the timekeeping system.

With the existing implementation, all these have better-than-jiffy
resolution - in the case of RPC, that's 500ns, rather than 10ms
which would be the case without gettimeoffset().  Removing
gettimeoffset() as Finn has proposed without preserving that
resolution is simply completely unacceptable.


I agree - but Finn had also offered to supply patches to arm that would 
have added a clocksource_read() function very much like for those m68k 
platforms that had used gettimeoffset(). I wondered what reason there 
was for these not to work on arm


I now realize you'd never seen that offer.

The proposal to drop architectures still relying on arch_gettimeoffset() 
had been raising enough of a response on linux-m68k to make me conclude 
that same proposal had been kicked on to other arch MLs affected as 
well. I'm a bit naive that way.


Now your criticism of arch_gettimeoffset() (missing timer rollover 
because the timer interrupt has been cleared by the interrupt handler) 
still stands. I just can't find the offset() functions shown in the 
5cfc8ee0bb51 patch. Any hints?


Cheers,

    Michael




Re: [RFC PATCH 06/13] m68k: Drop ARCH_USES_GETTIMEOFFSET

2018-11-14 Thread Michael Schmitz

Hi Finn,

On 14/11/18 3:58 PM, Michael Schmitz wrote:

Hi Finn,

Am 14.11.2018 um 14:08 schrieb Michael Schmitz:

Can you also test tree fbf8405cd982 please?


My tests were on c606b5cf902 in case it wasn't clear. I've now seen
fbf8405cd982, one moment please ...

That one does appear to work - different versions of ARAnyM, and
different userland versions though. I'll test that again with the setup
that saw c606b5cf902 fail.


Still fails on that emulator / userland.

Must be a quirk of ARAnyM 1.0.2 (or powerpc). With 0.9.15 on x86_64, 
it's fine.


I'm sufficiently convinced to try this on actual hardware now.

Cheers,

    Michael




Re: [RFC PATCH 01/13] arm: Fix mutual exclusion in arch_gettimeoffset

2018-11-14 Thread Russell King - ARM Linux
On Wed, Nov 14, 2018 at 03:58:36PM +0100, Geert Uytterhoeven wrote:
> Hi Russell,
> 
> On Wed, Nov 14, 2018 at 3:16 PM Russell King - ARM Linux
>  wrote:
> > On Wed, Nov 14, 2018 at 02:17:09PM +1100, Finn Thain wrote:
> > > So, even assuming that you're right about the limitations of single-timer
> > > platforms in general, removal of arch_gettimeoffset wouldn't require the
> > > removal of any platforms, AFAICT.
> >
> > I haven't proposed removing platforms.
> >
> > I'm just objecting to the idea of removing arch_gettimeoffset(),
> > thereby causing a regression by changing the resolution of
> > gettimeofday() without any sign of equivalent functionality.
> >
> > However, I now see (having searched mailing lists) what you are
> > trying to do - you have _not_ copied me or the mailing lists I'm
> > on with your cover message, so I'm *totally* lacking in the context
> > of your patch series, particularly where you are converting m68k
> > to use clocksources without needing the gettimeoffset() stuff.
> >
> > You have failed to explain that in this thread - probably assuming
> > that I've read your cover message.  I haven't until now, because
> > you never sent it to me or the linux-arm-kernel mailing list.
> >
> > I have found this thread _very_ frustrating, and frankly a waste of
> > my time discussing the finer points because of this lack of context.
> > Please ensure that if you're going to be sending a patch series,
> > that the cover message at least finds its way to the intended
> > audience of your patches, so that everyone has the context they
> > need when looking at (eg) the single patch they may receive.
> >
> > Alternatively, if someone raises a problem with the patch, and you
> > _know_ you haven't done that, then please consider informing them
> > where they can get more context, eg, by providing a link to your
> > archived cover message.  It would help avoid misunderstandings.
> 
> Sorry for the lack of context.
> The real trigger was also not explained in the cover message, and was a
> the threat to remove platforms not using modern timekeeping APIs, cfr.
>  "Removing support for old hardware from the kernel"
> (https://lwn.net/Articles/769468/).

And naturally, because kernel developers are oh so great at
communication, that decision has been communicated to those
affected by it. *NOT*.

Clearly there is a need for much better communication.  We're
better at it when dealing with patches, but not when it comes to
physical meetings.

Maybe when some decision like "we're going to kill stuff still
using the old gettimeoffset API" then _someone_ needs to identify
which platforms are using that and make sure that those maintainers
know about that decision, much the same way that if a patch were
to touch the gettimeoffset API, we'd make damn sure that such a
patch went to those affected by the change.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: [RFC PATCH 01/13] arm: Fix mutual exclusion in arch_gettimeoffset

2018-11-14 Thread Geert Uytterhoeven
Hi Russell,

On Wed, Nov 14, 2018 at 3:16 PM Russell King - ARM Linux
 wrote:
> On Wed, Nov 14, 2018 at 02:17:09PM +1100, Finn Thain wrote:
> > So, even assuming that you're right about the limitations of single-timer
> > platforms in general, removal of arch_gettimeoffset wouldn't require the
> > removal of any platforms, AFAICT.
>
> I haven't proposed removing platforms.
>
> I'm just objecting to the idea of removing arch_gettimeoffset(),
> thereby causing a regression by changing the resolution of
> gettimeofday() without any sign of equivalent functionality.
>
> However, I now see (having searched mailing lists) what you are
> trying to do - you have _not_ copied me or the mailing lists I'm
> on with your cover message, so I'm *totally* lacking in the context
> of your patch series, particularly where you are converting m68k
> to use clocksources without needing the gettimeoffset() stuff.
>
> You have failed to explain that in this thread - probably assuming
> that I've read your cover message.  I haven't until now, because
> you never sent it to me or the linux-arm-kernel mailing list.
>
> I have found this thread _very_ frustrating, and frankly a waste of
> my time discussing the finer points because of this lack of context.
> Please ensure that if you're going to be sending a patch series,
> that the cover message at least finds its way to the intended
> audience of your patches, so that everyone has the context they
> need when looking at (eg) the single patch they may receive.
>
> Alternatively, if someone raises a problem with the patch, and you
> _know_ you haven't done that, then please consider informing them
> where they can get more context, eg, by providing a link to your
> archived cover message.  It would help avoid misunderstandings.

Sorry for the lack of context.
The real trigger was also not explained in the cover message, and was a
the threat to remove platforms not using modern timekeeping APIs, cfr.
 "Removing support for old hardware from the kernel"
(https://lwn.net/Articles/769468/).

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: [RFC PATCH 01/13] arm: Fix mutual exclusion in arch_gettimeoffset

2018-11-14 Thread Russell King - ARM Linux
On Wed, Nov 14, 2018 at 02:17:09PM +1100, Finn Thain wrote:
> On Tue, 13 Nov 2018, Russell King - ARM Linux wrote:
> 
> > 
> > A clocksource provides a cycle counter that monotonically changes and 
> > does not wrap between clockevent events.
> > 
> > A clock event is responsible for providing events to the system when 
> > some work is needing to be done, limited by the wrap interval of the 
> > clocksource.
> > 
> > Each time the clock event triggers an interrupt, the clocksource is
> > read to determine how much time has passed, using:
> > 
> > count = (new_value - old_value) & available_bits
> > nanosecs = count * scale >> shift;
> > 
> > If you try to combine the clocksource and clockevent because you only
> > have a single counter, and the counter has the behaviour of:
> > - counting down towards zero
> > - when reaching zero, triggers an interrupt, and reloads with N
> > 
> > then this provides your clockevent, but you can't use this as a clock
> > source, because each time you receive an interrupt and try to read the
> > counter value, it will be approximately the same value.  This means
> > that the above calculation fails to register the correct number of
> > nanoseconds passing.  Hence, this does not work.
> > 
> > Also note where I said above that the clock event device must be able
> > to provide an interrupt _before_ the clocksource wraps - clearly with
> > such a timer, that is utterly impossible.
> > 
> > The simple solution is to not use such a counter as the clocksource,
> > which means you fall back to using the jiffies clocksource, and your
> > timekeeping has jiffy resolution - so 10ms, or possibly 1ms if you
> > want a 1kHz timer interval.  For most applications, that's simply way
> > to coarse, as was realised in the very early days of Linux.
> > 
> > If only there was a way to interpolate between timer interrupts...
> > which is exactly what arch_gettimeoffset() does, and is a historical
> > reminant of the pre-clocksource/clockevent days of the kernel - but
> > it is the only way to get better resolution from this sort of setup.
> > 
> 
> Both of the platforms in question (RPC and EBSA110) have not 
> defined(CONFIG_GENERIC_CLOCKEVENTS) and have not defined any struct 
> clock_event_device, AFAICT.

Correct, because they don't use clockevents.  This is pre-clocksource,
pre-clockevent code, it uses the old timekeeping infrastructure,
which is xtime_update() and providing a gettimeoffset implementation.

> So, even assuming that you're right about the limitations of single-timer 
> platforms in general, removal of arch_gettimeoffset wouldn't require the 
> removal of any platforms, AFAICT.

I haven't proposed removing platforms.

I'm just objecting to the idea of removing arch_gettimeoffset(),
thereby causing a regression by changing the resolution of
gettimeofday() without any sign of equivalent functionality.

However, I now see (having searched mailing lists) what you are
trying to do - you have _not_ copied me or the mailing lists I'm
on with your cover message, so I'm *totally* lacking in the context
of your patch series, particularly where you are converting m68k
to use clocksources without needing the gettimeoffset() stuff.

You have failed to explain that in this thread - probably assuming
that I've read your cover message.  I haven't until now, because
you never sent it to me or the linux-arm-kernel mailing list.

I have found this thread _very_ frustrating, and frankly a waste of
my time discussing the finer points because of this lack of context.
Please ensure that if you're going to be sending a patch series,
that the cover message at least finds its way to the intended
audience of your patches, so that everyone has the context they
need when looking at (eg) the single patch they may receive.

Alternatively, if someone raises a problem with the patch, and you
_know_ you haven't done that, then please consider informing them
where they can get more context, eg, by providing a link to your
archived cover message.  It would help avoid misunderstandings.

Thanks.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: [RFC PATCH 01/13] arm: Fix mutual exclusion in arch_gettimeoffset

2018-11-14 Thread Russell King - ARM Linux
On Mon, Nov 12, 2018 at 03:12:39PM +1100, Finn Thain wrote:
> Implementations of arch_gettimeoffset are generally not re-entrant
> and assume that interrupts have been disabled. Unfortunately this
> pre-condition got broken in v2.6.32.

To me, it looks way worse than what you think.

The original code sequence before conversion to arch_gettimeoffset()
was:

1. lock (inc. disabling IRQs)
2. read gettimeoffset correction
3. add offset to current xtime
4. unlock

This means there is no chance for a timer interrupt to happen between
reading the current offset and adding it to the current kernel time.
If the timer does roll over, then the interrupt will remain pending,
so the whole "read offset and add to xtime" is one atomic block and
we can use the pending interrupt to indicate whether the timer has
rolled over and apply the appropriate offset correction.

With the arch_gettimeoffset() code, that changes:

1. read clocksource (which will be jiffies)
2. compute clocksource delta
3. increment nanoseconds
4. add gettimeoffset correction

If we receive a timer interrupt part-way through this, for example,
between 2 and 3, then jiffies will increment (via do_timer()) and the
interrupt will be cleared.  This means that the check in
ioc_timer_gettimeoffset() for a pending interrupt, for example, will
be false, and we will not know that an interrupt has happened.

So, unless I'm missing something, commit 5cfc8ee0bb51 well and truely
broke the accuracy of timekeeping on these platforms, and will result
in timekeeping spontaneously jumping backwards.  I had been wondering
why NTP had become less stable on EBSA110, but never investigated.

However, that still does not justify blowing away this functionality
without replacement, as Finn has been proposing.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: m68k using deprecated internal APIs?

2018-11-14 Thread Linus Walleij
On Sat, Nov 10, 2018 at 10:47 PM Arnd Bergmann  wrote:
> On Fri, Nov 9, 2018 at 12:42 AM Finn Thain  wrote:
> > On Sun, 28 Oct 2018, Geert Uytterhoeven wrote:
> > > > > The example I gave was GENERIC_CLOCKEVENTS on m68, which is
> > > > > supported on most but not all machines there.
> > > >
> > > > That suggests that the removal of just those machines would suffice
> > > > (as opposed to the removal of the entire arch).
> > > >
> > > > Also, Documentation/features/time/clockevents/arch-support.txt says,
> > > > |m68k: |  ok  |
> > > >
> > > > These two observations make me wonder whether the clockevents feature
> > > > is related to the discussion quoted above (?)
> > >
> > > GENERIC_CLOCKEVENTS is selected only for a few Coldfire CPU types.
> > >
> >
> > !GENERIC_CLOCKEVENTS implies PREEMPT_NONE, and disables the "Timers
> > subsystem" (i.e. the NO_HZ config options), but it works fine -- I was
> > able to convert the Mac port to !ARCH_USES_GETTIMEOFFSET &&
> > !GENERIC_CLOCKEVENTS. (Like many of the Coldfire CPU types.)
> >
> > You can see those patches here,
> > https://github.com/fthain/linux/commits/mac68k-queue/

This is looking very good. FWIW:
Acked-by: Linus Walleij 

Apart from this (which is the most important step!) I think the custom
LED heartbeat code in kernel/time.c needs to be replaced with a standard
drivers/leds driver for each LED using the "heartbeat" trigger as is custom
these days.

That should clean out another chunk of legacy time-related code.

I suppose you are currently keeping the call to timer_interrupt() for
exactly this reason (i.e. keep the heartbeat LED blinking)?

> > Note that !ARCH_USES_GETTIMEOFFSET is a build-time choice, meaning that
> > all platforms need to be converted together.
> >
> > Also, some platforms will need to adopt the clocksource API, otherwise the
> > built-in "jiffies" clocksource will get used, causing a regression in
> > timer accuracy.
> >
> > Conversion to the clocksource API is straight-forward where the platform
> > already implements arch_gettimeoffset. The conversion is discussed in
> > include/linux/time.h:
> >
> > /* Some architectures do not supply their own clocksource.
> >  * This is mainly the case in architectures that get their
> >  * inter-tick times by reading the counter on their interval
> >  * timer. Since these timers wrap every tick, they're not really
> >  * useful as clocksources. Wrapping them to act like one is possible
> >  * but not very efficient. So we provide a callout these arches
> >  * can implement for use with the jiffies clocksource to provide
> >  * finer then tick granular time.
> >  */
> >
> > (Not sure what is meant by "not very efficient" here.)
> >
> > Certain 680x0 platforms do not implement arch_gettimeoffset: apollo, q40,
> > sun3, sun3x. These platforms can fall back on the "jiffies" clocksource
> > with no loss of timer accuracy, so conversion for these platforms is
> > trivial.
> >
> > Should I continue with the clocksource API conversion for the other
> > platforms: amiga, atari, bvme6000, hp300, mvme147, mvme16x? This would
> > allow for removal of "select ARCH_USES_GETTIMEOFFSET" (and satisfy
> > Documentation/features/time/modern-timekeeping) without loss of timer
> > accuracy.

If you ask me: forge ahead.

> > Alternatively, we could defer the clocksource API conversion, leaving it
> > up to the platform maintainers (who can actually test the driver changes).
> > But that would mean that many platforms would suffer a regression in timer
> > accuracy upon removal of arch_gettimeoffset.
>
> Adding the timekeeping maintainers and Linus Walleij to Cc. Linus
> worked on removing ARCH_USES_GETTIMEOFFSET on ARM
> several platforms in the past.
>
> I noticed that the last timer rework involving
> CONFIG_ARCH_USES_GETTIMEOFFSET was back in 2012 with
> commit 7b1f62076bba ("time: convert arch_gettimeoffset to a pointer").
> At the time, we had cris, m32r, blackfin, m68k and lots of ARM
> platforms, now it's only two StrongARM platforms (RPC and
> ARCH_EBSA110)  and classic m68k.

Believe it or not, I managed to procure both machines (RPC and
EBSA110). Whether I can get them to boot Linux is still an open
question.

I suppose it would be possible to use a conversion strategy
similar to Finn's and get rid of gettimeoffset altogether, if I could
only test it on these boards.

Yours.
Linus Walleij