[reiserfs-list] Re: [PATCH] Significant performance improvements on reiserfs systems

2001-09-20 Thread Robert Love

On Thu, 2001-09-20 at 19:15, Andrea Arcangeli wrote:
> All I'm saying is that you should check for >= 0, not == 0.

And I am saying we already keep track of that, we have a preemption
counter.

> But anyways it's pretty depressing to see such a costly check needed to
> get latency right with the preemptive kernel approach; with a
> non-preemptive kernel we'd need to just check need_resched and call
> schedule in the unlikely case, so it would be even lighter :) and with no
> fixed costs in UP spinlocks, per-CPU data structures etc... The point of
> a preemptive kernel would be precisely to spare us from putting such
> explicit (costly) checks in the code. My theory was that the cpu-costly
> loops are mostly protected by some lock anyway, and the fact that you're
> writing such horrid (but helpful) code is a kind of proof.

Well, with the preemptive kernel we already account for 90% of the high
latency areas.

Doing the "horrid" solution may solve some other issues, but admittedly
you are right: it's not the prettiest solution.

I don't agree, however, that it's that much more costly, and maybe I am
missing something.  Assuming we are SMP (and thus have locks), where is
there a lot more overhead?  We check current->need_resched (which we
don't _have_ to), call unlock_kernel() and then call lock_kernel(), with
preemption happening automatically in between.  The preemption code
merely checks a counter and calls preempt_schedule() if needed.
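
In concrete terms, the pattern under discussion looks like this (a sketch
only; the comments paraphrase the preemption patch's behaviour as described
above, not verbatim kernel code):

    if (current->need_resched && current->lock_depth == 0) {
            unlock_kernel();   /* BKL dropped; preempt count falls to zero */
            /* the preemption hook in the unlock path checks the counter
             * and calls preempt_schedule() if need_resched is set */
            lock_kernel();     /* retake the BKL and continue */
    }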

Now, yes, this is not ideal.  Ideally we don't need any of this muck. 
Ideally preemption provides everything we need.  So, towards the future,
we can work towards that.  For very short locks, we could just disable
interrupts (a lot less overhead).  For long-held locks, we can replace
them with a more efficient lock -- spin-then-sleep or a
priority-inheriting mutex lock, for example.
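
For the spin-then-sleep idea, a minimal sketch (every name here is
illustrative; none of these primitives exist in the 2.4 tree):

    #define SPIN_TRIES 100

    /* spin briefly in the hope the holder releases soon; if not,
     * block instead of burning the CPU until the lock is free */
    static void spin_then_sleep_lock(struct stslock *lock)
    {
            int i;

            for (i = 0; i < SPIN_TRIES; i++)
                    if (stslock_try(lock))    /* hypothetical trylock */
                            return;           /* got it cheaply */
            stslock_sleep(lock);              /* hypothetical: sleep until
                                                 the holder wakes us */
    }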

I don't want to confuse the above, which is perhaps an ideal system for
inclusion in 2.5, with trying to lower latency further for those who
care via conditional scheduling and the such.

We already have average latency of <1ms, with peaks of 10-50ms.

-- 
Robert M. Love
rml at ufl.edu
rml at tech9.net




[reiserfs-list] Re: [PATCH] Significant performance improvements on reiserfs systems

2001-09-20 Thread Dieter Nützel

On Thu, 20 Sep 2001, Beau Kuiper wrote:
> On Thu, 20 Sep 2001, Chris Mason wrote:
> >
> > On Thursday, September 20, 2001 03:12:44 PM +0800 Beau Kuiper
> > <[EMAIL PROTECTED]> wrote:
> >
> > > Hi,
> > >
> > > ReiserFS on 2.4 has always been bog-slow.
> > >
> > > I have identified kupdated as the culprit, and have 3 patches that fix
> > > the performance problems I have been suffering.
> >
> > Thanks for sending these along.
> >
> > >
> > > I would like these patches to be reviewed and put into the mainline
> > > kernel so that others can test the changes.
> >
> > > Patch 1.
> > >
> > > This patch fixes reiserfs to use the kupdated code path when told to
> > > resync its super block, like it did in 2.2.19. This is the culprit for
> > > bad reiserfs performance in 2.4. Unfortunately, this fix relies on the
> > > second patch to work properly.
> >
> > I promised Linus I would never reactivate this code, it is just too nasty
> > ;-)  The problem is that write_super doesn't know if it is called from
> > sync or from kupdated.  The right fix is to have an extra param to
> > write_super, or another super_block method that gets called instead of
> > write_super when an immediate commit is not required.
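
(A sketch of the second option Chris describes; write_super_lazy and the
kupdate_sync flag are illustrative names, not 2.4 API:)

    /* in the kupdated flush path: if the filesystem provides a lazy
     * flush method, call it; otherwise fall back to write_super,
     * which forces an immediate commit */
    if (kupdate_sync && sb->s_op && sb->s_op->write_super_lazy)
            sb->s_op->write_super_lazy(sb);   /* may defer the commit */
    else if (sb->s_op && sb->s_op->write_super)
            sb->s_op->write_super(sb);        /* old behaviour */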

I have observed ReiserFS suffering from kupdated since 2.4.7-ac3.
Whenever I do "kill -STOP kupdated" the performance is much better.
I know this is unsafe...

> I don't think that this could happen until 2.5.x though, as either
> solution touches every file system. However, if we added an extra method,
> we could do this while only slightly touching the other filesystems (where
> kupdated sync == real sync): simply see if the method exists (is non-null)
> and call that with a kupdate sync instead of the normal
> super_sync. Are you interested in me writing a patch to do this?
>
> >
> > It is possible to get almost the same behaviour as 2.2.x by changing the
> > metadata sync interval in bdflush to 30 seconds.
> >
>
> But then kupdate doesn't flush normal data as regularly as it should, plus
> it is almost as messy as Patch 1 :-)
>
> > >
> > > Patch 2
> > >
> > > This patch implements a simple mechanism to ensure that each superblock
> > > only gets told to be flushed once. With reiserfs and the first patch,
> > > the superblock is still dirty after being told to sync (probably
> > > because it doesn't want to write out the entire journal every 5 seconds
> > > when kupdate calls it). This caused an infinite loop because sync_supers
> > > would always find the reiserfs superblock dirty when called from
> > > kupdated. I am not convinced that this patch is the best one for this
> > > problem (suggestions?)
> >
> > It is ok to leave the superblock dirty, after all, since the commit wasn't
> > done, the super is still dirty.  If the checks from reiserfs_write_super
> > are actually slowing things down, then it is probably best to fix the
> > checks.
>
> I meant, there might be a better way to prevent the endless loop than
> adding an extra field to the superblock data structure. I believe (I
> haven't explored the reiserfs code much) the slowdown is caused by the
> journal being synced with the superblock, thus causing:
>
> 1) Too much contention for disk resources.
> 2) A huge increase in the number of times programs must be suspended to
> wait for the disk

Please have a look at Robert Love's Linux kernel preemption patches and the 
conversation about my reported latency results.

It seems that ReiserFS is involved in the poor audio behavior (hiccups during 
MP2/MP3/Ogg-Vorbis playback).

Re: [PATCH] Preemption Latency Measurement Tool
http://marc.theaimsgroup.com/?l=linux-kernel&m=100097432006605&w=2

Taken from Andrea's latest post:

> those are kernel addresses, can you resolve them via System.map rather
> than trying to find their start/end line number?
>
> > Worst 20 latency times of 8033 measured in this period.
> >   usec  cause mask   start line/file  address   end line/file
> >  10856  spin_lock1  1376/sched.c c0114db3   697/sched.c

I can (with Randy Dunlap's ksysmap, 
http://www.osdlab.org/sw_resources/scripts/ksysmap).

SunWave1>./ksysmap /boot/System.map c0114db3
ksysmap: searching '/boot/System.map' for 'c0114db3'

c0114d60 T preempt_schedule
c0114db3 . <
c0114e10 T wake_up_process

> with dbench 48 we've gone to 10msec latency as far as I can see (still far
> from 0.5~1 sec). dbench 48 runs longer, so there is more probability of
> hitting the higher latency, and it does more I/O, probably also seeks more,
> so there are many variables (slower insertion into the I/O queues first of
> all, etc.). However 10msec isn't that bad, it means 100Hz, something that
> the human eye cannot see. 0.5~1 sec would have been horribly bad latency
> instead.. :)
>
> >   10705BKL1  1302/inode.c c016f359   697/sched.c

c016f300 T reiserfs_dirty_inode
c016f359 . <
c016f3f0 T reiserfs_sync_inode

> >   10577  spin_lock1  1376/sched.c c0114db3   303/namei.c

c0114d60 T preempt_schedule

[reiserfs-list] Re: [PATCH] Significant performance improvements on reiserfs systems

2001-09-20 Thread Chris Mason



On Thursday, September 20, 2001 07:08:25 PM +0200 Dieter Nützel
<[EMAIL PROTECTED]> wrote:


> Please have a look at Robert Love's Linux kernel preemption patches and
> the  conversation about my reported latency results.
> 

Andrew Morton has patches that significantly improve the reiserfs latency,
looks like the last one he sent me was for 2.4.7-pre9.  He and I did a bunch
of work to make sure they introduce schedules only where it was safe.

Andrew, are these still maintained or should I pull out the reiserfs bits?

-chris




[reiserfs-list] Re: [PATCH] Significant performance improvements on reiserfs systems

2001-09-20 Thread Robert Love

On Thu, 2001-09-20 at 18:37, Andrea Arcangeli wrote:
> On Thu, Sep 20, 2001 at 06:24:48PM -0400, Robert Love wrote:
> >
> > if (current->need_resched && current->lock_depth == 0) {
> >         unlock_kernel();
> >         lock_kernel();
> > }

> nitpicking: the above is fine but it isn't complete, it may work for
> most cases but for a generic function it would be better implemented
> similarly to release_kernel_lock_save/restore so you take care of
> lock_depth > 0 too:
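
(A sketch of a lock_depth-saving variant in that spirit; the helper name is
illustrative, it assumes 2.4's convention that lock_depth is -1 when the BKL
is not held, and under the preemption patch the counter bookkeeping would
need the same care:)

    static inline void conditional_schedule_bkl(void)
    {
            if (current->need_resched && current->lock_depth >= 0) {
                    int depth = current->lock_depth;
                    current->lock_depth = 0;     /* pretend one level held */
                    unlock_kernel();             /* really drops the BKL */
                    /* preemptible window: schedule() may run here */
                    lock_kernel();
                    current->lock_depth = depth; /* restore recursion depth */
            }
    }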

Let me explain a little about the patch, and then I am interested in if
your opinion changes.

When unlock_kernel() is called, the preemption code will be enabled and
check whether the preemption count is non-zero -- it's handled just like
a recursive lock.  If there are other locks held, preemption won't
happen.  Thus, we return to the caller's code path, lock_kernel() is
called, and everyone is happy.

If unlock_kernel() is called and the preemption code (which hangs off
the bottom of the locking code) sees that the lock depth is now 0, it
will call preempt_schedule() (if needed) and allow a new process to run.
Then we return to the original code path, call lock_kernel(), and
continue.

Or am I missing something?

-- 
Robert M. Love
rml at ufl.edu
rml at tech9.net




[reiserfs-list] Re: [PATCH] Significant performance improvements on reiserfs systems

2001-09-20 Thread Robert Love

On Thu, 2001-09-20 at 13:08, Dieter Nützel wrote:
> I have observed ReiserFS suffering from kupdated since 2.4.7-ac3.
> Whenever I do "kill -STOP kupdated" the performance is much better.
> I know this is unsafe...

The patches that are going around in this thread (stuff from Andrew
Morton and Andrea) should address some of the ReiserFS issues.  Have you
tried them?

Note that many of the patches will _not_ improve recorded latency,
because any time current->need_resched is unset, the spinlocks will not
be unlocked and thus preemption will not be enabled.  The patches only
measure the time preemption is not enabled.

Of course, true latency will decrease, so see if your `hiccups'
decrease.

Perhaps if we start putting in some conditional schedules like this, I
can write a macro to use the instrumentation stuff to report the shorter
latencies.

> Please have a look at Robert Love's Linux kernel preemption patches and the 
> conversation about my reported latency results.
> 
> It seems that ReiserFS is involved in the poor audio behavior (hiccups during 
> MP2/MP3/Ogg-Vorbis playback).
> 
> Re: [PATCH] Preemption Latency Measurement Tool
> http://marc.theaimsgroup.com/?l=linux-kernel&m=100097432006605&w=2

> Taken from Andrea's latest post:
> 
> I can (with Randy Dunlap's ksysmap, 
> http://www.osdlab.org/sw_resources/scripts/ksysmap).
> 
> SunWave1>./ksysmap /boot/System.map c0114db3
> ksysmap: searching '/boot/System.map' for 'c0114db3'
> 
> c0114d60 T preempt_schedule
> c0114db3 . <
> c0114e10 T wake_up_process

It's really just as easy, if not easier, to look up the file and line number
that /proc/latencytimes reports.

> Unneeded kernel locks/stalls which hurt latency and (global) throughput.
> 
> I will do some benchmarks against Andrea's VM
> 2.4.10-pre12 + patch-rml-2.4.10-pre12-preempt-kernel-1 + 
> patch-rml-2.4.10-pre12-preempt-stats-1
> 
> Hope this post isn't too long and nobody feels offended.

No, thank you for it.

-- 
Robert M. Love
rml at ufl.edu
rml at tech9.net




[reiserfs-list] Re: [PATCH] Significant performance improvements on reiserfs systems

2001-09-20 Thread Robert Love

On Thu, 2001-09-20 at 13:39, Andrew Morton wrote:
> > Andrew, are these still maintained or should I pull out the reiserfs bits?
> 
> This is the reiserfs part - it applies to 2.4.10-pre12 OK.
> 
> For the purposes of Robert's patch, conditional_schedule()
> should be defined as 
> 
> >   if (current->need_resched && current->lock_depth == 0) {
> >           unlock_kernel();
> >           lock_kernel();
> >   }
> 
> which is somewhat crufty, because the implementation of lock_kernel()
> is arch-specific.  But all architectures seem to implement it the same way.

Note that we only support preemption on i386 thus far, so this is not an
issue (yet).

One other note: it is not always OK to just drop the one lock (above it
is).  Say we hold a spinlock, but another lock (most notably the BKL) is
also held.  Since the preemption counter is recursive, we still won't
allow preemption until all locks are dropped.  In this case, we need to
see what else needs to be dropped, or just call schedule() explicitly.
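
(A sketch of the explicit route, with an illustrative spinlock name:)

    /* holding both some_lock and the BKL: dropping just one of them
     * still leaves the preempt count non-zero, so preemption cannot
     * fire.  Drop both, reschedule explicitly, then retake in order. */
    if (current->need_resched) {
            spin_unlock(&some_lock);
            unlock_kernel();
            schedule();
            lock_kernel();
            spin_lock(&some_lock);
    }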

> 

Looks nice, Andrew.

Anyone try this? (I don't use ReiserFS).

I am putting together a conditional scheduling patch to fix some of the
worst cases, for use in conjunction with the preemption patch, and this
might be useful.

-- 
Robert M. Love
rml at ufl.edu
rml at tech9.net




[reiserfs-list] Re: [PATCH] Significant performance improvements on reiserfs systems

2001-09-20 Thread Dieter Nützel

On Thursday, 20 September 2001 22:52, Robert Love wrote:
> On Thu, 2001-09-20 at 13:39, Andrew Morton wrote:
> > > Andrew, are these still maintained or should I pull out the reiserfs
> > > bits?
> >
> > This is the reiserfs part - it applies to 2.4.10-pre12 OK.
> >
> > For the purposes of Robert's patch, conditional_schedule()
> > should be defined as
> >
> > if (current->need_resched && current->lock_depth == 0) {
> >         unlock_kernel();
> >         lock_kernel();
> > }
> >
> > which is somewhat crufty, because the implementation of lock_kernel()
> > is arch-specific.  But all architectures seem to implement it the same
> > way.

> > 
>
> Looks nice, Andrew.
>
> Anyone try this? (I don't use ReiserFS).

Yes, I will...:-)
Send it along.

>
> I am putting together a conditional scheduling patch to fix some of the
> worst cases, for use in conjunction with the preemption patch, and this
> might be useful.

The missing conditional_schedule() definition is what kept me from running it
until now.

-Dieter



Re: [reiserfs-list] Re: [PATCH] Significant performance improvements on reiserfs systems

2001-09-21 Thread Alan Cox

> In Solaris, before spinning on a busy spin-lock, the thread checks whether
> the spin-lock holder runs on the same processor. If so, the thread goes to
> sleep and the holder wakes it up on spin-lock release. The same, I guess,
> goes for interrupts that are served as separate threads. This way, one can
> re-schedule with spin-locks held.
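
(A sketch of that adaptive idea in C -- nothing here is Linux API; the owner
field, task_is_running() and adaptive_sleep() are hypothetical:)

    struct adaptive_lock {
            spinlock_t lock;
            struct task_struct *owner;   /* set by the current holder */
    };

    static void adaptive_acquire(struct adaptive_lock *l)
    {
            while (!spin_trylock(&l->lock)) {
                    /* if the holder isn't actually running on another
                     * CPU, spinning is futile: sleep and let the holder
                     * wake us when it releases the lock */
                    if (l->owner && !task_is_running(l->owner))
                            adaptive_sleep(l);
            }
            l->owner = current;
    }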

This is one of the things interrupt handling by threads gives you, but the
performance cost is not nice. When you consider that ksoftirqd, when it
kicks in (currently far too often), takes up to 10% off gigabit Ethernet
performance, you can appreciate why we don't want to go down that path.

Our spinlock paths are supposed to be very small and predictable. Where 
there is sleeping involved we have semaphores.

As lockmeter shows, we still have at least a few io_request_lock cases
where we lock for far too long.

Alan