On Tue, Jan 03, 2017 at 10:29:58PM +1000, Nicholas Piggin wrote: > > kernel building showed nothing unusual on any machine > > > > git checkout in a loop showed; > > o minor gains with Nick's patch > > o no impact from Linus's patch > > o flat performance from PeterZ's > > > > git test suite showed > > o close to flat performance on all patches > > o Linus' patch on top showed increased variability but not serious > > I'd be really surprised if Linus's patch is actually adding variability > unless it is just some random cache or branch predictor or similar change > due to changed code sizes. Testing on skylake CPU showed the old sequence > takes a big stall with the load-after-lock;op hazard. > > So I wouldn't worry about it too much, but maybe something interesting to > look at for someone who knows x86 microarchitectures well. >
Agreed, it looked like a testing artifact. Later in the day it was obvious that different results were obtained between boots and minor changes in timing. It's compounded by the fact that this particular test is based on /tmp so different people will get different results depending on the filesystem. In my case, that was btrfs. > > > > will-it-scale pagefault tests > > o page_fault1 and page_fault2 showed no differences in processes > > > > o page_fault3 using processes did show some large losses at some > > process counts on all patches. The losses were not consistent on > > each run. There also was no consistently at loss with increasing > > process counts. It did appear that Peter's patch had fewer > > problems with only one thread count showing problems so it > > *may* be more resistent to the problem but not completely and > > it's not obvious why it might be so it could be a testing > > anomaly > > Okay. page_fault3 has each process doing repeated page faults on their > own 128MB file in /tmp. Unless they fill memory and start to reclaim, > (which I believe must be happening in Dave's case) there should be no > contention on page lock. After the patch, the uncontended case should > be strictly faster when there is no contention. > It's possible in the test that I setup that it's getting screwed by the filesystem used. It's writing that array and on a COW filesystem there is a whole bucket of variables such as new block allocations, writeback timing etc. The writeback timing alone means that this test is questionable because the results will depend on when background writeback triggers. I'm not sure how Dave is controlling for these factors. As an aside, will-it-scale page_fault was one of the tests I had dropped way down on my list of priorities to watch a long time ago. It's a short-lived filesystem-based test vunerable to timing issues and highly variable that didn't seem worth controlling for at the time. I queued the test for a look on the rough offchance your patches were falling over something obvious. If it is, I haven't spotted it yet. Profiles are very similar. Separate runs with lock stat show detectable but negligable contentions on page_wait_table. The bulk of the contention is within the filesystem. > When there is contention, there is an added cost of setting and clearing > page waiters bit. Maybe there is some other issue there... are you seeing > the losses in uncontended case, contended, or both? > I couldn't determine from the profile whether it was uncontended or not. The fact that there are boot-to-boot variations makes it difficult to determine if a profile vs !profile run is equivalent without using debugging patches to gather stats. lock_stat does show large numbers of contentions all right but the vast bulk of them are internal to btrfs. -- Mel Gorman SUSE Labs