Bug#561203: threads and fork on machine with VIPT-WB cache

2010-06-10 Thread dann frazier
On Thu, Jun 10, 2010 at 07:30:45PM +0300, Modestas Vainius wrote:
> Hello,
> 
> On sekmadienis 06 Bir??elis 2010 04:01:23 Modestas Vainius wrote:
> > On penktadienis 04 Bir??elis 2010 08:21:06 dann frazier wrote:
> > > > My case and my analysis talked about UP kernel, and John David Anglin
> > > > 
> > > > made a patch:
> > > > http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=561203#144
> > > > 
> > > > After that, the discussion went to SMP cases.
> > > > 
> > > > It would be better to evaluate the patch again, and make sure it works
> > > > for UP case and fix failures of buildd, then apply for Linux in Debian
> > > > (only) for HPPA.
> > > > 
> > > > I know that the patch is not that ideal because it touches
> > > > architecture independent part of Linux, but it is worth for Linux in
> > > > Debian (or Linux for the HPPA machine of buildd, at least).
> > > 
> > > I'm happy to test the patch if necessary to help push this change
> > > upstream. However, we do need the change to go upstream before we can
> > > include it in the Debian kernel.
> > 
> > I made a hackish patch for QProcess in Qt (usleep(1000) before fork())
> > which seems to reduce likelihood of the failure to very rare again. Once a
> > new revision of qt4-x11 is uploaded to sid (soon I believe), KDE
> > applications should be able to build again (hopefully).
> 
> qt4-x11/hppa 4:4.6.3-1 has recently been uploaded to incoming. It has my hppa 
> hack applied. Therefore please give back the following KDE packages on hppa:
> 
> kde4libs basket kdesvn webkitkde kraft konq-plugins

done.




-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org



Bug#561203: threads and fork on machine with VIPT-WB cache

2010-06-10 Thread Modestas Vainius
Hello,

On sekmadienis 06 Birželis 2010 04:01:23 Modestas Vainius wrote:
> On penktadienis 04 Birželis 2010 08:21:06 dann frazier wrote:
> > > My case and my analysis talked about UP kernel, and John David Anglin
> > > 
> > > made a patch:
> > >   http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=561203#144
> > > 
> > > After that, the discussion went to SMP cases.
> > > 
> > > It would be better to evaluate the patch again, and make sure it works
> > > for UP case and fix failures of buildd, then apply for Linux in Debian
> > > (only) for HPPA.
> > > 
> > > I know that the patch is not that ideal because it touches
> > > architecture independent part of Linux, but it is worth for Linux in
> > > Debian (or Linux for the HPPA machine of buildd, at least).
> > 
> > I'm happy to test the patch if necessary to help push this change
> > upstream. However, we do need the change to go upstream before we can
> > include it in the Debian kernel.
> 
> I made a hackish patch for QProcess in Qt (usleep(1000) before fork())
> which seems to reduce likelihood of the failure to very rare again. Once a
> new revision of qt4-x11 is uploaded to sid (soon I believe), KDE
> applications should be able to build again (hopefully).

qt4-x11/hppa 4:4.6.3-1 has recently been uploaded to incoming. It has my hppa 
hack applied. Therefore please give back the following KDE packages on hppa:

kde4libs basket kdesvn webkitkde kraft konq-plugins


-- 
Modestas Vainius 


signature.asc
Description: This is a digitally signed message part.


Bug#561203: threads and fork on machine with VIPT-WB cache

2010-06-07 Thread dann frazier
On Fri, Jun 04, 2010 at 12:44:55PM +0200, Thibaut VARENE wrote:
> On Fri, Jun 4, 2010 at 7:21 AM, dann frazier  wrote:
> > On Fri, Jun 04, 2010 at 10:03:07AM +0900, NIIBE Yutaka wrote:
> >> Modestas Vainius wrote:
>  Note that Debian's buildds run a UP kernel, so as soon as those fixes
>  go upstream we can pull them in. Thanks for all your work here!
> 
> >>>
> >>> Well, as long as this is unfixed or at least "common", I don't see how 
> >>> hppa
> >>> can be considered to be a release arch. Is that UP patch available 
> >>> somewhere?
> >>
> >> My case and my analysis talked about UP kernel, and John David Anglin
> >> made a patch:
> >>       http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=561203#144
> >>
> >> After that, the discussion went to SMP cases.
> >>
> >> It would be better to evaluate the patch again, and make sure it works
> >> for UP case and fix failures of buildd, then apply for Linux in Debian
> >> (only) for HPPA.
> >>
> >> I know that the patch is not that ideal because it touches
> >> architecture independent part of Linux, but it is worth for Linux in
> >> Debian (or Linux for the HPPA machine of buildd, at least).
> >
> > I'm happy to test the patch if necessary to help push this change
> > upstream. However, we do need the change to go upstream before we can
> > include it in the Debian kernel.
> 
> Just for reference, I've summarized the test cases and related patches here:
> http://wiki.parisc-linux.org/TestCases

Cool - that is helpful. I've updated the kernel on peri/penalosa with
the various patches listed there that have gone upstream, but I'm not
seeing better results with any failing packages.

btw, I thought it would be useful to edit that page and tag each patch
with its status in Debian (in-official-kernel, installed-on-buildds,
etc), but the page appears to be immutable.

-- 
dann frazier




--
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org



Bug#561203: threads and fork on machine with VIPT-WB cache

2010-06-05 Thread Modestas Vainius
Hello,

On penktadienis 04 Birželis 2010 08:21:06 dann frazier wrote:
> > My case and my analysis talked about UP kernel, and John David Anglin
> > 
> > made a patch:
> > http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=561203#144
> > 
> > After that, the discussion went to SMP cases.
> > 
> > It would be better to evaluate the patch again, and make sure it works
> > for UP case and fix failures of buildd, then apply for Linux in Debian
> > (only) for HPPA.
> > 
> > I know that the patch is not that ideal because it touches
> > architecture independent part of Linux, but it is worth for Linux in
> > Debian (or Linux for the HPPA machine of buildd, at least).
> 
> I'm happy to test the patch if necessary to help push this change
> upstream. However, we do need the change to go upstream before we can
> include it in the Debian kernel.

I made a hackish patch for QProcess in Qt (usleep(1000) before fork()) which 
seems to reduce likelihood of the failure to very rare again. Once a new 
revision of qt4-x11 is uploaded to sid (soon I believe), KDE applications 
should be able to build again (hopefully).

Obviously it would be better to get this bug fixed for real but at least now 
the whole KDE stack won't be held by it while we wait.

-- 
Modestas Vainius 


signature.asc
Description: This is a digitally signed message part.


Bug#561203: threads and fork on machine with VIPT-WB cache

2010-06-04 Thread Thibaut VARENE
On Fri, Jun 4, 2010 at 7:21 AM, dann frazier  wrote:
> On Fri, Jun 04, 2010 at 10:03:07AM +0900, NIIBE Yutaka wrote:
>> Modestas Vainius wrote:
 Note that Debian's buildds run a UP kernel, so as soon as those fixes
 go upstream we can pull them in. Thanks for all your work here!

>>>
>>> Well, as long as this is unfixed or at least "common", I don't see how hppa
>>> can be considered to be a release arch. Is that UP patch available 
>>> somewhere?
>>
>> My case and my analysis talked about UP kernel, and John David Anglin
>> made a patch:
>>       http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=561203#144
>>
>> After that, the discussion went to SMP cases.
>>
>> It would be better to evaluate the patch again, and make sure it works
>> for UP case and fix failures of buildd, then apply for Linux in Debian
>> (only) for HPPA.
>>
>> I know that the patch is not that ideal because it touches
>> architecture independent part of Linux, but it is worth for Linux in
>> Debian (or Linux for the HPPA machine of buildd, at least).
>
> I'm happy to test the patch if necessary to help push this change
> upstream. However, we do need the change to go upstream before we can
> include it in the Debian kernel.

Just for reference, I've summarized the test cases and related patches here:
http://wiki.parisc-linux.org/TestCases

HTH

-- 
Thibaut VARENE
http://www.parisc-linux.org/~varenet/



--
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org



Bug#561203: threads and fork on machine with VIPT-WB cache

2010-06-04 Thread Bastian Blank
severity 561203 serious
thanks

On Thu, Jun 03, 2010 at 11:50:05AM +0300, Modestas Vainius wrote:
> # Breaks unrelated applications

Sorry, no. Almost all applications are related to the kernel.

> Well, as long as this is unfixed or at least "common", I don't see how hppa 
> can be considered to be a release arch. Is that UP patch available somewhere?

This is up to the release team.

Bastian

-- 
You're dead, Jim.
-- McCoy, "Amok Time", stardate 3372.7



-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org



Bug#561203: threads and fork on machine with VIPT-WB cache

2010-06-03 Thread dann frazier
On Fri, Jun 04, 2010 at 10:03:07AM +0900, NIIBE Yutaka wrote:
> Modestas Vainius wrote:
>>> Note that Debian's buildds run a UP kernel, so as soon as those fixes
>>> go upstream we can pull them in. Thanks for all your work here!
>>>
>>
>> Well, as long as this is unfixed or at least "common", I don't see how hppa
>> can be considered to be a release arch. Is that UP patch available somewhere?
>
> My case and my analysis talked about UP kernel, and John David Anglin
> made a patch:
>   http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=561203#144
>
> After that, the discussion went to SMP cases.
>
> It would be better to evaluate the patch again, and make sure it works
> for UP case and fix failures of buildd, then apply for Linux in Debian
> (only) for HPPA.
>
> I know that the patch is not that ideal because it touches
> architecture independent part of Linux, but it is worth for Linux in
> Debian (or Linux for the HPPA machine of buildd, at least).

I'm happy to test the patch if necessary to help push this change
upstream. However, we do need the change to go upstream before we can
include it in the Debian kernel.



-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org



Bug#561203: threads and fork on machine with VIPT-WB cache

2010-06-03 Thread NIIBE Yutaka

Modestas Vainius wrote:

Note that Debian's buildds run a UP kernel, so as soon as those fixes
go upstream we can pull them in. Thanks for all your work here!



Well, as long as this is unfixed or at least "common", I don't see how hppa
can be considered to be a release arch. Is that UP patch available somewhere?


My case and my analysis talked about UP kernel, and John David Anglin
made a patch:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=561203#144

After that, the discussion went to SMP cases.

It would be better to evaluate the patch again, and make sure it works
for UP case and fix failures of buildd, then apply for Linux in Debian
(only) for HPPA.

I know that the patch is not that ideal because it touches
architecture independent part of Linux, but it is worth for Linux in
Debian (or Linux for the HPPA machine of buildd, at least).
--



--
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org



Bug#561203: threads and fork on machine with VIPT-WB cache

2010-06-03 Thread Modestas Vainius
# Breaks unrelated applications
tags 561203 critical
thanks

Hello,

On trečiadienis 02 Birželis 2010 20:56:17 dann frazier wrote:
> On Wed, Jun 02, 2010 at 01:16:01PM -0400, John David Anglin wrote:
> > On Wed, 02 Jun 2010, Modestas Vainius wrote:
> > > Hello,
> > > 
> > > this bug [1] is back to the "very common" department with eglibc 2.11
> > > (libc6- dev_2.11.1-1) builds. Majority of KDE applications are failing
> > > to build on hppa again. Is there really nothing what could be done to
> > > fix it?
> > 
> > I will just say it is very tricky.  I think a fix is possible (arm and
> > mips had similar cache problems) but the victim replacement present in
> > PA8800/PA8900 caches makes the problem especially difficult  for
> > hardware using these processors.
> > 
> > I have spent the last few months testing various alternatives and have
> > now done hundreds of kernel builds.  I did post some experimental patches
> > that fix the problem on UP kernels.  However, the problem is not resolved
> > for SMP kernels.
> 
> Note that Debian's buildds run a UP kernel, so as soon as those fixes
> go upstream we can pull them in. Thanks for all your work here!
> 

Well, as long as this is unfixed or at least "common", I don't see how hppa 
can be considered to be a release arch. Is that UP patch available somewhere?

All KDE applications have been stuck in unstable before due to this and 
history is about to repeat itself unless something is done. While apparently a 
failing test in eglibc can be ignored, other applications have to suffer real 
world problems...

-- 
Modestas Vainius 


signature.asc
Description: This is a digitally signed message part.


Bug#561203: threads and fork on machine with VIPT-WB cache

2010-06-02 Thread dann frazier
On Wed, Jun 02, 2010 at 01:16:01PM -0400, John David Anglin wrote:
> On Wed, 02 Jun 2010, Modestas Vainius wrote:
> 
> > Hello,
> > 
> > this bug [1] is back to the "very common" department with eglibc 2.11 
> > (libc6-
> > dev_2.11.1-1) builds. Majority of KDE applications are failing to build on 
> > hppa again. Is there really nothing what could be done to fix it?
> 
> I will just say it is very tricky.  I think a fix is possible (arm and mips
> had similar cache problems) but the victim replacement present in 
> PA8800/PA8900
> caches makes the problem especially difficult  for hardware using these
> processors.
> 
> I have spent the last few months testing various alternatives and have
> now done hundreds of kernel builds.  I did post some experimental patches
> that fix the problem on UP kernels.  However, the problem is not resolved
> for SMP kernels.

Note that Debian's buildds run a UP kernel, so as soon as those fixes
go upstream we can pull them in. Thanks for all your work here!

> The minifail test is a good one to demonstrate the problem.  Indeed,
> a very similar test was given in the thread below:
> http://readlist.com/lists/vger.kernel.org/linux-kernel/54/270861.html
> 
> This thread also discusses the PA8800 problem:
> http://readlist.com/lists/vger.kernel.org/linux-kernel/54/271417.html
> 
> I currently surmise that we have a problem with the cache victim
> replacement, although the cause isn't clear.  I did find recently
> that the cache prefetch in copy_user_page_asm extends to the line
> beyond the end of the page, but fixing this doesn't resolve the problem.
> 
> I am still experimenting with using equivalent aliasing.  It does
> help to flush in ptep_set_wrprotect.
> 
> Dave

-- 
dann frazier




-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org



Bug#561203: threads and fork on machine with VIPT-WB cache

2010-06-02 Thread John David Anglin
On Wed, 02 Jun 2010, Modestas Vainius wrote:

> Hello,
> 
> this bug [1] is back to the "very common" department with eglibc 2.11 (libc6-
> dev_2.11.1-1) builds. Majority of KDE applications are failing to build on 
> hppa again. Is there really nothing what could be done to fix it?

I will just say it is very tricky.  I think a fix is possible (arm and mips
had similar cache problems) but the victim replacement present in PA8800/PA8900
caches makes the problem especially difficult  for hardware using these
processors.

I have spent the last few months testing various alternatives and have
now done hundreds of kernel builds.  I did post some experimental patches
that fix the problem on UP kernels.  However, the problem is not resolved
for SMP kernels.

The minifail test is a good one to demonstrate the problem.  Indeed,
a very similar test was given in the thread below:
http://readlist.com/lists/vger.kernel.org/linux-kernel/54/270861.html

This thread also discusses the PA8800 problem:
http://readlist.com/lists/vger.kernel.org/linux-kernel/54/271417.html

I currently surmise that we have a problem with the cache victim
replacement, although the cause isn't clear.  I did find recently
that the cache prefetch in copy_user_page_asm extends to the line
beyond the end of the page, but fixing this doesn't resolve the problem.

I am still experimenting with using equivalent aliasing.  It does
help to flush in ptep_set_wrprotect.

Dave
-- 
J. David Anglin  dave.ang...@nrc-cnrc.gc.ca
National Research Council of Canada  (613) 990-0752 (FAX: 952-6602)



-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org



Bug#561203: threads and fork on machine with VIPT-WB cache

2010-06-02 Thread Modestas Vainius
Hello,

this bug [1] is back to the "very common" department with eglibc 2.11 (libc6-
dev_2.11.1-1) builds. Majority of KDE applications are failing to build on 
hppa again. Is there really nothing what could be done to fix it?

1. http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=561203
2. 
https://buildd.debian.org/fetch.cgi?pkg=kde4libs;ver=4%3A4.4.4-1;arch=hppa;stamp=1275467025
3. 
https://buildd.debian.org/fetch.cgi?pkg=basket;ver=1.80-1;arch=hppa;stamp=1275483241

-- 
Modestas Vainius 


signature.asc
Description: This is a digitally signed message part.


Bug#561203: threads and fork on machine with VIPT-WB cache

2010-04-08 Thread John David Anglin
> On Thu, 08 Apr 2010, Helge Deller wrote:

> > I tested your patch today on one of my machines with plain kernel 2.6.33 
> > (32bit, SMP, B2000 I think).
> > Sadly I still did see the minifail bug.
> > 
> > Are you sure, that the patch fixed this bug for you?
> 
> Seemed to, but I have a bunch of other changes installed.  Possibly,
> the change to cacheflush.h is important.  It affects all PA8000.

I also think the change suggested by James

+   if (pte_dirty(old_pte))

is important for SMP.  With the patch set that I sent, my rp3440 and
gsyprf11 seem reasonably stable running 2.6.33.2 SMP.  I doubt all
problems are solved but things are a lot better than before.

Dave
-- 
J. David Anglin  dave.ang...@nrc-cnrc.gc.ca
National Research Council of Canada  (613) 990-0752 (FAX: 952-6602)



-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org



Bug#561203: threads and fork on machine with VIPT-WB cache

2010-04-08 Thread John David Anglin
On Thu, 08 Apr 2010, Helge Deller wrote:

> On 04/02/2010 09:35 PM, John David Anglin wrote:
> > On Fri, 02 Apr 2010, NIIBE Yutaka wrote:
> > 
> >> NIIBE Yutaka wrote:
> >>> To have same semantics as other archs, I think that VIPT-WB cache
> >>> machine should have cache flush at ptep_set_wrprotect, so that memory
> >>> of the page has up-to-date data.  Yes, it will be huge performance
> >>> impact for fork.  But I don't find any good solution other than this
> >>> yet.
> >>
> >> I think we could do something like (only for VIPT-WB cache machine):
> >>
> >> -  static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned 
> >> long 
> >> address, pte_t *ptep)
> >>
> >> +  static inline void ptep_set_wrprotect(struct vm_area_struct *vma, 
> >> struct 
> >> mm_struct *mm, unsigned long addr, pte_t *ptep)
> >>{
> >>pte_t old_pte = *ptep;
> >> +  if (atomic_read(&mm->mm_users) > 1)
> >> +  flush_cache_page(vma, addr, pte_pfn(old_pte));
> >>set_pte_at(mm, addr, ptep, pte_wrprotect(old_pte));
> >>}
> > 
> > I tested the hack below on two machines currently running 2.6.33.2
> > UP kernels.  The change seems to fix Debian #561203 (minifail bug)!
> > Thus, I definitely think you are on the right track.  I'll continue
> > to test.
> > 
> > I suspect the same issue is present for SMP kernels.
> 
> Hi Dave,
> 
> I tested your patch today on one of my machines with plain kernel 2.6.33 
> (32bit, SMP, B2000 I think).
> Sadly I still did see the minifail bug.
> 
> Are you sure, that the patch fixed this bug for you?

Seemed to, but I have a bunch of other changes installed.  Possibly,
the change to cacheflush.h is important.  It affects all PA8000.

Dave
-- 
J. David Anglin  dave.ang...@nrc-cnrc.gc.ca
National Research Council of Canada  (613) 990-0752 (FAX: 952-6602)

diff --git a/arch/parisc/hpux/wrappers.S b/arch/parisc/hpux/wrappers.S
index 58c53c8..bdcea33 100644
--- a/arch/parisc/hpux/wrappers.S
+++ b/arch/parisc/hpux/wrappers.S
@@ -88,7 +88,7 @@ ENTRY(hpux_fork_wrapper)
 
STREG   %r2,-20(%r30)
ldo 64(%r30),%r30
-   STREG   %r2,PT_GR19(%r1);! save for child
+   STREG   %r2,PT_SYSCALL_RP(%r1)  ;! save for child
STREG   %r30,PT_GR21(%r1)   ;! save for child
 
LDREG   PT_GR30(%r1),%r25
@@ -132,7 +132,7 @@ ENTRY(hpux_child_return)
bl,nschedule_tail, %r2
 #endif
 
-   LDREG   TASK_PT_GR19-TASK_SZ_ALGN-128(%r30),%r2
+   LDREG   TASK_PT_SYSCALL_RP-TASK_SZ_ALGN-128(%r30),%r2
b fork_return
copy %r0,%r28
 ENDPROC(hpux_child_return)
diff --git a/arch/parisc/include/asm/atomic.h b/arch/parisc/include/asm/atomic.h
index 716634d..d7fabc4 100644
--- a/arch/parisc/include/asm/atomic.h
+++ b/arch/parisc/include/asm/atomic.h
@@ -24,29 +24,46 @@
  * Hash function to index into a different SPINLOCK.
  * Since "a" is usually an address, use one spinlock per cacheline.
  */
-#  define ATOMIC_HASH_SIZE 4
-#  define ATOMIC_HASH(a) (&(__atomic_hash[ (((unsigned long) 
(a))/L1_CACHE_BYTES) & (ATOMIC_HASH_SIZE-1) ]))
+#  define ATOMIC_HASH_SIZE (4096/L1_CACHE_BYTES)  /* 4 */
+#  define ATOMIC_HASH(a)  (&(__atomic_hash[ (((unsigned long) 
(a))/L1_CACHE_BYTES) & (ATOMIC_HASH_SIZE-1) ]))
+#  define ATOMIC_USER_HASH(a) (&(__atomic_user_hash[ (((unsigned long) 
(a))/L1_CACHE_BYTES) & (ATOMIC_HASH_SIZE-1) ]))
 
 extern arch_spinlock_t __atomic_hash[ATOMIC_HASH_SIZE] __lock_aligned;
+extern arch_spinlock_t __atomic_user_hash[ATOMIC_HASH_SIZE] __lock_aligned;
 
 /* Can't use raw_spin_lock_irq because of #include problems, so
  * this is the substitute */
-#define _atomic_spin_lock_irqsave(l,f) do {\
-   arch_spinlock_t *s = ATOMIC_HASH(l);\
+#define _atomic_spin_lock_irqsave_template(l,f,hash_func) do { \
+   arch_spinlock_t *s = hash_func; \
local_irq_save(f);  \
arch_spin_lock(s);  \
 } while(0)
 
-#define _atomic_spin_unlock_irqrestore(l,f) do {   \
-   arch_spinlock_t *s = ATOMIC_HASH(l);\
+#define _atomic_spin_unlock_irqrestore_template(l,f,hash_func) do {\
+   arch_spinlock_t *s = hash_func; \
arch_spin_unlock(s);\
local_irq_restore(f);   \
 } while(0)
 
+/* kernel memory locks */
+#define _atomic_spin_lock_irqsave(l,f) \
+   _atomic_spin_lock_irqsave_template(l,f,ATOMIC_HASH(l))
+
+#define _atomic_spin_unlock_irqrestore(l,f)\
+   _atomic_spin_unlock_irqrestore_template(l,f,ATOMIC_HASH(l))
+
+/* userspace memory locks */
+#define _atomic_spin_lock_irqsave_user(l,f)\
+   _atomic_spin_lock_irqsave_template(l,f,ATOMIC_USER_HASH(l))
+
+#define _atomic_spin_unlock_irqrestore_user(l,f)   \
+   _atomic_spin_unlock_irqrestore_template(l,f,ATOMIC_USER_HASH(l))
 
 #else
 #  define _atomic_spin_lock_irqsave(l,f) do { local_

Bug#561203: threads and fork on machine with VIPT-WB cache

2010-04-08 Thread Helge Deller
On 04/02/2010 09:35 PM, John David Anglin wrote:
> On Fri, 02 Apr 2010, NIIBE Yutaka wrote:
> 
>> NIIBE Yutaka wrote:
>>> To have same semantics as other archs, I think that VIPT-WB cache
>>> machine should have cache flush at ptep_set_wrprotect, so that memory
>>> of the page has up-to-date data.  Yes, it will be huge performance
>>> impact for fork.  But I don't find any good solution other than this
>>> yet.
>>
>> I think we could do something like (only for VIPT-WB cache machine):
>>
>> -static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned 
>> long 
>> address, pte_t *ptep)
>>
>> +static inline void ptep_set_wrprotect(struct vm_area_struct *vma, 
>> struct 
>> mm_struct *mm, unsigned long addr, pte_t *ptep)
>>  {
>>  pte_t old_pte = *ptep;
>> +if (atomic_read(&mm->mm_users) > 1)
>> +flush_cache_page(vma, addr, pte_pfn(old_pte));
>>  set_pte_at(mm, addr, ptep, pte_wrprotect(old_pte));
>>  }
> 
> I tested the hack below on two machines currently running 2.6.33.2
> UP kernels.  The change seems to fix Debian #561203 (minifail bug)!
> Thus, I definitely think you are on the right track.  I'll continue
> to test.
> 
> I suspect the same issue is present for SMP kernels.

Hi Dave,

I tested your patch today on one of my machines with plain kernel 2.6.33 
(32bit, SMP, B2000 I think).
Sadly I still did see the minifail bug.

Are you sure, that the patch fixed this bug for you?

Helge

do_page_fault() pid=21470 command='minifail3' type=6 address=0x0003
do_page_fault() pid=7986 command='minifail3' type=6 address=0x0003  
   
do_page_fault() pid=19952 command='minifail3' type=6 address=0x0003 
   
do_page_fault() pid=13549 command='minifail3' type=6 address=0x0003
do_page_fault() pid=21862 command='minifail3' type=6 address=0x0003
do_page_fault() pid=4615 command='minifail3' type=6 address=0x0003
do_page_fault() pid=17336 command='minifail3' type=6 address=0x0003
do_page_fault() pid=21986 command='minifail3' type=6 address=0x0003
do_page_fault() pid=2157 command='minifail3' type=15 address=0x00dc
do_page_fault() pid=23886 command='minifail3' type=6 address=0x0003
do_page_fault() pid=2681 command='minifail3' type=6 address=0x0003
do_page_fault() pid=3229 command='minifail3' type=15 address=0x00ec
do_page_fault() pid=26095 command='minifail3' type=6 address=0x0003
do_page_fault() pid=20722 command='minifail3' type=6 address=0x0003
do_page_fault() pid=19912 command='minifail3' type=15 address=0x00ec
...
pagealloc: memory corruption
7db0c780: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
7db0c790: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
7db0c7a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
7db0c7b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
Backtrace:
 [<1011ec14>] show_stack+0x18/0x28
 [<10117ba0>] dump_stack+0x1c/0x2c
 [<101c6594>] kernel_map_pages+0x2a0/0x2b8
 [<1019e6c8>] get_page_from_freelist+0x3d4/0x614
 [<1019ea3c>] __alloc_pages_nodemask+0x134/0x610
 [<101b1d20>] do_wp_page+0x268/0xac0
 [<101b3b34>] handle_mm_fault+0x4d4/0x7c4
 [<1011d854>] do_page_fault+0x1f8/0x2fc
 [<1011f450>] handle_interruption+0xec/0x730
 [<10103078>] intr_check_sig+0x0/0x34
...
do_page_fault() pid=13414 command='minifail3' type=15 address=0x00dc
do_page_fault() pid=22776 command='minifail3' type=15 address=0x
do_page_fault() pid=26290 command='minifail3' type=15 address=0x00ec
do_page_fault() pid=1399 command='minifail3' type=6 address=0x0003
do_page_fault() pid=16130 command='minifail3' type=6 address=0x0003
do_page_fault() pid=26401 command='minifail3' type=6 address=0x0003
do_page_fault() pid=3383 command='minifail3' type=6 address=0x0003
do_page_fault() pid=3400 command='minifail3' type=15 address=0x0004
do_page_fault() pid=18659 command='minifail3' type=6 address=0x0003
do_page_fault() pid=3730 command='minifail3' type=6 address=0x0003
do_page_fault() pid=28828 command='minifail3' type=6 address=0x0003



-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org



Bug#561203: threads and fork on machine with VIPT-WB cache

2010-04-06 Thread James Bottomley
On Tue, 2010-04-06 at 13:57 +0900, NIIBE Yutaka wrote:
> John David Anglin wrote:
> > It is interesting that in the case of the Debian bug that
> > a thread of the parent process causes the COW break and thereby corrupts
> > its own memory.  As far as I can tell, the fork'd child never writes
> > to the memory that causes the fault.
> 
> Thanks for writing and testing a patch.
> 
> The case of #561203 is second scenario.  I think that this case is
> relevant to VIVT-WB machine too (provided kernel does copy by kernel
> address).
> 
> James Bottomley wrote:
> > So this is going to be a hard sell because of the arch churn. There are,
> > however, three ways to do it with the original signature.
> 
> Currently, I think that signature change would be inevitable for
> ptep_set_wrprotect.

Well we can't do it by claiming several architectures are wrong in their
implementation.  We might do it by claiming to need vma knowledge ...
however, even if you want the flush, as I said, you don't need to change
the signature.

> >  1. implement copy_user_highpage ... this allows us to copy through
> > the child's page cache (which is coherent with the parent's
> > before the cow) and thus pick up any cache changes without a
> > flush
> 
> Let me think about this way.
> 
> Well, this would improve both cases of the first scenario of mine and
> the second scenario.
> 
> But... I think that even if we would have copy_user_highpage which
> does copy by user address, we need to flush at ptep_set_wrprotect.  I
> think that we need to keep the condition: no dirty cache for COW page.
> 
> Think about third scenario of threads and fork:
> 
> (1) In process A, there are multiple threads, and a thread A-1 invokes
> fork.  We have process B, with a different space identifier color.

I don't understand what you mean by space colour ... there's cache
colour which refers to the line in the cache to which the the physical
memory maps.  The way PA is set up, space ID doesn't factor into cache
colour.

> (2) Another thread A-2 in process A runs while A-1 copies memory by
> dup_mmap.  A-2 writes to the address  in a page.  Let's call
> this page .
> 
> (3) We have dirty cache for  by A-2 at the time of
> ptep_set_wrprotect of thread A-1.  Suppose that we don't flush
> here.
> 
> (4) A-1 finishes copy, and sleeps.
> 
> (5) Child process B is waken up and sees old value at  in ,
> through different cache line.  B sleeps.

This isn't possible.  at this point, A and B have the same virtual
address and mapping for  this means they are the same cache
colour, so they both see the cached value.

James

> (6) A-2 is waken up.  A-2 touches the memory again, breaks COW.  A-2
> copies data on  to .  OK,  is
> consistent with copy_user_highpage by user address.
> 
> Note that during this copy, the cache line of  by A-2 is
> flushed out to .  It invokes another memory fault and COW
> break.  (I think that this memory fault is unhealthy.)
> Then, new value goes to  on  (when it's physically
> tagged cache).
> 
> A-2 sleeps.
> 
> (7) Child process B is waken up.  When it accesses at , it sees new
> value suddenly.
> 
> 
> If we flush cache to  at ptep_set_wrprotect, this couldn't
> occur.
> 
> 
>   *   *   *
> 
> 
> I know that we should not do "threads and fork".  It is difficult to
> define clean semantics.  Because another thread may touch memory while
> a thread which does memory copy for fork, the memory what the child
> process will see may be inconsistent.  For the child, a page might be
> new, while another page might be old.
> 
> For VIVT-WB cache machine, I am considering a possibility for the
> child process to have inconsistent memory even within a single page
> (when we have no flush at ptep_set_wrprotect).
> 
> It will be needed for me to talk to linux-arch soon or later.





-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org



Bug#561203: threads and fork on machine with VIPT-WB cache

2010-04-06 Thread James Bottomley
On Tue, 2010-04-06 at 08:37 -0500, James Bottomley wrote:
> > (5) Child process B is waken up and sees old value at  in
> ,
> > through different cache line.  B sleeps.
> 
> This isn't possible.  at this point, A and B have the same virtual
> address and mapping for  this means they are the same cache
> colour, so they both see the cached value.

Perhaps to add more detail to this.  In spite of what the arch manual
says (it says the congruence stride is 16MB), the congruence stride on
all manufactured parisc processors is 4MB.  This means that any virtual
addresses, regardless of space id, that are equal modulo 4MB have the
same cache colour.

James
 




-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org



Bug#561203: threads and fork on machine with VIPT-WB cache

2010-04-05 Thread NIIBE Yutaka
John David Anglin wrote:
> It is interesting that in the case of the Debian bug that
> a thread of the parent process causes the COW break and thereby corrupts
> its own memory.  As far as I can tell, the fork'd child never writes
> to the memory that causes the fault.

Thanks for writing and testing a patch.

The case of #561203 is second scenario.  I think that this case is
relevant to VIVT-WB machine too (provided kernel does copy by kernel
address).

James Bottomley wrote:
> So this is going to be a hard sell because of the arch churn. There are,
> however, three ways to do it with the original signature.

Currently, I think that signature change would be inevitable for
ptep_set_wrprotect.

>  1. implement copy_user_highpage ... this allows us to copy through
> the child's page cache (which is coherent with the parent's
> before the cow) and thus pick up any cache changes without a
> flush

Let me think about this way.

Well, this would improve both cases of the first scenario of mine and
the second scenario.

But... I think that even if we would have copy_user_highpage which
does copy by user address, we need to flush at ptep_set_wrprotect.  I
think that we need to keep the condition: no dirty cache for COW page.

Think about third scenario of threads and fork:

(1) In process A, there are multiple threads, and a thread A-1 invokes
fork.  We have process B, with a different space identifier color.

(2) Another thread A-2 in process A runs while A-1 copies memory by
dup_mmap.  A-2 writes to the address  in a page.  Let's call
this page .

(3) We have dirty cache for  by A-2 at the time of
ptep_set_wrprotect of thread A-1.  Suppose that we don't flush
here.

(4) A-1 finishes copy, and sleeps.

(5) Child process B is waken up and sees old value at  in ,
through different cache line.  B sleeps.

(6) A-2 is waken up.  A-2 touches the memory again, breaks COW.  A-2
copies data on  to .  OK,  is
consistent with copy_user_highpage by user address.

Note that during this copy, the cache line of  by A-2 is
flushed out to .  It invokes another memory fault and COW
break.  (I think that this memory fault is unhealthy.)
Then, new value goes to  on  (when it's physically
tagged cache).

A-2 sleeps.

(7) Child process B is waken up.  When it accesses at , it sees new
value suddenly.


If we flush cache to  at ptep_set_wrprotect, this couldn't
occur.


*   *   *


I know that we should not do "threads and fork".  It is difficult to
define clean semantics.  Because another thread may touch memory while
a thread which does memory copy for fork, the memory what the child
process will see may be inconsistent.  For the child, a page might be
new, while another page might be old.

For VIVT-WB cache machine, I am considering a possibility for the
child process to have inconsistent memory even within a single page
(when we have no flush at ptep_set_wrprotect).

It will be needed for me to talk to linux-arch soon or later.
-- 



-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org



Bug#561203: threads and fork on machine with VIPT-WB cache

2010-04-05 Thread James Bottomley
On Sun, 2010-04-04 at 22:51 -0400, John David Anglin wrote:
> > Thanks a lot for the discussion.
> > 
> > James Bottomley wrote:
> > > So your theory is that the data the kernel sees doing the page copy can
> > > be stale because of dirty cache lines in userspace (which is certainly
> > > possible in the ordinary way)?
> > 
> > Yes.
> > 
> > > By design that shouldn't happen: the idea behind COW breaking is
> > > that before it breaks, the page is read only ... this means that
> > > processes can have clean cache copies of it, but never dirty cache
> > > copies (because writes are forbidden).
> > 
> > That must be design, I agree.
> > 
> > To keep this condition (no dirty cache for COW page), we need to flush
> > cache before ptep_set_wrprotect.  That's my point.
> > 
> > Please look at the code path:
> >(kernel/fork.c)
> >do_fork -> copy_process -> copy_mm -> dup_mm -> dup_mmap ->
> >(mm/memory.c)
> >copy_page_range -> copy_p*d_range -> copy_one_pte -> ptep_set_wrprotect
> > 
> > The function flush_cache_dup_mm is called from dup_mmap, that's enough
> > for a case of a process with single thread.
> > I think that:
> > We need to flush cache before ptep_set_wrprotect for a process with
> > multiple threads.  Other threads may change memory after a thread
> > invokes do_fork and before calling ptep_set_wrprotect.  Specifically,
> > a process may sleep at pte_alloc function to get a page.
> 
> I agree.  It is interesting that in the case of the Debian bug that
> a thread of the parent process causes the COW break and thereby corrupts
> its own memory.  As far as I can tell, the fork'd child never writes
> to the memory that causes the fault.
> 
> My testing indicates that your suggested change fixes the Debian
> bug.  I've attached below my latest test version.  This seems to fix
> the bug on both SMP and UP kernels.
> 
> However, it doesn't fix all page/cache related issues on parisc
> SMP kernels that I commonly see.
> 
> My first inclination after even before reading your analysis was
> to assume that copy_user_page was broken (i.e, that even if a
> processor cache was dirty when the COW page was write protected,
> it should be possible to do the flush before the page is copied).
> However, this didn't seem to work...  Possibly, there are issues
> with aliased addresses.
> 
> I note that sparc flushes the entire cache and purges the entire
> tlb in kmap_atomic/kunmap_atomic for highmem.  Although the breakage
> that I see is not limited to PA8800/PA8900, I'm not convinced
> that we maintain coherency that is required for these processors
> in copy_user_page when we have multiple threads.
> 
> As a side note, kmap_atomic/kunmap_atomic seem to lack calls to
> pagefault_disable()/pagefault_enable() on PA8800.
> 
> Dave
> -- 
> J. David Anglin  dave.ang...@nrc-cnrc.gc.ca
> National Research Council of Canada  (613) 990-0752 (FAX: 
> 952-6602)
> 
> diff --git a/arch/parisc/include/asm/pgtable.h 
> b/arch/parisc/include/asm/pgtable.h
> index a27d2e2..b140d5c 100644
> --- a/arch/parisc/include/asm/pgtable.h
> +++ b/arch/parisc/include/asm/pgtable.h
> @@ -14,6 +14,7 @@
>  #include 
>  #include 
>  #include 
> +extern void flush_cache_page(struct vm_area_struct *vma, unsigned long 
> vmaddr, unsigned long pfn);
>  
>  /*
>   * kern_addr_valid(ADDR) tests if ADDR is pointing to valid kernel
> @@ -456,17 +457,22 @@ static inline pte_t ptep_get_and_clear(struct mm_struct 
> *mm, unsigned long addr,
>   return old_pte;
>  }
>  
> -static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long 
> addr, pte_t *ptep)
> +static inline void ptep_set_wrprotect(struct vm_area_struct *vma, struct 
> mm_struct *mm, unsigned long addr, pte_t *ptep)
>  {
>  #ifdef CONFIG_SMP
>   unsigned long new, old;
> +#endif
> + pte_t old_pte = *ptep;
> +
> + if (atomic_read(&mm->mm_users) > 1)

Just to verify there's nothing this is hiding, can you make this 

if (pte_dirty(old_pte))

and reverify?  The if clause should only trip on the case where the
parent has dirtied the line between flush and now.

> + flush_cache_page(vma, addr, pte_pfn(old_pte));
>  
> +#ifdef CONFIG_SMP
>   do {
>   old = pte_val(*ptep);
>   new = pte_val(pte_wrprotect(__pte (old)));
>   } while (cmpxchg((unsigned long *) ptep, old, new) != old);
>  #else
> - pte_t old_pte = *ptep;
>   set_pte_at(mm, addr, ptep, pte_wrprotect(old_pte));
>  #endif
>  }
> diff --git a/mm/memory.c b/mm/memory.c
> index 09e4b1b..21c2916 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -616,7 +616,7 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct 
> *src_mm,
>* in the parent and the child
>*/
>   if (is_cow_mapping(vm_flags)) {
> - ptep_set_wrprotect(src_mm, addr, src_pte);
> + ptep_set_wrprotect(vma, src_mm, addr, src_pte);

So this is going to be a hard sell because of the arch churn. 

Bug#561203: threads and fork on machine with VIPT-WB cache

2010-04-04 Thread John David Anglin
> > > By design that shouldn't happen: the idea behind COW breaking is
> > > that before it breaks, the page is read only ... this means that
> > > processes can have clean cache copies of it, but never dirty cache
> > > copies (because writes are forbidden).
> > 
> > That must be design, I agree.
> > 
> > To keep this condition (no dirty cache for COW page), we need to flush
> > cache before ptep_set_wrprotect.  That's my point.

Is it possible that a sleep/reschedule could cause the cache to become
dirty again before it is write protected?

Dave
-- 
J. David Anglin  dave.ang...@nrc-cnrc.gc.ca
National Research Council of Canada  (613) 990-0752 (FAX: 952-6602)



-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org



Bug#561203: threads and fork on machine with VIPT-WB cache

2010-04-04 Thread John David Anglin
> Thanks a lot for the discussion.
> 
> James Bottomley wrote:
> > So your theory is that the data the kernel sees doing the page copy can
> > be stale because of dirty cache lines in userspace (which is certainly
> > possible in the ordinary way)?
> 
> Yes.
> 
> > By design that shouldn't happen: the idea behind COW breaking is
> > that before it breaks, the page is read only ... this means that
> > processes can have clean cache copies of it, but never dirty cache
> > copies (because writes are forbidden).
> 
> That must be design, I agree.
> 
> To keep this condition (no dirty cache for COW page), we need to flush
> cache before ptep_set_wrprotect.  That's my point.
> 
> Please look at the code path:
>(kernel/fork.c)
>do_fork -> copy_process -> copy_mm -> dup_mm -> dup_mmap ->
>(mm/memory.c)
>copy_page_range -> copy_p*d_range -> copy_one_pte -> ptep_set_wrprotect
> 
> The function flush_cache_dup_mm is called from dup_mmap, that's enough
> for a case of a process with single thread.
> I think that:
> We need to flush cache before ptep_set_wrprotect for a process with
> multiple threads.  Other threads may change memory after a thread
> invokes do_fork and before calling ptep_set_wrprotect.  Specifically,
> a process may sleep at pte_alloc function to get a page.

I agree.  It is interesting that in the case of the Debian bug that
a thread of the parent process causes the COW break and thereby corrupts
its own memory.  As far as I can tell, the fork'd child never writes
to the memory that causes the fault.

My testing indicates that your suggested change fixes the Debian
bug.  I've attached below my latest test version.  This seems to fix
the bug on both SMP and UP kernels.

However, it doesn't fix all page/cache related issues on parisc
SMP kernels that I commonly see.

My first inclination after even before reading your analysis was
to assume that copy_user_page was broken (i.e, that even if a
processor cache was dirty when the COW page was write protected,
it should be possible to do the flush before the page is copied).
However, this didn't seem to work...  Possibly, there are issues
with aliased addresses.

I note that sparc flushes the entire cache and purges the entire
tlb in kmap_atomic/kunmap_atomic for highmem.  Although the breakage
that I see is not limited to PA8800/PA8900, I'm not convinced
that we maintain coherency that is required for these processors
in copy_user_page when we have multiple threads.

As a side note, kmap_atomic/kunmap_atomic seem to lack calls to
pagefault_disable()/pagefault_enable() on PA8800.

Dave
-- 
J. David Anglin  dave.ang...@nrc-cnrc.gc.ca
National Research Council of Canada  (613) 990-0752 (FAX: 952-6602)

diff --git a/arch/parisc/include/asm/pgtable.h 
b/arch/parisc/include/asm/pgtable.h
index a27d2e2..b140d5c 100644
--- a/arch/parisc/include/asm/pgtable.h
+++ b/arch/parisc/include/asm/pgtable.h
@@ -14,6 +14,7 @@
 #include 
 #include 
 #include 
+extern void flush_cache_page(struct vm_area_struct *vma, unsigned long vmaddr, 
unsigned long pfn);
 
 /*
  * kern_addr_valid(ADDR) tests if ADDR is pointing to valid kernel
@@ -456,17 +457,22 @@ static inline pte_t ptep_get_and_clear(struct mm_struct 
*mm, unsigned long addr,
return old_pte;
 }
 
-static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long 
addr, pte_t *ptep)
+static inline void ptep_set_wrprotect(struct vm_area_struct *vma, struct 
mm_struct *mm, unsigned long addr, pte_t *ptep)
 {
 #ifdef CONFIG_SMP
unsigned long new, old;
+#endif
+   pte_t old_pte = *ptep;
+
+   if (atomic_read(&mm->mm_users) > 1)
+   flush_cache_page(vma, addr, pte_pfn(old_pte));
 
+#ifdef CONFIG_SMP
do {
old = pte_val(*ptep);
new = pte_val(pte_wrprotect(__pte (old)));
} while (cmpxchg((unsigned long *) ptep, old, new) != old);
 #else
-   pte_t old_pte = *ptep;
set_pte_at(mm, addr, ptep, pte_wrprotect(old_pte));
 #endif
 }
diff --git a/mm/memory.c b/mm/memory.c
index 09e4b1b..21c2916 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -616,7 +616,7 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct 
*src_mm,
 * in the parent and the child
 */
if (is_cow_mapping(vm_flags)) {
-   ptep_set_wrprotect(src_mm, addr, src_pte);
+   ptep_set_wrprotect(vma, src_mm, addr, src_pte);
pte = pte_wrprotect(pte);
}
 



-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org



Bug#561203: threads and fork on machine with VIPT-WB cache

2010-04-04 Thread NIIBE Yutaka
retitle 561203 threads and fork on machine with VIPT-WB cache
reassign 561203 linux-2.6
thanks

As I am sure that this bug lives in kernel, I do reassign and retitle.
-- 



-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org



Bug#561203: threads and fork on machine with VIPT-WB cache

2010-04-04 Thread NIIBE Yutaka
Thanks a lot for the discussion.

James Bottomley wrote:
> So your theory is that the data the kernel sees doing the page copy can
> be stale because of dirty cache lines in userspace (which is certainly
> possible in the ordinary way)?

Yes.

> By design that shouldn't happen: the idea behind COW breaking is
> that before it breaks, the page is read only ... this means that
> processes can have clean cache copies of it, but never dirty cache
> copies (because writes are forbidden).

That must be design, I agree.

To keep this condition (no dirty cache for COW page), we need to flush
cache before ptep_set_wrprotect.  That's my point.

Please look at the code path:
   (kernel/fork.c)
   do_fork -> copy_process -> copy_mm -> dup_mm -> dup_mmap ->
   (mm/memory.c)
   copy_page_range -> copy_p*d_range -> copy_one_pte -> ptep_set_wrprotect

The function flush_cache_dup_mm is called from dup_mmap, that's enough
for a case of a process with single thread.

I think that:
We need to flush cache before ptep_set_wrprotect for a process with
multiple threads.  Other threads may change memory after a thread
invokes do_fork and before calling ptep_set_wrprotect.  Specifically,
a process may sleep at pte_alloc function to get a page.
-- 



-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org



Bug#561203: threads and fork on machine with VIPT-WB cache

2010-04-02 Thread John David Anglin
On Fri, 02 Apr 2010, NIIBE Yutaka wrote:

> NIIBE Yutaka wrote:
>> To have same semantics as other archs, I think that VIPT-WB cache
>> machine should have cache flush at ptep_set_wrprotect, so that memory
>> of the page has up-to-date data.  Yes, it will be huge performance
>> impact for fork.  But I don't find any good solution other than this
>> yet.
>
> I think we could do something like (only for VIPT-WB cache machine):
>
> - static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned 
> long 
> address, pte_t *ptep)
>
> + static inline void ptep_set_wrprotect(struct vm_area_struct *vma, 
> struct 
> mm_struct *mm, unsigned long addr, pte_t *ptep)
>   {
>   pte_t old_pte = *ptep;
> + if (atomic_read(&mm->mm_users) > 1)
> + flush_cache_page(vma, addr, pte_pfn(old_pte));
>   set_pte_at(mm, addr, ptep, pte_wrprotect(old_pte));
>   }

I tested the hack below on two machines currently running 2.6.33.2
UP kernels.  The change seems to fix Debian #561203 (minifail bug)!
Thus, I definitely think you are on the right track.  I'll continue
to test.

I suspect the same issue is present for SMP kernels.

Thanks,
Dave
-- 
J. David Anglin  dave.ang...@nrc-cnrc.gc.ca
National Research Council of Canada  (613) 990-0752 (FAX: 952-6602)

diff --git a/arch/parisc/include/asm/pgtable.h 
b/arch/parisc/include/asm/pgtable.h
index a27d2e2..a5d730f 100644
--- a/arch/parisc/include/asm/pgtable.h
+++ b/arch/parisc/include/asm/pgtable.h
@@ -14,6 +14,7 @@
 #include 
 #include 
 #include 
+extern void flush_cache_page(struct vm_area_struct *vma, unsigned long vmaddr, 
unsigned long pfn);
 
 /*
  * kern_addr_valid(ADDR) tests if ADDR is pointing to valid kernel
@@ -456,7 +457,7 @@ static inline pte_t ptep_get_and_clear(struct mm_struct 
*mm, unsigned long addr,
return old_pte;
 }
 
-static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long 
addr, pte_t *ptep)
+static inline void ptep_set_wrprotect(struct vm_area_struct *vma, struct 
mm_struct *mm, unsigned long addr, pte_t *ptep)
 {
 #ifdef CONFIG_SMP
unsigned long new, old;
@@ -467,6 +468,8 @@ static inline void ptep_set_wrprotect(struct mm_struct *mm, 
unsigned long addr,
} while (cmpxchg((unsigned long *) ptep, old, new) != old);
 #else
pte_t old_pte = *ptep;
+   if (atomic_read(&mm->mm_users) > 1)
+   flush_cache_page(vma, addr, pte_pfn(old_pte));
set_pte_at(mm, addr, ptep, pte_wrprotect(old_pte));
 #endif
 }
diff --git a/mm/memory.c b/mm/memory.c
index 09e4b1b..21c2916 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -616,7 +616,7 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct 
*src_mm,
 * in the parent and the child
 */
if (is_cow_mapping(vm_flags)) {
-   ptep_set_wrprotect(src_mm, addr, src_pte);
+   ptep_set_wrprotect(vma, src_mm, addr, src_pte);
pte = pte_wrprotect(pte);
}
 



-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org



Bug#561203: threads and fork on machine with VIPT-WB cache

2010-04-02 Thread James Bottomley
On Fri, 2010-04-02 at 12:48 +0900, NIIBE Yutaka wrote:
> Thanks for your quick reply.
> 
> James Bottomley wrote:
> > In COW breaking, the page table entry is copied, so A and B no longer
> > have page table entries at the same physical location.  If the COW is
> > intact, A and B have the same physical page, but it's also accessed by
> > the same virtual address, hence no aliasing.
> 
> Let me explain more.
> 
> In the scenario, I assume:
> 
>   No aliasing between A and B.
>   We have aliasing between kernel access and user access.
> 
> Before COW breaking A and B share same data (with no aliasing same
> space identifier color), and B sees data in cache, while memory has
> stale data.
> 
> At COW breaking, kernel copies the memory, it doesn't see new data
> in cache because of aliasing.
> 
> Isn't it possible?

So your theory is that the data the kernel sees doing the page copy can
be stale because of dirty cache lines in userspace (which is certainly
possible in the ordinary way)?  By design that shouldn't happen: the
idea behind COW breaking is that before it breaks, the page is read
only ... this means that processes can have clean cache copies of it,
but never dirty cache copies (because writes are forbidden).  As soon as
one or other process tries to write to the page, it gets a memory
protection trap long before the data it's trying to write goes into the
cache.  By the time the write is allowed to complete (and the cache
becomes dirty), the process will have the new copy of the page which
belongs exclusively to it.

James





-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org



Bug#561203: threads and fork on machine with VIPT-WB cache

2010-04-02 Thread NIIBE Yutaka

NIIBE Yutaka wrote:

To have same semantics as other archs, I think that VIPT-WB cache
machine should have cache flush at ptep_set_wrprotect, so that memory
of the page has up-to-date data.  Yes, it will be huge performance
impact for fork.  But I don't find any good solution other than this
yet.


I think we could do something like (only for VIPT-WB cache machine):

-	static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned 
long address, pte_t *ptep)


+	static inline void ptep_set_wrprotect(struct vm_area_struct *vma, 
struct mm_struct *mm, unsigned long addr, pte_t *ptep)

{
pte_t old_pte = *ptep;
+   if (atomic_read(&mm->mm_users) > 1)
+   flush_cache_page(vma, addr, pte_pfn(old_pte));
set_pte_at(mm, addr, ptep, pte_wrprotect(old_pte));
}

Here, we can add condition for the call of flush_cache_page
to avoid big performance impact for non threads case.
--



--
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org



Bug#561203: threads and fork on machine with VIPT-WB cache

2010-04-01 Thread NIIBE Yutaka
Thanks for your quick reply.

James Bottomley wrote:
> In COW breaking, the page table entry is copied, so A and B no longer
> have page table entries at the same physical location.  If the COW is
> intact, A and B have the same physical page, but it's also accessed by
> the same virtual address, hence no aliasing.

Let me explain more.

In the scenario, I assume:

No aliasing between A and B.
We have aliasing between kernel access and user access.

Before COW breaking A and B share same data (with no aliasing same
space identifier color), and B sees data in cache, while memory has
stale data.

At COW breaking, kernel copies the memory, it doesn't see new data
in cache because of aliasing.

Isn't it possible?
-- 



-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org



Bug#561203: threads and fork on machine with VIPT-WB cache

2010-04-01 Thread James Bottomley
On Fri, 2010-04-02 at 11:41 +0900, NIIBE Yutaka wrote:
> (9) Process B does read-access on memory, which gets *NEW* data in
> cache (if process space identifier color is same).
> Process B does write-access on memory which causes memory fault,
> as it's COW memory.
> 
> Note: Process B sees *NEW* data because it's VIPT-WB cache.
> It shares same memory in this situation.

So I think the bug here is that you're confusing aliasing with SMP cache
coherence.  In an alias situation, the same physical line is mapped to
multiple lines in a processor's cache (at different virtual addresses),
which means you can get a different answer depending on which alias you
read.

In COW breaking, the page table entry is copied, so A and B no longer
have page table entries at the same physical location.  If the COW is
intact, A and B have the same physical page, but it's also accessed by
the same virtual address, hence no aliasing.

In an SMP incoherent system, A and B could get different results (if on
different CPUs) because the write protect is in the cache of A but not
B.  However, PA is SMP coherent, so the act of B reading a line which is
dirty in A's cache causes a flush before the read completes via the
cache chequerboard logic and B ends up reading the same value A would
have read.

James





-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org



Bug#561203: threads and fork on machine with VIPT-WB cache

2010-04-01 Thread NIIBE Yutaka
Hi there,

I think that I am catching a bug for threads and fork.  I found it
when debugging FTBFS of Gauche, a Scheme interpreter.  As I think that
the Debian bug #561203 has same cause, I am CC:-ing to the BTS too.
Please send Cc: to me, I am not on linux-parisc list.

Here, I am talking uniprocessor system case.
I assume that PARISC has virtually indexed, physically tagged, write
back cache, I call it VIPT-WB.

I am reading the source in Debian:
linux-source-2.6.32/kernel/fork.c
linux-source-2.6.32/mm/memory.c
linux-source-2.6.32/arch/parisc/include/asm/pgtable.h

To have same semantics as other archs, I think that VIPT-WB cache
machine should have cache flush at ptep_set_wrprotect, so that memory
of the page has up-to-date data.  Yes, it will be huge performance
impact for fork.  But I don't find any good solution other than this
yet.  Well, I will need to write to linux-arch.

Let me explain our case.  As I couldn't catch the scene, but the
result, it includes imagination and interpretation of mine.  Correct
me if I'm wrong.

(1) We have process A with threads.  One of threads calls fork(2) (in
fact, it's clone(2) without CLONE_VM) when other threads are still
live.  Let's call this thread A-1.

(2) As a result of clone(2), we have process B.

(3) The memory of process A are copied to process B by dup_mmap
(fork.c) by A-1 with the context of process A.  There,
flush_cache_dup_mm is called.

In case of single thread, flush_cache_dup_mm is enough.  All data
in cache go to memory.  But we have other threads, in this case.

(4) From dup_mmap, copy_page_range (memory.c) is called.

Note that there is a possibility of sleep in copy_page_range.
Allocation of page in pud_alloc, pmd_alloc, or pte_alloc_map_lock
may need the A-1 thread to be scheduled off (and wakes up the
swapper or other processes).

(5) Suppose the A-1 thread sleeps in copy_page_range, and another
thread of A-2 of process A is waken up, and touches memory.  Then
we have data only in cache, memory has stale data.

(6) A-2 thread sleeps, and A-1 thread is waken up to continue
copy_page_range -> copy_*_range -> copy_one_pte.

(7) From copy_one_pte, A-1 thread call ptep_set_wrprotect as
this is COW mapping. (*)

(8) A-1 thread sleeps again in copy_page_range and process B is waken up.

(9) Process B does read-access on memory, which gets *NEW* data in
cache (if process space identifier color is same).
Process B does write-access on memory which causes memory fault,
as it's COW memory.

Note: Process B sees *NEW* data because it's VIPT-WB cache.
It shares same memory in this situation.

(10) New page is allocated and memory contents are copied, with
 stale data.

 I assume that kernel access to the memory is by different
 cache line and does not see cache data of A-2.

(11) After falut, process B gets *OLD* data on memory.


(*) When we make COW memory mapping between process A and process B,
we assume memory were up-to-date.  As this assumption is
incorrect, I think that we need to flush cache data to memory
here.


If you have more interest or like ASCII art, please keep reading.

In our Gauche case, we saw this problem on the linked list handling of
pthread implementation (NPTL).  We have two linked list heads, 
and .

Initially, situation of process A is like this:

  +-+
  | |
used  v ELEM|
+-+ +-+ +-+ +-+ |
|   --->|   --->|   --->|   +
+-+ +-+ +-+ +-+
| | | | | |
+-+ +-+ +-+

  +-+
  | |
cache v |
+-+ +-+ |
|   --->|   +
+-+ +-+   This is in memory
| |
+-+

A-2 thread removes ELEM during fork.
This is Process A's final situation, and what Process B sees initially.

  +-+
  | |
used  v |
+-+ +-+ +-+ |
|   --->|   --->|   +
+-+ +-+ +-+
| | | |
+-+ +-+

  +---+
  | ELEM  |
  | +-+   |
  | +-->|   -+|
  | |   +-+  ||
  | |   | |  ||
cache v |   +-+  ||This is in cache
+-+ ||   +-+  |
|   ++-->|   -+
+-+  +-+
 | |
 +-+


Process B scans through linked list with  and update data
in linked list.  After process