subject:"Bug#561203\: threads and fork on machine with VIPT\-WB cache"

Bug#561203: threads and fork on machine with VIPT-WB cache

2010-06-10 Thread dann frazier

On Thu, Jun 10, 2010 at 07:30:45PM +0300, Modestas Vainius wrote:
> Hello,
> 
> On sekmadienis 06 Bir??elis 2010 04:01:23 Modestas Vainius wrote:
> > On penktadienis 04 Bir??elis 2010 08:21:06 dann frazier wrote:
> > > > My case and my analysis talked about UP kernel, and John David Anglin
> > > > 
> > > > made a patch:
> > > > http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=561203#144
> > > > 
> > > > After that, the discussion went to SMP cases.
> > > > 
> > > > It would be better to evaluate the patch again, and make sure it works
> > > > for UP case and fix failures of buildd, then apply for Linux in Debian
> > > > (only) for HPPA.
> > > > 
> > > > I know that the patch is not that ideal because it touches
> > > > architecture independent part of Linux, but it is worth for Linux in
> > > > Debian (or Linux for the HPPA machine of buildd, at least).
> > > 
> > > I'm happy to test the patch if necessary to help push this change
> > > upstream. However, we do need the change to go upstream before we can
> > > include it in the Debian kernel.
> > 
> > I made a hackish patch for QProcess in Qt (usleep(1000) before fork())
> > which seems to reduce likelihood of the failure to very rare again. Once a
> > new revision of qt4-x11 is uploaded to sid (soon I believe), KDE
> > applications should be able to build again (hopefully).
> 
> qt4-x11/hppa 4:4.6.3-1 has recently been uploaded to incoming. It has my hppa 
> hack applied. Therefore please give back the following KDE packages on hppa:
> 
> kde4libs basket kdesvn webkitkde kraft konq-plugins

done.




-- 
To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/20100610172715.ga25...@lackof.org

Bug#561203: threads and fork on machine with VIPT-WB cache

2010-06-10 Thread Modestas Vainius

Hello,

On sekmadienis 06 Birželis 2010 04:01:23 Modestas Vainius wrote:
> On penktadienis 04 Birželis 2010 08:21:06 dann frazier wrote:
> > > My case and my analysis talked about UP kernel, and John David Anglin
> > > 
> > > made a patch:
> > >   http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=561203#144
> > > 
> > > After that, the discussion went to SMP cases.
> > > 
> > > It would be better to evaluate the patch again, and make sure it works
> > > for UP case and fix failures of buildd, then apply for Linux in Debian
> > > (only) for HPPA.
> > > 
> > > I know that the patch is not that ideal because it touches
> > > architecture independent part of Linux, but it is worth for Linux in
> > > Debian (or Linux for the HPPA machine of buildd, at least).
> > 
> > I'm happy to test the patch if necessary to help push this change
> > upstream. However, we do need the change to go upstream before we can
> > include it in the Debian kernel.
> 
> I made a hackish patch for QProcess in Qt (usleep(1000) before fork())
> which seems to reduce likelihood of the failure to very rare again. Once a
> new revision of qt4-x11 is uploaded to sid (soon I believe), KDE
> applications should be able to build again (hopefully).

qt4-x11/hppa 4:4.6.3-1 has recently been uploaded to incoming. It has my hppa 
hack applied. Therefore please give back the following KDE packages on hppa:

kde4libs basket kdesvn webkitkde kraft konq-plugins


-- 
Modestas Vainius 


signature.asc
Description: This is a digitally signed message part.

Bug#561203: threads and fork on machine with VIPT-WB cache

2010-06-07 Thread dann frazier

On Fri, Jun 04, 2010 at 12:44:55PM +0200, Thibaut VARENE wrote:
> On Fri, Jun 4, 2010 at 7:21 AM, dann frazier  wrote:
> > On Fri, Jun 04, 2010 at 10:03:07AM +0900, NIIBE Yutaka wrote:
> >> Modestas Vainius wrote:
>  Note that Debian's buildds run a UP kernel, so as soon as those fixes
>  go upstream we can pull them in. Thanks for all your work here!
> 
> >>>
> >>> Well, as long as this is unfixed or at least "common", I don't see how 
> >>> hppa
> >>> can be considered to be a release arch. Is that UP patch available 
> >>> somewhere?
> >>
> >> My case and my analysis talked about UP kernel, and John David Anglin
> >> made a patch:
> >>       http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=561203#144
> >>
> >> After that, the discussion went to SMP cases.
> >>
> >> It would be better to evaluate the patch again, and make sure it works
> >> for UP case and fix failures of buildd, then apply for Linux in Debian
> >> (only) for HPPA.
> >>
> >> I know that the patch is not that ideal because it touches
> >> architecture independent part of Linux, but it is worth for Linux in
> >> Debian (or Linux for the HPPA machine of buildd, at least).
> >
> > I'm happy to test the patch if necessary to help push this change
> > upstream. However, we do need the change to go upstream before we can
> > include it in the Debian kernel.
> 
> Just for reference, I've summarized the test cases and related patches here:
> http://wiki.parisc-linux.org/TestCases

Cool - that is helpful. I've updated the kernel on peri/penalosa with
the various patches listed there that have gone upstream, but I'm not
seeing better results with any failing packages.

btw, I thought it would be useful to edit that page and tag each patch
with its status in Debian (in-official-kernel, installed-on-buildds,
etc), but the page appears to be immutable.

-- 
dann frazier




--
To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/20100607171136.ga14...@lackof.org

Bug#561203: threads and fork on machine with VIPT-WB cache

2010-06-05 Thread Modestas Vainius

Hello,

On penktadienis 04 Birželis 2010 08:21:06 dann frazier wrote:
> > My case and my analysis talked about UP kernel, and John David Anglin
> > 
> > made a patch:
> > http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=561203#144
> > 
> > After that, the discussion went to SMP cases.
> > 
> > It would be better to evaluate the patch again, and make sure it works
> > for UP case and fix failures of buildd, then apply for Linux in Debian
> > (only) for HPPA.
> > 
> > I know that the patch is not that ideal because it touches
> > architecture independent part of Linux, but it is worth for Linux in
> > Debian (or Linux for the HPPA machine of buildd, at least).
> 
> I'm happy to test the patch if necessary to help push this change
> upstream. However, we do need the change to go upstream before we can
> include it in the Debian kernel.

I made a hackish patch for QProcess in Qt (usleep(1000) before fork()) which 
seems to reduce likelihood of the failure to very rare again. Once a new 
revision of qt4-x11 is uploaded to sid (soon I believe), KDE applications 
should be able to build again (hopefully).

Obviously it would be better to get this bug fixed for real but at least now 
the whole KDE stack won't be held by it while we wait.

-- 
Modestas Vainius 

signature.asc
Description: This is a digitally signed message part.

Bug#561203: threads and fork on machine with VIPT-WB cache

2010-06-04 Thread Thibaut VARENE

On Fri, Jun 4, 2010 at 7:21 AM, dann frazier  wrote:
> On Fri, Jun 04, 2010 at 10:03:07AM +0900, NIIBE Yutaka wrote:
>> Modestas Vainius wrote:
 Note that Debian's buildds run a UP kernel, so as soon as those fixes
 go upstream we can pull them in. Thanks for all your work here!

>>>
>>> Well, as long as this is unfixed or at least "common", I don't see how hppa
>>> can be considered to be a release arch. Is that UP patch available 
>>> somewhere?
>>
>> My case and my analysis talked about UP kernel, and John David Anglin
>> made a patch:
>>       http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=561203#144
>>
>> After that, the discussion went to SMP cases.
>>
>> It would be better to evaluate the patch again, and make sure it works
>> for UP case and fix failures of buildd, then apply for Linux in Debian
>> (only) for HPPA.
>>
>> I know that the patch is not that ideal because it touches
>> architecture independent part of Linux, but it is worth for Linux in
>> Debian (or Linux for the HPPA machine of buildd, at least).
>
> I'm happy to test the patch if necessary to help push this change
> upstream. However, we do need the change to go upstream before we can
> include it in the Debian kernel.

Just for reference, I've summarized the test cases and related patches here:
http://wiki.parisc-linux.org/TestCases

HTH

-- 
Thibaut VARENE
http://www.parisc-linux.org/~varenet/



--
To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Archive: 
http://lists.debian.org/aanlktikldmpvgejp54qcnvmcup2ns5lkepwhxwonl...@mail.gmail.com

Bug#561203: threads and fork on machine with VIPT-WB cache

2010-06-04 Thread Bastian Blank

severity 561203 serious
thanks

On Thu, Jun 03, 2010 at 11:50:05AM +0300, Modestas Vainius wrote:
> # Breaks unrelated applications

Sorry, no. Almost all applications are related to the kernel.

> Well, as long as this is unfixed or at least "common", I don't see how hppa 
> can be considered to be a release arch. Is that UP patch available somewhere?

This is up to the release team.

Bastian

-- 
You're dead, Jim.
-- McCoy, "Amok Time", stardate 3372.7



-- 
To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/20100604075644.gb13...@wavehammer.waldi.eu.org

Processed: Re: Bug#561203: threads and fork on machine with VIPT-WB cache

2010-06-04 Thread Debian Bug Tracking System

Processing commands for cont...@bugs.debian.org:

> severity 561203 serious
Bug #561203 [linux-2.6] threads and fork on machine with VIPT-WB cache
Severity set to 'serious' from 'critical'

> thanks
Stopping processing here.

Please contact me if you need assistance.
-- 
561203: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=561203
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems


-- 
To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Archive: 
http://lists.debian.org/handler.s.c.12756382084261.transcr...@bugs.debian.org

Bug#561203: threads and fork on machine with VIPT-WB cache

2010-06-03 Thread dann frazier

On Fri, Jun 04, 2010 at 10:03:07AM +0900, NIIBE Yutaka wrote:
> Modestas Vainius wrote:
>>> Note that Debian's buildds run a UP kernel, so as soon as those fixes
>>> go upstream we can pull them in. Thanks for all your work here!
>>>
>>
>> Well, as long as this is unfixed or at least "common", I don't see how hppa
>> can be considered to be a release arch. Is that UP patch available somewhere?
>
> My case and my analysis talked about UP kernel, and John David Anglin
> made a patch:
>   http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=561203#144
>
> After that, the discussion went to SMP cases.
>
> It would be better to evaluate the patch again, and make sure it works
> for UP case and fix failures of buildd, then apply for Linux in Debian
> (only) for HPPA.
>
> I know that the patch is not that ideal because it touches
> architecture independent part of Linux, but it is worth for Linux in
> Debian (or Linux for the HPPA machine of buildd, at least).

I'm happy to test the patch if necessary to help push this change
upstream. However, we do need the change to go upstream before we can
include it in the Debian kernel.



-- 
To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/20100604052106.gc15...@lackof.org

Bug#561203: threads and fork on machine with VIPT-WB cache

2010-06-03 Thread NIIBE Yutaka


Modestas Vainius wrote:

Note that Debian's buildds run a UP kernel, so as soon as those fixes
go upstream we can pull them in. Thanks for all your work here!



Well, as long as this is unfixed or at least "common", I don't see how hppa
can be considered to be a release arch. Is that UP patch available somewhere?


My case and my analysis talked about UP kernel, and John David Anglin
made a patch:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=561203#144

After that, the discussion went to SMP cases.

It would be better to evaluate the patch again, and make sure it works
for UP case and fix failures of buildd, then apply for Linux in Debian
(only) for HPPA.

I know that the patch is not that ideal because it touches
architecture independent part of Linux, but it is worth for Linux in
Debian (or Linux for the HPPA machine of buildd, at least).
--



--
To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/4c0850cb.20...@fsij.org

Bug#561203: threads and fork on machine with VIPT-WB cache

2010-06-03 Thread Modestas Vainius

# Breaks unrelated applications
tags 561203 critical
thanks

Hello,

On trečiadienis 02 Birželis 2010 20:56:17 dann frazier wrote:
> On Wed, Jun 02, 2010 at 01:16:01PM -0400, John David Anglin wrote:
> > On Wed, 02 Jun 2010, Modestas Vainius wrote:
> > > Hello,
> > > 
> > > this bug [1] is back to the "very common" department with eglibc 2.11
> > > (libc6- dev_2.11.1-1) builds. Majority of KDE applications are failing
> > > to build on hppa again. Is there really nothing what could be done to
> > > fix it?
> > 
> > I will just say it is very tricky.  I think a fix is possible (arm and
> > mips had similar cache problems) but the victim replacement present in
> > PA8800/PA8900 caches makes the problem especially difficult  for
> > hardware using these processors.
> > 
> > I have spent the last few months testing various alternatives and have
> > now done hundreds of kernel builds.  I did post some experimental patches
> > that fix the problem on UP kernels.  However, the problem is not resolved
> > for SMP kernels.
> 
> Note that Debian's buildds run a UP kernel, so as soon as those fixes
> go upstream we can pull them in. Thanks for all your work here!
> 

Well, as long as this is unfixed or at least "common", I don't see how hppa 
can be considered to be a release arch. Is that UP patch available somewhere?

All KDE applications have been stuck in unstable before due to this and 
history is about to repeat itself unless something is done. While apparently a 
failing test in eglibc can be ignored, other applications have to suffer real 
world problems...

-- 
Modestas Vainius 


signature.asc
Description: This is a digitally signed message part.

Bug#561203: threads and fork on machine with VIPT-WB cache

2010-06-02 Thread dann frazier

On Wed, Jun 02, 2010 at 01:16:01PM -0400, John David Anglin wrote:
> On Wed, 02 Jun 2010, Modestas Vainius wrote:
> 
> > Hello,
> > 
> > this bug [1] is back to the "very common" department with eglibc 2.11 
> > (libc6-
> > dev_2.11.1-1) builds. Majority of KDE applications are failing to build on 
> > hppa again. Is there really nothing what could be done to fix it?
> 
> I will just say it is very tricky.  I think a fix is possible (arm and mips
> had similar cache problems) but the victim replacement present in 
> PA8800/PA8900
> caches makes the problem especially difficult  for hardware using these
> processors.
> 
> I have spent the last few months testing various alternatives and have
> now done hundreds of kernel builds.  I did post some experimental patches
> that fix the problem on UP kernels.  However, the problem is not resolved
> for SMP kernels.

Note that Debian's buildds run a UP kernel, so as soon as those fixes
go upstream we can pull them in. Thanks for all your work here!

> The minifail test is a good one to demonstrate the problem.  Indeed,
> a very similar test was given in the thread below:
> http://readlist.com/lists/vger.kernel.org/linux-kernel/54/270861.html
> 
> This thread also discusses the PA8800 problem:
> http://readlist.com/lists/vger.kernel.org/linux-kernel/54/271417.html
> 
> I currently surmise that we have a problem with the cache victim
> replacement, although the cause isn't clear.  I did find recently
> that the cache prefetch in copy_user_page_asm extends to the line
> beyond the end of the page, but fixing this doesn't resolve the problem.
> 
> I am still experimenting with using equivalent aliasing.  It does
> help to flush in ptep_set_wrprotect.
> 
> Dave

-- 
dann frazier




-- 
To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/20100602175616.ga23...@lackof.org

Bug#561203: threads and fork on machine with VIPT-WB cache

2010-06-02 Thread John David Anglin

On Wed, 02 Jun 2010, Modestas Vainius wrote:

> Hello,
> 
> this bug [1] is back to the "very common" department with eglibc 2.11 (libc6-
> dev_2.11.1-1) builds. Majority of KDE applications are failing to build on 
> hppa again. Is there really nothing what could be done to fix it?

I will just say it is very tricky.  I think a fix is possible (arm and mips
had similar cache problems) but the victim replacement present in PA8800/PA8900
caches makes the problem especially difficult  for hardware using these
processors.

I have spent the last few months testing various alternatives and have
now done hundreds of kernel builds.  I did post some experimental patches
that fix the problem on UP kernels.  However, the problem is not resolved
for SMP kernels.

The minifail test is a good one to demonstrate the problem.  Indeed,
a very similar test was given in the thread below:
http://readlist.com/lists/vger.kernel.org/linux-kernel/54/270861.html

This thread also discusses the PA8800 problem:
http://readlist.com/lists/vger.kernel.org/linux-kernel/54/271417.html

I currently surmise that we have a problem with the cache victim
replacement, although the cause isn't clear.  I did find recently
that the cache prefetch in copy_user_page_asm extends to the line
beyond the end of the page, but fixing this doesn't resolve the problem.

I am still experimenting with using equivalent aliasing.  It does
help to flush in ptep_set_wrprotect.

Dave
-- 
J. David Anglin  dave.ang...@nrc-cnrc.gc.ca
National Research Council of Canada  (613) 990-0752 (FAX: 952-6602)

-- 
To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/20100602171600.ga5...@hiauly1.hia.nrc.ca

Bug#561203: threads and fork on machine with VIPT-WB cache

2010-06-02 Thread Modestas Vainius

Hello,

this bug [1] is back to the "very common" department with eglibc 2.11 (libc6-
dev_2.11.1-1) builds. Majority of KDE applications are failing to build on 
hppa again. Is there really nothing what could be done to fix it?

1. http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=561203
2. 
https://buildd.debian.org/fetch.cgi?pkg=kde4libs;ver=4%3A4.4.4-1;arch=hppa;stamp=1275467025
3. 
https://buildd.debian.org/fetch.cgi?pkg=basket;ver=1.80-1;arch=hppa;stamp=1275483241

-- 
Modestas Vainius 


signature.asc
Description: This is a digitally signed message part.

Bug#561203: threads and fork on machine with VIPT-WB cache

2010-04-08 Thread John David Anglin

> On Thu, 08 Apr 2010, Helge Deller wrote:

> > I tested your patch today on one of my machines with plain kernel 2.6.33 
> > (32bit, SMP, B2000 I think).
> > Sadly I still did see the minifail bug.
> > 
> > Are you sure, that the patch fixed this bug for you?
> 
> Seemed to, but I have a bunch of other changes installed.  Possibly,
> the change to cacheflush.h is important.  It affects all PA8000.

I also think the change suggested by James

+   if (pte_dirty(old_pte))

is important for SMP.  With the patch set that I sent, my rp3440 and
gsyprf11 seem reasonably stable running 2.6.33.2 SMP.  I doubt all
problems are solved but things are a lot better than before.

Dave
-- 
J. David Anglin  dave.ang...@nrc-cnrc.gc.ca
National Research Council of Canada  (613) 990-0752 (FAX: 952-6602)



-- 
To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/20100408224446.96f294...@hiauly1.hia.nrc.ca

Bug#561203: threads and fork on machine with VIPT-WB cache

2010-04-08 Thread Helge Deller

On 04/02/2010 09:35 PM, John David Anglin wrote:
> On Fri, 02 Apr 2010, NIIBE Yutaka wrote:
> 
>> NIIBE Yutaka wrote:
>>> To have same semantics as other archs, I think that VIPT-WB cache
>>> machine should have cache flush at ptep_set_wrprotect, so that memory
>>> of the page has up-to-date data.  Yes, it will be huge performance
>>> impact for fork.  But I don't find any good solution other than this
>>> yet.
>>
>> I think we could do something like (only for VIPT-WB cache machine):
>>
>> -static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned 
>> long 
>> address, pte_t *ptep)
>>
>> +static inline void ptep_set_wrprotect(struct vm_area_struct *vma, 
>> struct 
>> mm_struct *mm, unsigned long addr, pte_t *ptep)
>>  {
>>  pte_t old_pte = *ptep;
>> +if (atomic_read(&mm->mm_users) > 1)
>> +flush_cache_page(vma, addr, pte_pfn(old_pte));
>>  set_pte_at(mm, addr, ptep, pte_wrprotect(old_pte));
>>  }
> 
> I tested the hack below on two machines currently running 2.6.33.2
> UP kernels.  The change seems to fix Debian #561203 (minifail bug)!
> Thus, I definitely think you are on the right track.  I'll continue
> to test.
> 
> I suspect the same issue is present for SMP kernels.

Hi Dave,

I tested your patch today on one of my machines with plain kernel 2.6.33 
(32bit, SMP, B2000 I think).
Sadly I still did see the minifail bug.

Are you sure, that the patch fixed this bug for you?

Helge

do_page_fault() pid=21470 command='minifail3' type=6 address=0x0003
do_page_fault() pid=7986 command='minifail3' type=6 address=0x0003  
   
do_page_fault() pid=19952 command='minifail3' type=6 address=0x0003 
   
do_page_fault() pid=13549 command='minifail3' type=6 address=0x0003
do_page_fault() pid=21862 command='minifail3' type=6 address=0x0003
do_page_fault() pid=4615 command='minifail3' type=6 address=0x0003
do_page_fault() pid=17336 command='minifail3' type=6 address=0x0003
do_page_fault() pid=21986 command='minifail3' type=6 address=0x0003
do_page_fault() pid=2157 command='minifail3' type=15 address=0x00dc
do_page_fault() pid=23886 command='minifail3' type=6 address=0x0003
do_page_fault() pid=2681 command='minifail3' type=6 address=0x0003
do_page_fault() pid=3229 command='minifail3' type=15 address=0x00ec
do_page_fault() pid=26095 command='minifail3' type=6 address=0x0003
do_page_fault() pid=20722 command='minifail3' type=6 address=0x0003
do_page_fault() pid=19912 command='minifail3' type=15 address=0x00ec
...
pagealloc: memory corruption
7db0c780: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
7db0c790: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
7db0c7a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
7db0c7b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
Backtrace:
 [<1011ec14>] show_stack+0x18/0x28
 [<10117ba0>] dump_stack+0x1c/0x2c
 [<101c6594>] kernel_map_pages+0x2a0/0x2b8
 [<1019e6c8>] get_page_from_freelist+0x3d4/0x614
 [<1019ea3c>] __alloc_pages_nodemask+0x134/0x610
 [<101b1d20>] do_wp_page+0x268/0xac0
 [<101b3b34>] handle_mm_fault+0x4d4/0x7c4
 [<1011d854>] do_page_fault+0x1f8/0x2fc
 [<1011f450>] handle_interruption+0xec/0x730
 [<10103078>] intr_check_sig+0x0/0x34
...
do_page_fault() pid=13414 command='minifail3' type=15 address=0x00dc
do_page_fault() pid=22776 command='minifail3' type=15 address=0x
do_page_fault() pid=26290 command='minifail3' type=15 address=0x00ec
do_page_fault() pid=1399 command='minifail3' type=6 address=0x0003
do_page_fault() pid=16130 command='minifail3' type=6 address=0x0003
do_page_fault() pid=26401 command='minifail3' type=6 address=0x0003
do_page_fault() pid=3383 command='minifail3' type=6 address=0x0003
do_page_fault() pid=3400 command='minifail3' type=15 address=0x0004
do_page_fault() pid=18659 command='minifail3' type=6 address=0x0003
do_page_fault() pid=3730 command='minifail3' type=6 address=0x0003
do_page_fault() pid=28828 command='minifail3' type=6 address=0x0003



-- 
To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/4bbe468d.5060...@gmx.de

Bug#561203: threads and fork on machine with VIPT-WB cache

2010-04-06 Thread James Bottomley

On Tue, 2010-04-06 at 13:57 +0900, NIIBE Yutaka wrote:
> John David Anglin wrote:
> > It is interesting that in the case of the Debian bug that
> > a thread of the parent process causes the COW break and thereby corrupts
> > its own memory.  As far as I can tell, the fork'd child never writes
> > to the memory that causes the fault.
> 
> Thanks for writing and testing a patch.
> 
> The case of #561203 is second scenario.  I think that this case is
> relevant to VIVT-WB machine too (provided kernel does copy by kernel
> address).
> 
> James Bottomley wrote:
> > So this is going to be a hard sell because of the arch churn. There are,
> > however, three ways to do it with the original signature.
> 
> Currently, I think that signature change would be inevitable for
> ptep_set_wrprotect.

Well we can't do it by claiming several architectures are wrong in their
implementation.  We might do it by claiming to need vma knowledge ...
however, even if you want the flush, as I said, you don't need to change
the signature.

> >  1. implement copy_user_highpage ... this allows us to copy through
> > the child's page cache (which is coherent with the parent's
> > before the cow) and thus pick up any cache changes without a
> > flush
> 
> Let me think about this way.
> 
> Well, this would improve both cases of the first scenario of mine and
> the second scenario.
> 
> But... I think that even if we would have copy_user_highpage which
> does copy by user address, we need to flush at ptep_set_wrprotect.  I
> think that we need to keep the condition: no dirty cache for COW page.
> 
> Think about third scenario of threads and fork:
> 
> (1) In process A, there are multiple threads, and a thread A-1 invokes
> fork.  We have process B, with a different space identifier color.

I don't understand what you mean by space colour ... there's cache
colour which refers to the line in the cache to which the the physical
memory maps.  The way PA is set up, space ID doesn't factor into cache
colour.

> (2) Another thread A-2 in process A runs while A-1 copies memory by
> dup_mmap.  A-2 writes to the address  in a page.  Let's call
> this page .
> 
> (3) We have dirty cache for  by A-2 at the time of
> ptep_set_wrprotect of thread A-1.  Suppose that we don't flush
> here.
> 
> (4) A-1 finishes copy, and sleeps.
> 
> (5) Child process B is waken up and sees old value at  in ,
> through different cache line.  B sleeps.

This isn't possible.  at this point, A and B have the same virtual
address and mapping for  this means they are the same cache
colour, so they both see the cached value.

James

> (6) A-2 is waken up.  A-2 touches the memory again, breaks COW.  A-2
> copies data on  to .  OK,  is
> consistent with copy_user_highpage by user address.
> 
> Note that during this copy, the cache line of  by A-2 is
> flushed out to .  It invokes another memory fault and COW
> break.  (I think that this memory fault is unhealthy.)
> Then, new value goes to  on  (when it's physically
> tagged cache).
> 
> A-2 sleeps.
> 
> (7) Child process B is waken up.  When it accesses at , it sees new
> value suddenly.
> 
> 
> If we flush cache to  at ptep_set_wrprotect, this couldn't
> occur.
> 
> 
>   *   *   *
> 
> 
> I know that we should not do "threads and fork".  It is difficult to
> define clean semantics.  Because another thread may touch memory while
> a thread which does memory copy for fork, the memory what the child
> process will see may be inconsistent.  For the child, a page might be
> new, while another page might be old.
> 
> For VIVT-WB cache machine, I am considering a possibility for the
> child process to have inconsistent memory even within a single page
> (when we have no flush at ptep_set_wrprotect).
> 
> It will be needed for me to talk to linux-arch soon or later.





-- 
To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/1270561069.4493.29.ca...@mulgrave.site

Bug#561203: threads and fork on machine with VIPT-WB cache

2010-04-06 Thread James Bottomley

On Tue, 2010-04-06 at 08:37 -0500, James Bottomley wrote:
> > (5) Child process B is waken up and sees old value at  in
> ,
> > through different cache line.  B sleeps.
> 
> This isn't possible.  at this point, A and B have the same virtual
> address and mapping for  this means they are the same cache
> colour, so they both see the cached value.

Perhaps to add more detail to this.  In spite of what the arch manual
says (it says the congruence stride is 16MB), the congruence stride on
all manufactured parisc processors is 4MB.  This means that any virtual
addresses, regardless of space id, that are equal modulo 4MB have the
same cache colour.

James

-- 
To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/1270561481.4493.40.ca...@mulgrave.site

Bug#561203: threads and fork on machine with VIPT-WB cache

2010-04-05 Thread NIIBE Yutaka

John David Anglin wrote:
> It is interesting that in the case of the Debian bug that
> a thread of the parent process causes the COW break and thereby corrupts
> its own memory.  As far as I can tell, the fork'd child never writes
> to the memory that causes the fault.

Thanks for writing and testing a patch.

The case of #561203 is second scenario.  I think that this case is
relevant to VIVT-WB machine too (provided kernel does copy by kernel
address).

James Bottomley wrote:
> So this is going to be a hard sell because of the arch churn. There are,
> however, three ways to do it with the original signature.

Currently, I think that signature change would be inevitable for
ptep_set_wrprotect.

>  1. implement copy_user_highpage ... this allows us to copy through
> the child's page cache (which is coherent with the parent's
> before the cow) and thus pick up any cache changes without a
> flush

Let me think about this way.

Well, this would improve both cases of the first scenario of mine and
the second scenario.

But... I think that even if we would have copy_user_highpage which
does copy by user address, we need to flush at ptep_set_wrprotect.  I
think that we need to keep the condition: no dirty cache for COW page.

Think about third scenario of threads and fork:

(1) In process A, there are multiple threads, and a thread A-1 invokes
fork.  We have process B, with a different space identifier color.

(2) Another thread A-2 in process A runs while A-1 copies memory by
dup_mmap.  A-2 writes to the address  in a page.  Let's call
this page .

(3) We have dirty cache for  by A-2 at the time of
ptep_set_wrprotect of thread A-1.  Suppose that we don't flush
here.

(4) A-1 finishes copy, and sleeps.

(5) Child process B is waken up and sees old value at  in ,
through different cache line.  B sleeps.

(6) A-2 is waken up.  A-2 touches the memory again, breaks COW.  A-2
copies data on  to .  OK,  is
consistent with copy_user_highpage by user address.

Note that during this copy, the cache line of  by A-2 is
flushed out to .  It invokes another memory fault and COW
break.  (I think that this memory fault is unhealthy.)
Then, new value goes to  on  (when it's physically
tagged cache).

A-2 sleeps.

(7) Child process B is waken up.  When it accesses at , it sees new
value suddenly.

If we flush cache to  at ptep_set_wrprotect, this couldn't
occur.

*   *   *

I know that we should not do "threads and fork".  It is difficult to
define clean semantics.  Because another thread may touch memory while
a thread which does memory copy for fork, the memory what the child
process will see may be inconsistent.  For the child, a page might be
new, while another page might be old.

For VIVT-WB cache machine, I am considering a possibility for the
child process to have inconsistent memory even within a single page
(when we have no flush at ptep_set_wrprotect).

It will be needed for me to talk to linux-arch soon or later.
-- 

-- 
To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/4bbabf23.1030...@fsij.org

Bug#561203: threads and fork on machine with VIPT-WB cache

2010-04-05 Thread James Bottomley

On Sun, 2010-04-04 at 22:51 -0400, John David Anglin wrote:
> > Thanks a lot for the discussion.
> > 
> > James Bottomley wrote:
> > > So your theory is that the data the kernel sees doing the page copy can
> > > be stale because of dirty cache lines in userspace (which is certainly
> > > possible in the ordinary way)?
> > 
> > Yes.
> > 
> > > By design that shouldn't happen: the idea behind COW breaking is
> > > that before it breaks, the page is read only ... this means that
> > > processes can have clean cache copies of it, but never dirty cache
> > > copies (because writes are forbidden).
> > 
> > That must be design, I agree.
> > 
> > To keep this condition (no dirty cache for COW page), we need to flush
> > cache before ptep_set_wrprotect.  That's my point.
> > 
> > Please look at the code path:
> >(kernel/fork.c)
> >do_fork -> copy_process -> copy_mm -> dup_mm -> dup_mmap ->
> >(mm/memory.c)
> >copy_page_range -> copy_p*d_range -> copy_one_pte -> ptep_set_wrprotect
> > 
> > The function flush_cache_dup_mm is called from dup_mmap, that's enough
> > for a case of a process with single thread.
> > I think that:
> > We need to flush cache before ptep_set_wrprotect for a process with
> > multiple threads.  Other threads may change memory after a thread
> > invokes do_fork and before calling ptep_set_wrprotect.  Specifically,
> > a process may sleep at pte_alloc function to get a page.
> 
> I agree.  It is interesting that in the case of the Debian bug that
> a thread of the parent process causes the COW break and thereby corrupts
> its own memory.  As far as I can tell, the fork'd child never writes
> to the memory that causes the fault.
> 
> My testing indicates that your suggested change fixes the Debian
> bug.  I've attached below my latest test version.  This seems to fix
> the bug on both SMP and UP kernels.
> 
> However, it doesn't fix all page/cache related issues on parisc
> SMP kernels that I commonly see.
> 
> My first inclination after even before reading your analysis was
> to assume that copy_user_page was broken (i.e, that even if a
> processor cache was dirty when the COW page was write protected,
> it should be possible to do the flush before the page is copied).
> However, this didn't seem to work...  Possibly, there are issues
> with aliased addresses.
> 
> I note that sparc flushes the entire cache and purges the entire
> tlb in kmap_atomic/kunmap_atomic for highmem.  Although the breakage
> that I see is not limited to PA8800/PA8900, I'm not convinced
> that we maintain coherency that is required for these processors
> in copy_user_page when we have multiple threads.
> 
> As a side note, kmap_atomic/kunmap_atomic seem to lack calls to
> pagefault_disable()/pagefault_enable() on PA8800.
> 
> Dave
> -- 
> J. David Anglin  dave.ang...@nrc-cnrc.gc.ca
> National Research Council of Canada  (613) 990-0752 (FAX: 
> 952-6602)
> 
> diff --git a/arch/parisc/include/asm/pgtable.h 
> b/arch/parisc/include/asm/pgtable.h
> index a27d2e2..b140d5c 100644
> --- a/arch/parisc/include/asm/pgtable.h
> +++ b/arch/parisc/include/asm/pgtable.h
> @@ -14,6 +14,7 @@
>  #include 
>  #include 
>  #include 
> +extern void flush_cache_page(struct vm_area_struct *vma, unsigned long 
> vmaddr, unsigned long pfn);
>  
>  /*
>   * kern_addr_valid(ADDR) tests if ADDR is pointing to valid kernel
> @@ -456,17 +457,22 @@ static inline pte_t ptep_get_and_clear(struct mm_struct 
> *mm, unsigned long addr,
>   return old_pte;
>  }
>  
> -static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long 
> addr, pte_t *ptep)
> +static inline void ptep_set_wrprotect(struct vm_area_struct *vma, struct 
> mm_struct *mm, unsigned long addr, pte_t *ptep)
>  {
>  #ifdef CONFIG_SMP
>   unsigned long new, old;
> +#endif
> + pte_t old_pte = *ptep;
> +
> + if (atomic_read(&mm->mm_users) > 1)

Just to verify there's nothing this is hiding, can you make this 

if (pte_dirty(old_pte))

and reverify?  The if clause should only trip on the case where the
parent has dirtied the line between flush and now.

> + flush_cache_page(vma, addr, pte_pfn(old_pte));
>  
> +#ifdef CONFIG_SMP
>   do {
>   old = pte_val(*ptep);
>   new = pte_val(pte_wrprotect(__pte (old)));
>   } while (cmpxchg((unsigned long *) ptep, old, new) != old);
>  #else
> - pte_t old_pte = *ptep;
>   set_pte_at(mm, addr, ptep, pte_wrprotect(old_pte));
>  #endif
>  }
> diff --git a/mm/memory.c b/mm/memory.c
> index 09e4b1b..21c2916 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -616,7 +616,7 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct 
> *src_mm,
>* in the parent and the child
>*/
>   if (is_cow_mapping(vm_flags)) {
> - ptep_set_wrprotect(src_mm, addr, src_pte);
> + ptep_set_wrprotect(vma, src_mm, addr, src_pte);

So this is going to be a hard sell because of the arch churn.

Bug#561203: threads and fork on machine with VIPT-WB cache

2010-04-04 Thread John David Anglin

> > > By design that shouldn't happen: the idea behind COW breaking is
> > > that before it breaks, the page is read only ... this means that
> > > processes can have clean cache copies of it, but never dirty cache
> > > copies (because writes are forbidden).
> > 
> > That must be design, I agree.
> > 
> > To keep this condition (no dirty cache for COW page), we need to flush
> > cache before ptep_set_wrprotect.  That's my point.

Is it possible that a sleep/reschedule could cause the cache to become
dirty again before it is write protected?

Dave
-- 
J. David Anglin  dave.ang...@nrc-cnrc.gc.ca
National Research Council of Canada  (613) 990-0752 (FAX: 952-6602)



-- 
To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/20100405025840.b79a54...@hiauly1.hia.nrc.ca

Bug#561203: threads and fork on machine with VIPT-WB cache

2010-04-04 Thread John David Anglin

> Thanks a lot for the discussion.
> 
> James Bottomley wrote:
> > So your theory is that the data the kernel sees doing the page copy can
> > be stale because of dirty cache lines in userspace (which is certainly
> > possible in the ordinary way)?
> 
> Yes.
> 
> > By design that shouldn't happen: the idea behind COW breaking is
> > that before it breaks, the page is read only ... this means that
> > processes can have clean cache copies of it, but never dirty cache
> > copies (because writes are forbidden).
> 
> That must be design, I agree.
> 
> To keep this condition (no dirty cache for COW page), we need to flush
> cache before ptep_set_wrprotect.  That's my point.
> 
> Please look at the code path:
>(kernel/fork.c)
>do_fork -> copy_process -> copy_mm -> dup_mm -> dup_mmap ->
>(mm/memory.c)
>copy_page_range -> copy_p*d_range -> copy_one_pte -> ptep_set_wrprotect
> 
> The function flush_cache_dup_mm is called from dup_mmap, that's enough
> for a case of a process with single thread.
> I think that:
> We need to flush cache before ptep_set_wrprotect for a process with
> multiple threads.  Other threads may change memory after a thread
> invokes do_fork and before calling ptep_set_wrprotect.  Specifically,
> a process may sleep at pte_alloc function to get a page.

I agree.  It is interesting that in the case of the Debian bug that
a thread of the parent process causes the COW break and thereby corrupts
its own memory.  As far as I can tell, the fork'd child never writes
to the memory that causes the fault.

My testing indicates that your suggested change fixes the Debian
bug.  I've attached below my latest test version.  This seems to fix
the bug on both SMP and UP kernels.

However, it doesn't fix all page/cache related issues on parisc
SMP kernels that I commonly see.

My first inclination after even before reading your analysis was
to assume that copy_user_page was broken (i.e, that even if a
processor cache was dirty when the COW page was write protected,
it should be possible to do the flush before the page is copied).
However, this didn't seem to work...  Possibly, there are issues
with aliased addresses.

I note that sparc flushes the entire cache and purges the entire
tlb in kmap_atomic/kunmap_atomic for highmem.  Although the breakage
that I see is not limited to PA8800/PA8900, I'm not convinced
that we maintain coherency that is required for these processors
in copy_user_page when we have multiple threads.

As a side note, kmap_atomic/kunmap_atomic seem to lack calls to
pagefault_disable()/pagefault_enable() on PA8800.

Dave
-- 
J. David Anglin  dave.ang...@nrc-cnrc.gc.ca
National Research Council of Canada  (613) 990-0752 (FAX: 952-6602)

diff --git a/arch/parisc/include/asm/pgtable.h 
b/arch/parisc/include/asm/pgtable.h
index a27d2e2..b140d5c 100644
--- a/arch/parisc/include/asm/pgtable.h
+++ b/arch/parisc/include/asm/pgtable.h
@@ -14,6 +14,7 @@
 #include 
 #include 
 #include 
+extern void flush_cache_page(struct vm_area_struct *vma, unsigned long vmaddr, 
unsigned long pfn);
 
 /*
  * kern_addr_valid(ADDR) tests if ADDR is pointing to valid kernel
@@ -456,17 +457,22 @@ static inline pte_t ptep_get_and_clear(struct mm_struct 
*mm, unsigned long addr,
return old_pte;
 }
 
-static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long 
addr, pte_t *ptep)
+static inline void ptep_set_wrprotect(struct vm_area_struct *vma, struct 
mm_struct *mm, unsigned long addr, pte_t *ptep)
 {
 #ifdef CONFIG_SMP
unsigned long new, old;
+#endif
+   pte_t old_pte = *ptep;
+
+   if (atomic_read(&mm->mm_users) > 1)
+   flush_cache_page(vma, addr, pte_pfn(old_pte));
 
+#ifdef CONFIG_SMP
do {
old = pte_val(*ptep);
new = pte_val(pte_wrprotect(__pte (old)));
} while (cmpxchg((unsigned long *) ptep, old, new) != old);
 #else
-   pte_t old_pte = *ptep;
set_pte_at(mm, addr, ptep, pte_wrprotect(old_pte));
 #endif
 }
diff --git a/mm/memory.c b/mm/memory.c
index 09e4b1b..21c2916 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -616,7 +616,7 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct 
*src_mm,
 * in the parent and the child
 */
if (is_cow_mapping(vm_flags)) {
-   ptep_set_wrprotect(src_mm, addr, src_pte);
+   ptep_set_wrprotect(vma, src_mm, addr, src_pte);
pte = pte_wrprotect(pte);
}
 



-- 
To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/20100405025154.de1d55...@hiauly1.hia.nrc.ca

Bug#561203: threads and fork on machine with VIPT-WB cache

Bug#561203: threads and fork on machine with VIPT-WB cache

Bug#561203: threads and fork on machine with VIPT-WB cache

Bug#561203: threads and fork on machine with VIPT-WB cache

Bug#561203: threads and fork on machine with VIPT-WB cache

Bug#561203: threads and fork on machine with VIPT-WB cache

Processed: Re: Bug#561203: threads and fork on machine with VIPT-WB cache

Bug#561203: threads and fork on machine with VIPT-WB cache

Bug#561203: threads and fork on machine with VIPT-WB cache

Bug#561203: threads and fork on machine with VIPT-WB cache

Bug#561203: threads and fork on machine with VIPT-WB cache

Bug#561203: threads and fork on machine with VIPT-WB cache

Bug#561203: threads and fork on machine with VIPT-WB cache

Bug#561203: threads and fork on machine with VIPT-WB cache

Bug#561203: threads and fork on machine with VIPT-WB cache

Bug#561203: threads and fork on machine with VIPT-WB cache

Bug#561203: threads and fork on machine with VIPT-WB cache

Bug#561203: threads and fork on machine with VIPT-WB cache

Bug#561203: threads and fork on machine with VIPT-WB cache

Bug#561203: threads and fork on machine with VIPT-WB cache

Bug#561203: threads and fork on machine with VIPT-WB cache

21 matches

Site Navigation

Mail list logo

Footer information