Re: Data corruption with raid5/dm-crypt/lvm/reiserfs on 2.6.19.2

2007-01-22 Thread Christophe Saout
Am Montag, den 22.01.2007, 11:56 -0800 schrieb Andrew Morton:

> There has been a long history of similar problems when raid and dm-crypt
> are used together.  I thought a couple of months ago that we were hot on
> the trail of a fix, but I don't think we ever got there.  Perhaps
> Christophe can comment?

No, I think it's exactly this bug. Three month ago someone came up with
a very reliable test case and I managed to nail down the bug.

Readaheads that were aborted by the raid5 code (or some layer below)
were signalled using a cleared BIO_UPTODATE bit, but no error code, and
were missed as aborted by dm-crypt (all other layers apparently set the
error code in this case, so this only happened with raid5) which could
mess up the buffer cache.

Anyway, it then turned out this bug was already "accidentally" fixed in
2.6.19 by RedHat in order to play nicely with make_request changes (the
stuff to reduce stack usage with stacked block device layers), that's
why you probably missed that it got fixed. The fix for pre-2.6.19
kernels went into some 2.6.16.x and 2.6.18.6.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Data corruption with raid5/dm-crypt/lvm/reiserfs on 2.6.19.2

2007-01-22 Thread Christophe Saout
Am Montag, den 22.01.2007, 11:56 -0800 schrieb Andrew Morton:

 There has been a long history of similar problems when raid and dm-crypt
 are used together.  I thought a couple of months ago that we were hot on
 the trail of a fix, but I don't think we ever got there.  Perhaps
 Christophe can comment?

No, I think it's exactly this bug. Three month ago someone came up with
a very reliable test case and I managed to nail down the bug.

Readaheads that were aborted by the raid5 code (or some layer below)
were signalled using a cleared BIO_UPTODATE bit, but no error code, and
were missed as aborted by dm-crypt (all other layers apparently set the
error code in this case, so this only happened with raid5) which could
mess up the buffer cache.

Anyway, it then turned out this bug was already accidentally fixed in
2.6.19 by RedHat in order to play nicely with make_request changes (the
stuff to reduce stack usage with stacked block device layers), that's
why you probably missed that it got fixed. The fix for pre-2.6.19
kernels went into some 2.6.16.x and 2.6.18.6.


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [stable][PATCH < 2.6.19] Fix data corruption with dm-crypt over RAID5

2006-12-02 Thread Christophe Saout
Am Freitag, den 01.12.2006, 19:49 -0800 schrieb Chris Wright:

> * Christophe Saout ([EMAIL PROTECTED]) wrote:
> > Fix corruption issue with dm-crypt on top of software raid5. Cancelled
> > readahead bio's that report no error, just have BIO_UPTODATE cleared
> > were reported as successful reads to the higher layers (and leaving
> > random content in the buffer cache). Already fixed in 2.6.19.
> 
> I take it this is fixed a different way in 2.6.19?  Mind clarifying the
> difference?

It's more or less fixed the same way in 2.6.19. The function was
reordered by Milan Broz to accommodate the changed write path (commit
8b004457168995f2ae2a35327f885183a9e74141):

> [PATCH] dm crypt: restructure for workqueue change
>
> Restructure part of the dm-crypt code in preparation for workqueue
> changes.
>
> Use 'base_bio' or 'clone' variable names consistently throughout.  No
> functional changes are included in this patch.

"No functional changes" actually included the correct fix, accidental or
not.

> Minor nit:  introduces trailing whitespaces, cleaned it up locally.

Ouch, sorry.



signature.asc
Description: Dies ist ein digital signierter Nachrichtenteil


Re: [stable][PATCH 2.6.19] Fix data corruption with dm-crypt over RAID5

2006-12-02 Thread Christophe Saout
Am Freitag, den 01.12.2006, 19:49 -0800 schrieb Chris Wright:

 * Christophe Saout ([EMAIL PROTECTED]) wrote:
  Fix corruption issue with dm-crypt on top of software raid5. Cancelled
  readahead bio's that report no error, just have BIO_UPTODATE cleared
  were reported as successful reads to the higher layers (and leaving
  random content in the buffer cache). Already fixed in 2.6.19.
 
 I take it this is fixed a different way in 2.6.19?  Mind clarifying the
 difference?

It's more or less fixed the same way in 2.6.19. The function was
reordered by Milan Broz to accommodate the changed write path (commit
8b004457168995f2ae2a35327f885183a9e74141):

 [PATCH] dm crypt: restructure for workqueue change

 Restructure part of the dm-crypt code in preparation for workqueue
 changes.

 Use 'base_bio' or 'clone' variable names consistently throughout.  No
 functional changes are included in this patch.

No functional changes actually included the correct fix, accidental or
not.

 Minor nit:  introduces trailing whitespaces, cleaned it up locally.

Ouch, sorry.



signature.asc
Description: Dies ist ein digital signierter Nachrichtenteil


[stable][PATCH < 2.6.19] Fix data corruption with dm-crypt over RAID5

2006-12-01 Thread Christophe Saout
Fix corruption issue with dm-crypt on top of software raid5. Cancelled
readahead bio's that report no error, just have BIO_UPTODATE cleared
were reported as successful reads to the higher layers (and leaving
random content in the buffer cache). Already fixed in 2.6.19.

Signed-off-by: Christophe Saout <[EMAIL PROTECTED]>


--- linux-2.6.18.orig/drivers/md/dm-crypt.c 2006-09-20 05:42:06.0 
+0200
+++ linux-2.6.18/drivers/md/dm-crypt.c  2006-12-02 03:03:36.0 +0100
@@ -717,13 +717,15 @@
if (bio->bi_size)
return 1;
 
+   if (!bio_flagged(bio, BIO_UPTODATE) && !error)
+   error = -EIO;
+
bio_put(bio);
 
/*
 * successful reads are decrypted by the worker thread
 */
-   if ((bio_data_dir(bio) == READ)
-   && bio_flagged(bio, BIO_UPTODATE)) {
+   if (bio_data_dir(io->bio) == READ && !error) {
kcryptd_queue_io(io);
return 0;
}


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[stable][PATCH 2.6.19] Fix data corruption with dm-crypt over RAID5

2006-12-01 Thread Christophe Saout
Fix corruption issue with dm-crypt on top of software raid5. Cancelled
readahead bio's that report no error, just have BIO_UPTODATE cleared
were reported as successful reads to the higher layers (and leaving
random content in the buffer cache). Already fixed in 2.6.19.

Signed-off-by: Christophe Saout [EMAIL PROTECTED]


--- linux-2.6.18.orig/drivers/md/dm-crypt.c 2006-09-20 05:42:06.0 
+0200
+++ linux-2.6.18/drivers/md/dm-crypt.c  2006-12-02 03:03:36.0 +0100
@@ -717,13 +717,15 @@
if (bio-bi_size)
return 1;
 
+   if (!bio_flagged(bio, BIO_UPTODATE)  !error)
+   error = -EIO;
+
bio_put(bio);
 
/*
 * successful reads are decrypted by the worker thread
 */
-   if ((bio_data_dir(bio) == READ)
-bio_flagged(bio, BIO_UPTODATE)) {
+   if (bio_data_dir(io-bio) == READ  !error) {
kcryptd_queue_io(io);
return 0;
}


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [BUG mm] "fixed" i386 memcpy inlining buggy

2005-04-06 Thread Christophe Saout
Am Mittwoch, den 06.04.2005, 13:14 +0300 schrieb Denis Vlasenko:

> Oh shit. I was trying to be too clever. I still run with this patch,
> so it must be happening very rarely.

Yes, that's right, it happened with code that's not in the mainline tree
but could have happened anywhere.

> Does this one compile ok?

Yes, the case that failed is now okay. I changed it slightly to assign
esi and edi directy on top of the functions, no asm section needed here.
The compiler will make sure that they have the correct values when
needed.

In the case above the compiler now uses %ebx to save the loop counter
instead of %esi.

In drivers/cdrom/cdrom.c I'm observing one very strange thing though.

It appears that the compiler decided to put the local variable edi on
the stack for some unexplicable reason (or possibly there is?). Since
the asm sections are memory barriers the compiler then saves the value
of %edi on the stack before entering the next assembler section.

1f1c:   a5  movsl  %ds:(%esi),%es:(%edi)
1f1d:   89 7d 84mov%edi,0xff84(%ebp)
1f20:   a5  movsl  %ds:(%esi),%es:(%edi)
1f21:   89 7d 84mov%edi,0xff84(%ebp)
1f24:   66 a5   movsw  %ds:(%esi),%es:(%edi)

(this is a constant 10 byte memcpy)

The only thing that would avoid this is to either tell the compiler to
never put esi/edi in memory (which I think is not possibly across
different versions of gcc) or to always generate a single asm section
for all the different cases.



signature.asc
Description: This is a digitally signed message part


Re: [BUG mm] fixed i386 memcpy inlining buggy

2005-04-06 Thread Christophe Saout
Am Mittwoch, den 06.04.2005, 13:14 +0300 schrieb Denis Vlasenko:

 Oh shit. I was trying to be too clever. I still run with this patch,
 so it must be happening very rarely.

Yes, that's right, it happened with code that's not in the mainline tree
but could have happened anywhere.

 Does this one compile ok?

Yes, the case that failed is now okay. I changed it slightly to assign
esi and edi directy on top of the functions, no asm section needed here.
The compiler will make sure that they have the correct values when
needed.

In the case above the compiler now uses %ebx to save the loop counter
instead of %esi.

In drivers/cdrom/cdrom.c I'm observing one very strange thing though.

It appears that the compiler decided to put the local variable edi on
the stack for some unexplicable reason (or possibly there is?). Since
the asm sections are memory barriers the compiler then saves the value
of %edi on the stack before entering the next assembler section.

1f1c:   a5  movsl  %ds:(%esi),%es:(%edi)
1f1d:   89 7d 84mov%edi,0xff84(%ebp)
1f20:   a5  movsl  %ds:(%esi),%es:(%edi)
1f21:   89 7d 84mov%edi,0xff84(%ebp)
1f24:   66 a5   movsw  %ds:(%esi),%es:(%edi)

(this is a constant 10 byte memcpy)

The only thing that would avoid this is to either tell the compiler to
never put esi/edi in memory (which I think is not possibly across
different versions of gcc) or to always generate a single asm section
for all the different cases.



signature.asc
Description: This is a digitally signed message part


Re: 2.6.12-rc2-mm1

2005-04-05 Thread Christophe Saout
Hi Andrew,

> - Nobody said anything about the PM resume and DRI behaviour in
>   2.6.12-rc1-mm4.  So it's all perfect now?

Yes, works for me. DRI (i915) is working again and  USB is now happy
after a PM resume too.



signature.asc
Description: This is a digitally signed message part


[BUG mm] "fixed" i386 memcpy inlining buggy

2005-04-05 Thread Christophe Saout
Hi Denis,

the new i386 memcpy macro is a ticking timebomb.

I've been debugging a new mISDN crash, just to find out that a memcpy
was not inlined correctly.

Andrew, you should drop the fix-i386-memcpy.patch (or have it fixed).

This source code:

mISDN_pid_t pid;
[...]
memcpy(>mgr->pid, , sizeof(mISDN_pid_t));

was compiled as:

lea0xffa4(%ebp),%esi< %esi is loaded
(   add$0x10,%ebx  )
(   mov%ebx,%eax   ) something else
(   call   1613  ) %esi preserved
mov0xffa0(%ebp),%edx
mov0x74(%edx),%edi  < %edi is loaded
add$0x20,%edi offset in structure added
!   mov$0x14,%esi!! < %esi overwritten!
mov%esi,%ecx< %ecx loaded
repz movsl %ds:(%esi),%es:(%edi)

Apparently the compiled decided that the value 0x14 could be reused
afterwards (which it does for an inlined memset of the same size some
instructions below) and clobbers %esi.

Looking at the macro:

__asm__ __volatile__(
""
: "=" (edi), "=" (esi)
: "0" ((long) to),"1" ((long) from)
: "memory"
);
}
if (n >= 5*4) {
/* large block: use rep prefix */
int ecx;
__asm__ __volatile__(
"rep ; movsl"
: "=" (ecx)

it seems obvious that the compiled assumes it can reuse %esi and %edi
for something else between the two __asm__ sections. These should
probably be merged.



signature.asc
Description: This is a digitally signed message part


Re: 2.6.12-rc2-mm1

2005-04-05 Thread Christophe Saout
Hi Andrew,

 - Nobody said anything about the PM resume and DRI behaviour in
   2.6.12-rc1-mm4.  So it's all perfect now?

Yes, works for me. DRI (i915) is working again and  USB is now happy
after a PM resume too.



signature.asc
Description: This is a digitally signed message part


[BUG mm] fixed i386 memcpy inlining buggy

2005-04-05 Thread Christophe Saout
Hi Denis,

the new i386 memcpy macro is a ticking timebomb.

I've been debugging a new mISDN crash, just to find out that a memcpy
was not inlined correctly.

Andrew, you should drop the fix-i386-memcpy.patch (or have it fixed).

This source code:

mISDN_pid_t pid;
[...]
memcpy(st-mgr-pid, pid, sizeof(mISDN_pid_t));

was compiled as:

lea0xffa4(%ebp),%esi %esi is loaded
(   add$0x10,%ebx  )
(   mov%ebx,%eax   ) something else
(   call   1613 test_stack_protocol+0x83 ) %esi preserved
mov0xffa0(%ebp),%edx
mov0x74(%edx),%edi   %edi is loaded
add$0x20,%edi offset in structure added
!   mov$0x14,%esi!!  %esi overwritten!
mov%esi,%ecx %ecx loaded
repz movsl %ds:(%esi),%es:(%edi)

Apparently the compiled decided that the value 0x14 could be reused
afterwards (which it does for an inlined memset of the same size some
instructions below) and clobbers %esi.

Looking at the macro:

__asm__ __volatile__(

: =D (edi), =S (esi)
: 0 ((long) to),1 ((long) from)
: memory
);
}
if (n = 5*4) {
/* large block: use rep prefix */
int ecx;
__asm__ __volatile__(
rep ; movsl
: =c (ecx)

it seems obvious that the compiled assumes it can reuse %esi and %edi
for something else between the two __asm__ sections. These should
probably be merged.



signature.asc
Description: This is a digitally signed message part


Re: x86-64 preemption fix from IRQ and BKL in 2.6.12-rc1-mm2

2005-03-27 Thread Christophe Saout
Am Sonntag, den 27.03.2005, 19:26 +0200 schrieb Andi Kleen:

> > preempt_schedule_irq is not an i386 specific function and seems to take
> > special care of BKL preemption and since reiserfs does use the BKL to do
> > certain things I think this actually might be the problem...?
> 
> Hmm, preempt_schedule_irq is not in mainline as far as I can see.
> My patches are always for mainline; i dont do a special
> patch kit for -mm*

PREEMPT_BKL has been in mainline since 2.6.11-rc1,  preempt_schedule_irq
made it in 2.6.11-rc3. Please look here:
http://linux.bkbits.net:8080/linux-2.6/search/?expr=preempt_schedule_irq=ChangeSet+comments

For i386 the first change was to switch to preempt_schedule in this code
path: http://linux.bkbits.net:8080/linux-2.6/[EMAIL PROTECTED]

preempt_schedule takes care of setting PREEMPT_ACTIVE and resetting it
afterwards, so I removed that from the assembler code.

Then preempt_schedule_irq has been introduced to move the sti/cli back
around the call to schedule:
http://linux.bkbits.net:8080/linux-2.6/[EMAIL PROTECTED]

So in the end the only thing that the patch I proposed was doing is to
*additionally* handle the PREEMPT_BKL case so that schedule doesn't
accidentally release the BKL semaphore when it shouldn't because we are
preempting and nobody explicitly called schedule.

Several other archs have done the same. No bug has shown up until the
recent -mm kernel where the execution of this code path actually became
possible (the "jc -> jnc" fix some lines above).

> It looks like a unfortunate interaction with some other patches
> in mm. Andrew, can you disable CONFIG_PREEMPT on x86-64 in
> mm for now?

These things are in 2.6.11 (except that they never got called because of
the wrong interrupt flag check in the IRQ handler).

> > Unfortunately I don't have a amd64 machine to play with, so can somebody
> > please check this?
> 
> How did you generate the crash dumps above then?

Well, nobody minds if I play with a webserver in the middle of the
night, as long as it works during the day. Shoot me. :)

Both servers are running fine since I applied my patch last night.

Now that I looked into it I think that it's obviously the correct
solution.



signature.asc
Description: This is a digitally signed message part


inotify issue: iput called atomically

2005-03-27 Thread Christophe Saout
Hi Robert,

it looks like you shouldn't call iput with spinlocks held. iput might
call down into the filesystem to delete the inode and this can sleep.

Mar 27 14:38:18 server Debug: sleeping function called from invalid
context at include/asm/semaphore.h:102
Mar 27 14:38:18 server in_atomic():1, irqs_disabled():0
Mar 27 14:38:18 server [] dump_stack+0x17/0x20
Mar 27 14:38:18 server [] __might_sleep+0xa3/0xb0
Mar 27 14:38:18 server [] lock_kernel+0x2b/0x50
Mar 27 14:38:18 server [] reiserfs_delete_inode+0x13/0x100
Mar 27 14:38:18 server [] generic_delete_inode+0xa4/0x160
Mar 27 14:38:18 server [] iput+0x56/0x80
Mar 27 14:38:18 server [] remove_watch_no_event+0x8d/0x100
Mar 27 14:38:18 server [] inotify_ignore+0x49/0x90
Mar 27 14:38:18 server [] inotify_ioctl+0xcd/0x110
Mar 27 14:38:18 server [] do_ioctl+0x76/0x90
Mar 27 14:38:18 server [] vfs_ioctl+0x59/0x1c0
Mar 27 14:38:18 server [] sys_ioctl+0x39/0x60
Mar 27 14:38:18 server [] sysenter_past_esp+0x54/0x75

I've looked through the code and iput can usually be called after
releasing spinlocks. It doesn't look trivial to implement though.



signature.asc
Description: This is a digitally signed message part


inotify issue: iput called atomically

2005-03-27 Thread Christophe Saout
Hi Robert,

it looks like you shouldn't call iput with spinlocks held. iput might
call down into the filesystem to delete the inode and this can sleep.

Mar 27 14:38:18 server Debug: sleeping function called from invalid
context at include/asm/semaphore.h:102
Mar 27 14:38:18 server in_atomic():1, irqs_disabled():0
Mar 27 14:38:18 server [c0103e37] dump_stack+0x17/0x20
Mar 27 14:38:18 server [c0118973] __might_sleep+0xa3/0xb0
Mar 27 14:38:18 server [c03fc5ab] lock_kernel+0x2b/0x50
Mar 27 14:38:18 server [c019fc33] reiserfs_delete_inode+0x13/0x100
Mar 27 14:38:18 server [c0172c64] generic_delete_inode+0xa4/0x160
Mar 27 14:38:18 server [c0172f16] iput+0x56/0x80
Mar 27 14:38:18 server [c017f9bd] remove_watch_no_event+0x8d/0x100
Mar 27 14:38:18 server [c01805f9] inotify_ignore+0x49/0x90
Mar 27 14:38:18 server [c018070d] inotify_ioctl+0xcd/0x110
Mar 27 14:38:18 server [c016ac26] do_ioctl+0x76/0x90
Mar 27 14:38:18 server [c016adb9] vfs_ioctl+0x59/0x1c0
Mar 27 14:38:18 server [c016af59] sys_ioctl+0x39/0x60
Mar 27 14:38:18 server [c0102f6b] sysenter_past_esp+0x54/0x75

I've looked through the code and iput can usually be called after
releasing spinlocks. It doesn't look trivial to implement though.



signature.asc
Description: This is a digitally signed message part


Re: x86-64 preemption fix from IRQ and BKL in 2.6.12-rc1-mm2

2005-03-27 Thread Christophe Saout
Am Sonntag, den 27.03.2005, 19:26 +0200 schrieb Andi Kleen:

  preempt_schedule_irq is not an i386 specific function and seems to take
  special care of BKL preemption and since reiserfs does use the BKL to do
  certain things I think this actually might be the problem...?
 
 Hmm, preempt_schedule_irq is not in mainline as far as I can see.
 My patches are always for mainline; i dont do a special
 patch kit for -mm*

PREEMPT_BKL has been in mainline since 2.6.11-rc1,  preempt_schedule_irq
made it in 2.6.11-rc3. Please look here:
http://linux.bkbits.net:8080/linux-2.6/search/?expr=preempt_schedule_irqsearch=ChangeSet+comments

For i386 the first change was to switch to preempt_schedule in this code
path: http://linux.bkbits.net:8080/linux-2.6/[EMAIL PROTECTED]

preempt_schedule takes care of setting PREEMPT_ACTIVE and resetting it
afterwards, so I removed that from the assembler code.

Then preempt_schedule_irq has been introduced to move the sti/cli back
around the call to schedule:
http://linux.bkbits.net:8080/linux-2.6/[EMAIL PROTECTED]

So in the end the only thing that the patch I proposed was doing is to
*additionally* handle the PREEMPT_BKL case so that schedule doesn't
accidentally release the BKL semaphore when it shouldn't because we are
preempting and nobody explicitly called schedule.

Several other archs have done the same. No bug has shown up until the
recent -mm kernel where the execution of this code path actually became
possible (the jc - jnc fix some lines above).

 It looks like a unfortunate interaction with some other patches
 in mm. Andrew, can you disable CONFIG_PREEMPT on x86-64 in
 mm for now?

These things are in 2.6.11 (except that they never got called because of
the wrong interrupt flag check in the IRQ handler).

  Unfortunately I don't have a amd64 machine to play with, so can somebody
  please check this?
 
 How did you generate the crash dumps above then?

Well, nobody minds if I play with a webserver in the middle of the
night, as long as it works during the day. Shoot me. :)

Both servers are running fine since I applied my patch last night.

Now that I looked into it I think that it's obviously the correct
solution.



signature.asc
Description: This is a digitally signed message part


[PATCH] Fix preemption off of irq context on x86-64 with PREEMPT_BKL

2005-03-26 Thread Christophe Saout
Hi,

> x86_64-fix-config_preempt.patch
>   x86_64: Fix CONFIG_PREEMPT

This patch causes another bug to show up some lines below with
CONFIG_PREEMPT_BKL. schedule releases the BKL which it shouldn't do.

Call preempt_schedule_irq instead (like for i386). This seems to fix the
easily reproducible filesystem errors I've seen (with reiserfs, which
heavily relies on the BKL).

Signed-off-by: Christophe Saout <[EMAIL PROTECTED]>

--- linux-2.6.12-rc1-mm2.orig/arch/x86_64/kernel/entry.S2005-03-24 
17:32:22.0 +0100
+++ linux-2.6.12-rc1-mm2/arch/x86_64/kernel/entry.S 2005-03-26 
23:40:30.0 +0100
@@ -517,12 +517,7 @@
jnc  retint_restore_args
bt   $9,EFLAGS-ARGOFFSET(%rsp)  /* interrupts off? */
jnc  retint_restore_args
-   movl $PREEMPT_ACTIVE,threadinfo_preempt_count(%rcx)
-   sti
-   call schedule
-   cli
-   GET_THREAD_INFO(%rcx)
-   movl $0,threadinfo_preempt_count(%rcx) 
+   call preempt_schedule_irq
jmp exit_intr
 #endif 
CFI_ENDPROC



signature.asc
Description: This is a digitally signed message part


[PATCH] Fix preemption off of irq context on x86-64 with PREEMPT_BKL

2005-03-26 Thread Christophe Saout
Hi,

 x86_64-fix-config_preempt.patch
   x86_64: Fix CONFIG_PREEMPT

This patch causes another bug to show up some lines below with
CONFIG_PREEMPT_BKL. schedule releases the BKL which it shouldn't do.

Call preempt_schedule_irq instead (like for i386). This seems to fix the
easily reproducible filesystem errors I've seen (with reiserfs, which
heavily relies on the BKL).

Signed-off-by: Christophe Saout [EMAIL PROTECTED]

--- linux-2.6.12-rc1-mm2.orig/arch/x86_64/kernel/entry.S2005-03-24 
17:32:22.0 +0100
+++ linux-2.6.12-rc1-mm2/arch/x86_64/kernel/entry.S 2005-03-26 
23:40:30.0 +0100
@@ -517,12 +517,7 @@
jnc  retint_restore_args
bt   $9,EFLAGS-ARGOFFSET(%rsp)  /* interrupts off? */
jnc  retint_restore_args
-   movl $PREEMPT_ACTIVE,threadinfo_preempt_count(%rcx)
-   sti
-   call schedule
-   cli
-   GET_THREAD_INFO(%rcx)
-   movl $0,threadinfo_preempt_count(%rcx) 
+   call preempt_schedule_irq
jmp exit_intr
 #endif 
CFI_ENDPROC



signature.asc
Description: This is a digitally signed message part


x86-64 preemption fix from IRQ and BKL in 2.6.12-rc1-mm2

2005-03-25 Thread Christophe Saout
Hi,

> +x86_64-fix-config_preempt.patch
>
> x86_64-fix-config_preempt.patch
>   x86_64: Fix CONFIG_PREEMPT

Has this one been stress-tested?

I've got the impression that things have become a lot worse.

I've been seeing things like these:

Mar 25 01:00:48 websrv2 REISERFS: panic (device dm-1): clm-6000: do_balance, fs 
generation has changed
Mar 25 01:00:48 websrv2
Mar 25 01:00:48 websrv2 --- [cut here ] - [please bite here ] 
-
Mar 25 01:00:48 websrv2 Kernel BUG at prints:362
Mar 25 01:00:48 websrv2 invalid operand:  [1] PREEMPT
Mar 25 01:00:48 websrv2 CPU 0
Mar 25 01:00:48 websrv2 Modules linked in: iptable_nat ipt_MARK iptable_mangle 
ipt_LOG ipt_multiport ipt_owner ipt_mark ipt_state ipt_REJECT iptable_filter 
ip_tables twofish serpent blowfish ext3 jbd reiser4 sha256 aes dm_crypt 
ip_conntrack_irc ip_conntrack_ftp ip_conntrack via_rhine 8139too crc32
Mar 25 01:00:48 websrv2 Pid: 25172, comm: rm Not tainted 2.6.12-rc1-cs1
Mar 25 01:00:48 websrv2 RIP: 0010:[] 
{reiserfs_panic+211}
Mar 25 01:00:48 websrv2 RSP: 0018:81001efe37b8  EFLAGS: 00010292
Mar 25 01:00:48 websrv2 RAX: 0059 RBX: 803fbcac RCX: 
c100
Mar 25 01:00:48 websrv2 RDX:  RSI: 81007d0b31f0 RDI: 

Mar 25 01:00:48 websrv2 RBP: 81004f960060 R08: 81001efe2000 R09: 
0002
Mar 25 01:00:48 websrv2 R10:  R11: 80340ef0 R12: 
81007f850230
Mar 25 01:00:48 websrv2 R13: 81007f85 R14:  R15: 
81004f9565d0
Mar 25 01:00:48 websrv2 FS:  2aabaae0() GS:805be800() 
knlGS:55563dc0
Mar 25 01:00:48 websrv2 CS:  0010 DS:  ES:  CR0: 8005003b
Mar 25 01:00:48 websrv2 CR2: 2aaff008 CR3: 1ebbd000 CR4: 
06e0
Mar 25 01:00:48 websrv2 Process rm (pid: 25172, threadinfo 81001efe2000, 
task 81007d0b31f0)
Mar 25 01:00:48 websrv2 Stack: 00300010 81001efe38a8 
81001efe37d8 81001c041530
Mar 25 01:00:48 websrv2 81001efe39d8 801d4e42 81007e659a00 
0063
Mar 25 01:00:48 websrv2 0063 
Mar 25 01:00:48 websrv2 Call Trace:{pathrelse_and_restore+66} 
{retint_kernel+46}
Mar 25 01:00:48 websrv2 {do_balance+39} 
{do_balance+6901}
Mar 25 01:00:48 websrv2 {unfix_nodes+128} 
{do_balance+10555}
Mar 25 01:00:48 websrv2 {reiserfs_cut_from_item+1673} 
{reiserfs_unlink+362}
Mar 25 01:00:48 websrv2 {vfs_unlink+462} 
{sys_unlink+233}
Mar 25 01:00:48 websrv2 {sys_getdents+232} 
{error_exit+0}
Mar 25 01:00:48 websrv2 {system_call+126}
Mar 25 01:00:48 websrv2
Mar 25 01:00:48 websrv2 Code: 0f 0b b8 c1 3f 80 ff ff ff ff 6a 01 4d 85 ed 48 
c7 c2 40 ba
Mar 25 01:00:48 websrv2 RIP {reiserfs_panic+211} RSP 


or

Mar 25 16:39:21 websrv2 VFS: brelse: Trying to free free buffer
Mar 25 16:39:21 websrv2 Badness in __brelse at fs/buffer.c:1295
Mar 25 16:39:21 websrv2
Mar 25 16:39:21 websrv2 Call Trace:{__find_get_block+479} 
{__getblk+37}
Mar 25 16:39:21 websrv2 {do_journal_end+2181} 
{keventd_create_kthread+0}
Mar 25 16:39:21 websrv2 {reiserfs_sync_fs+64} 
{sync_supers+211}
Mar 25 16:39:21 websrv2 {wb_kupdate+42} 
{pdflush+399}
Mar 25 16:39:21 websrv2 {wb_kupdate+0} 
{keventd_create_kthread+0}
Mar 25 16:39:21 websrv2 {pdflush+0} 
{kthread+205}
Mar 25 16:39:21 websrv2 {child_rip+8} 
{keventd_create_kthread+0}
Mar 25 16:39:21 websrv2 {kthread+0} 
{child_rip+0}

Fortunately the kernel locked up and there was no data corruption.

I've got PREEMPT and PREEMPT_BKL enabled under UP.

I just took a look at the change and found this:

x86-64 does this (in entry.S):

bt   $9,EFLAGS-ARGOFFSET(%rsp)  /* interrupts off? */
jnc   retint_restore_args
movl $PREEMPT_ACTIVE,threadinfo_preempt_count(%rcx)
sti
call schedule
cli
GET_THREAD_INFO(%rcx)
movl $0,threadinfo_preempt_count(%rcx)
jmp exit_intr

while i386 does this:

testl $IF_MASK,EFLAGS(%esp) # interrupts off (exception path) ?
jz restore_all
call preempt_schedule_irq
jmp need_resched

preempt_schedule_irq is not an i386 specific function and seems to take
special care of BKL preemption and since reiserfs does use the BKL to do
certain things I think this actually might be the problem...?

I'm not saying that this fix is wrong (it is obviously the right fix)
but it causes another problem to show up.

Unfortunately I don't have a amd64 machine to play with, so can somebody
please check this?



signature.asc
Description: This is a digitally signed message part


x86-64 preemption fix from IRQ and BKL in 2.6.12-rc1-mm2

2005-03-25 Thread Christophe Saout
Hi,

 +x86_64-fix-config_preempt.patch

 x86_64-fix-config_preempt.patch
   x86_64: Fix CONFIG_PREEMPT

Has this one been stress-tested?

I've got the impression that things have become a lot worse.

I've been seeing things like these:

Mar 25 01:00:48 websrv2 REISERFS: panic (device dm-1): clm-6000: do_balance, fs 
generation has changed
Mar 25 01:00:48 websrv2
Mar 25 01:00:48 websrv2 --- [cut here ] - [please bite here ] 
-
Mar 25 01:00:48 websrv2 Kernel BUG at prints:362
Mar 25 01:00:48 websrv2 invalid operand:  [1] PREEMPT
Mar 25 01:00:48 websrv2 CPU 0
Mar 25 01:00:48 websrv2 Modules linked in: iptable_nat ipt_MARK iptable_mangle 
ipt_LOG ipt_multiport ipt_owner ipt_mark ipt_state ipt_REJECT iptable_filter 
ip_tables twofish serpent blowfish ext3 jbd reiser4 sha256 aes dm_crypt 
ip_conntrack_irc ip_conntrack_ftp ip_conntrack via_rhine 8139too crc32
Mar 25 01:00:48 websrv2 Pid: 25172, comm: rm Not tainted 2.6.12-rc1-cs1
Mar 25 01:00:48 websrv2 RIP: 0010:[801cfe13] 
801cfe13{reiserfs_panic+211}
Mar 25 01:00:48 websrv2 RSP: 0018:81001efe37b8  EFLAGS: 00010292
Mar 25 01:00:48 websrv2 RAX: 0059 RBX: 803fbcac RCX: 
c100
Mar 25 01:00:48 websrv2 RDX:  RSI: 81007d0b31f0 RDI: 

Mar 25 01:00:48 websrv2 RBP: 81004f960060 R08: 81001efe2000 R09: 
0002
Mar 25 01:00:48 websrv2 R10:  R11: 80340ef0 R12: 
81007f850230
Mar 25 01:00:48 websrv2 R13: 81007f85 R14:  R15: 
81004f9565d0
Mar 25 01:00:48 websrv2 FS:  2aabaae0() GS:805be800() 
knlGS:55563dc0
Mar 25 01:00:48 websrv2 CS:  0010 DS:  ES:  CR0: 8005003b
Mar 25 01:00:48 websrv2 CR2: 2aaff008 CR3: 1ebbd000 CR4: 
06e0
Mar 25 01:00:48 websrv2 Process rm (pid: 25172, threadinfo 81001efe2000, 
task 81007d0b31f0)
Mar 25 01:00:48 websrv2 Stack: 00300010 81001efe38a8 
81001efe37d8 81001c041530
Mar 25 01:00:48 websrv2 81001efe39d8 801d4e42 81007e659a00 
0063
Mar 25 01:00:48 websrv2 0063 
Mar 25 01:00:48 websrv2 Call Trace:801d4e42{pathrelse_and_restore+66} 
8010efe6{retint_kernel+46}
Mar 25 01:00:48 websrv2 801bb847{do_balance+39} 
801bd315{do_balance+6901}
Mar 25 01:00:48 websrv2 801cbd90{unfix_nodes+128} 
801be15b{do_balance+10555}
Mar 25 01:00:48 websrv2 801d7bf9{reiserfs_cut_from_item+1673} 
801bfcfa{reiserfs_unlink+362}
Mar 25 01:00:48 websrv2 801873ae{vfs_unlink+462} 
801874f9{sys_unlink+233}
Mar 25 01:00:48 websrv2 8018a268{sys_getdents+232} 
8010f221{error_exit+0}
Mar 25 01:00:48 websrv2 8010e906{system_call+126}
Mar 25 01:00:48 websrv2
Mar 25 01:00:48 websrv2 Code: 0f 0b b8 c1 3f 80 ff ff ff ff 6a 01 4d 85 ed 48 
c7 c2 40 ba
Mar 25 01:00:48 websrv2 RIP 801cfe13{reiserfs_panic+211} RSP 
81001efe37b8

or

Mar 25 16:39:21 websrv2 VFS: brelse: Trying to free free buffer
Mar 25 16:39:21 websrv2 Badness in __brelse at fs/buffer.c:1295
Mar 25 16:39:21 websrv2
Mar 25 16:39:21 websrv2 Call Trace:8017787f{__find_get_block+479} 
8017a175{__getblk+37}
Mar 25 16:39:21 websrv2 801de3d5{do_journal_end+2181} 
80147d70{keventd_create_kthread+0}
Mar 25 16:39:21 websrv2 801cbf50{reiserfs_sync_fs+64} 
8017c0b3{sync_supers+211}
Mar 25 16:39:21 websrv2 8015a22a{wb_kupdate+42} 
8015ae8f{pdflush+399}
Mar 25 16:39:21 websrv2 8015a200{wb_kupdate+0} 
80147d70{keventd_create_kthread+0}
Mar 25 16:39:21 websrv2 8015ad00{pdflush+0} 
80147d2d{kthread+205}
Mar 25 16:39:21 websrv2 8010f3d7{child_rip+8} 
80147d70{keventd_create_kthread+0}
Mar 25 16:39:21 websrv2 80147c60{kthread+0} 
8010f3cf{child_rip+0}

Fortunately the kernel locked up and there was no data corruption.

I've got PREEMPT and PREEMPT_BKL enabled under UP.

I just took a look at the change and found this:

x86-64 does this (in entry.S):

bt   $9,EFLAGS-ARGOFFSET(%rsp)  /* interrupts off? */
jnc   retint_restore_args
movl $PREEMPT_ACTIVE,threadinfo_preempt_count(%rcx)
sti
call schedule
cli
GET_THREAD_INFO(%rcx)
movl $0,threadinfo_preempt_count(%rcx)
jmp exit_intr

while i386 does this:

testl $IF_MASK,EFLAGS(%esp) # interrupts off (exception path) ?
jz restore_all
call preempt_schedule_irq
jmp need_resched

preempt_schedule_irq is not an i386 specific function and seems to take
special care of BKL preemption and since reiserfs does use the BKL to do
certain things I think this actually might be the problem...?

I'm not saying that this fix is wrong (it is obviously the right fix)
but it causes another problem to show up.

Unfortunately I don't have 

Re: [0/many] Acrypto - asynchronous crypto layer for linux kernel 2.6

2005-03-10 Thread Christophe Saout
Am Dienstag, den 08.03.2005, 00:08 -0500 schrieb Kyle Moffett:

> Did you include support for the new key/keyring infrastructure 
> introduced
> a couple versions ago by David Howells?  It allows userspace to create 
> and
> manage various sorts of "keys" in kernelspace.  If you create and 
> register
> a few keytypes for various symmetric and asymmetric ciphers, you could 
> then
> take advantage of its support for securely passing keys around in and 
> out
> of userspace.

I've written a dm-crypt patch some weeks ago that does what you
describe. The crypto information (cipher and key) is added to a keyring
and then the device is constructed using a reference to this key.

I had some issues with the keyring code (mainly a deadlock problem with
crypto module autoloading): http://lkml.org/lkml/2005/2/4/113

I would also like to switch dm-crypt to acrypto once it's accepted into
the kernel.



signature.asc
Description: Dies ist ein digital signierter Nachrichtenteil


Re: [0/many] Acrypto - asynchronous crypto layer for linux kernel 2.6

2005-03-10 Thread Christophe Saout
Am Dienstag, den 08.03.2005, 00:08 -0500 schrieb Kyle Moffett:

 Did you include support for the new key/keyring infrastructure 
 introduced
 a couple versions ago by David Howells?  It allows userspace to create 
 and
 manage various sorts of keys in kernelspace.  If you create and 
 register
 a few keytypes for various symmetric and asymmetric ciphers, you could 
 then
 take advantage of its support for securely passing keys around in and 
 out
 of userspace.

I've written a dm-crypt patch some weeks ago that does what you
describe. The crypto information (cipher and key) is added to a keyring
and then the device is constructed using a reference to this key.

I had some issues with the keyring code (mainly a deadlock problem with
crypto module autoloading): http://lkml.org/lkml/2005/2/4/113

I would also like to switch dm-crypt to acrypto once it's accepted into
the kernel.



signature.asc
Description: Dies ist ein digital signierter Nachrichtenteil


Re: [PATCH 1/8] lib/sort: Heapsort implementation of sort()

2005-03-01 Thread Christophe Saout
Am Sonntag, den 27.02.2005, 13:25 -0800 schrieb Matt Mackall:

> Which kernel? There was an off-by-one for odd array sizes in the
> original posted version that was quickly spotted:
> 
> http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.11-rc4/2.6.11-rc4-mm1/broken-out/sort-fix.patch
> 
> I've since tested all sizes 1 - 1000 with 100 random arrays each, so
> I'm fairly confident it's now fixed.

-   int i = (num/2 - 1) * size, n = num * size, c, r;
+   int i = (num/2) * size, n = num * size, c, r;

What's probably meant is: ((num - 1)/2) * size

`i' must cover half of the array or a little bit more (not less, in case
of odd numbers). `i' (before my patch) is the highest address to start
with, so that's why it should be ((num + 1)/2 - 1) * size or simpler:
((num - 1)/2) * size

Anyway, I was wondering, is there a specific reason you are not using
size_t for the offset variables? size is a size_t and the only purpose
of the variables i, n, c and r is to be compared or added to the start
pointer (also I think it's just ugly to cast size_t down to an int).

On system where int is 32 bit but pointers are 64 bit the compiler might
need to extend to adjust the size of the operands for the address
calculation. Right?

Since size_t is unsigned I also had to modify the two loops since I
can't check for any of the variables becoming negative. Tested with all
kinds of array sizes.

Signed-off-by: Christophe Saout <[EMAIL PROTECTED]>

diff -Nur linux-2.6.11-rc4-mm1.orig/include/linux/sort.h 
linux-2.6.11-rc4-mm1/include/linux/sort.h
--- linux-2.6.11-rc4-mm1.orig/include/linux/sort.h  2005-03-01 
19:45:11.0 +0100
+++ linux-2.6.11-rc4-mm1/include/linux/sort.h   2005-03-01 19:47:13.456097896 
+0100
@@ -5,6 +5,6 @@
 
 void sort(void *base, size_t num, size_t size,
  int (*cmp)(const void *, const void *),
- void (*swap)(void *, void *, int));
+ void (*swap)(void *, void *, size_t));
 
 #endif
diff -Nur linux-2.6.11-rc4-mm1.orig/lib/sort.c linux-2.6.11-rc4-mm1/lib/sort.c
--- linux-2.6.11-rc4-mm1.orig/lib/sort.c2005-03-01 19:46:05.0 
+0100
+++ linux-2.6.11-rc4-mm1/lib/sort.c 2005-03-01 19:47:55.688677568 +0100
@@ -7,14 +7,14 @@
 #include 
 #include 
 
-void u32_swap(void *a, void *b, int size)
+void u32_swap(void *a, void *b, size_t size)
 {
u32 t = *(u32 *)a;
*(u32 *)a = *(u32 *)b;
*(u32 *)b = t;
 }
 
-void generic_swap(void *a, void *b, int size)
+void generic_swap(void *a, void *b, size_t size)
 {
char t;
 
@@ -44,17 +44,19 @@
 
 void sort(void *base, size_t num, size_t size,
  int (*cmp)(const void *, const void *),
- void (*swap)(void *, void *, int size))
+ void (*swap)(void *, void *, size_t size))
 {
/* pre-scale counters for performance */
-   int i = (num/2) * size, n = num * size, c, r;
+   size_t i = ((num + 1)/2) * size, n = num * size, c, r;
 
if (!swap)
swap = (size == 4 ? u32_swap : generic_swap);
 
/* heapify */
-   for ( ; i >= 0; i -= size) {
-   for (r = i; r * 2 < n; r  = c) {
+   while(i > 0) {
+   i -= size;
+
+   for (r = i; r * 2 < n; r = c) {
c = r * 2;
if (c < n - size && cmp(base + c, base + c + size) < 0)
c += size;
@@ -65,7 +67,9 @@
}
 
/* sort */
-   for (i = n - size; i >= 0; i -= size) {
+   for (i = n; i > 0;) {
+   i -= size;
+
swap(base, base + i, size);
for (r = 0; r * 2 < i; r = c) {
c = r * 2;



signature.asc
Description: Dies ist ein digital signierter Nachrichtenteil


Re: [PATCH 1/8] lib/sort: Heapsort implementation of sort()

2005-03-01 Thread Christophe Saout
Am Sonntag, den 27.02.2005, 13:25 -0800 schrieb Matt Mackall:

 Which kernel? There was an off-by-one for odd array sizes in the
 original posted version that was quickly spotted:
 
 http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.11-rc4/2.6.11-rc4-mm1/broken-out/sort-fix.patch
 
 I've since tested all sizes 1 - 1000 with 100 random arrays each, so
 I'm fairly confident it's now fixed.

-   int i = (num/2 - 1) * size, n = num * size, c, r;
+   int i = (num/2) * size, n = num * size, c, r;

What's probably meant is: ((num - 1)/2) * size

`i' must cover half of the array or a little bit more (not less, in case
of odd numbers). `i' (before my patch) is the highest address to start
with, so that's why it should be ((num + 1)/2 - 1) * size or simpler:
((num - 1)/2) * size

Anyway, I was wondering, is there a specific reason you are not using
size_t for the offset variables? size is a size_t and the only purpose
of the variables i, n, c and r is to be compared or added to the start
pointer (also I think it's just ugly to cast size_t down to an int).

On system where int is 32 bit but pointers are 64 bit the compiler might
need to extend to adjust the size of the operands for the address
calculation. Right?

Since size_t is unsigned I also had to modify the two loops since I
can't check for any of the variables becoming negative. Tested with all
kinds of array sizes.

Signed-off-by: Christophe Saout [EMAIL PROTECTED]

diff -Nur linux-2.6.11-rc4-mm1.orig/include/linux/sort.h 
linux-2.6.11-rc4-mm1/include/linux/sort.h
--- linux-2.6.11-rc4-mm1.orig/include/linux/sort.h  2005-03-01 
19:45:11.0 +0100
+++ linux-2.6.11-rc4-mm1/include/linux/sort.h   2005-03-01 19:47:13.456097896 
+0100
@@ -5,6 +5,6 @@
 
 void sort(void *base, size_t num, size_t size,
  int (*cmp)(const void *, const void *),
- void (*swap)(void *, void *, int));
+ void (*swap)(void *, void *, size_t));
 
 #endif
diff -Nur linux-2.6.11-rc4-mm1.orig/lib/sort.c linux-2.6.11-rc4-mm1/lib/sort.c
--- linux-2.6.11-rc4-mm1.orig/lib/sort.c2005-03-01 19:46:05.0 
+0100
+++ linux-2.6.11-rc4-mm1/lib/sort.c 2005-03-01 19:47:55.688677568 +0100
@@ -7,14 +7,14 @@
 #include linux/kernel.h
 #include linux/module.h
 
-void u32_swap(void *a, void *b, int size)
+void u32_swap(void *a, void *b, size_t size)
 {
u32 t = *(u32 *)a;
*(u32 *)a = *(u32 *)b;
*(u32 *)b = t;
 }
 
-void generic_swap(void *a, void *b, int size)
+void generic_swap(void *a, void *b, size_t size)
 {
char t;
 
@@ -44,17 +44,19 @@
 
 void sort(void *base, size_t num, size_t size,
  int (*cmp)(const void *, const void *),
- void (*swap)(void *, void *, int size))
+ void (*swap)(void *, void *, size_t size))
 {
/* pre-scale counters for performance */
-   int i = (num/2) * size, n = num * size, c, r;
+   size_t i = ((num + 1)/2) * size, n = num * size, c, r;
 
if (!swap)
swap = (size == 4 ? u32_swap : generic_swap);
 
/* heapify */
-   for ( ; i = 0; i -= size) {
-   for (r = i; r * 2  n; r  = c) {
+   while(i  0) {
+   i -= size;
+
+   for (r = i; r * 2  n; r = c) {
c = r * 2;
if (c  n - size  cmp(base + c, base + c + size)  0)
c += size;
@@ -65,7 +67,9 @@
}
 
/* sort */
-   for (i = n - size; i = 0; i -= size) {
+   for (i = n; i  0;) {
+   i -= size;
+
swap(base, base + i, size);
for (r = 0; r * 2  i; r = c) {
c = r * 2;



signature.asc
Description: Dies ist ein digital signierter Nachrichtenteil


Re: [PATCH 01/04] Adding cipher mode context information to crypto_tfm

2005-02-09 Thread Christophe Saout
Am Mittwoch, den 09.02.2005, 17:19 -0800 schrieb Andrew Morton:

> > It must be
> >  possible to process more than 2 mappings in softirq context.
> 
> Adding a few more fixmap slots wouldn't hurt anyone.  But if you want an
> arbitrarily large number of them then no, we cannot do that.
> 
> Possibly one could arrange for the pages to not be in highmem at all.

In the case of the LRW use in dm-crypt only plain- and ciphertext need
to be accessible through struct page (both are addressed through BIO
vectors). The LRW tweaks could simply be kmalloced. So instead of
passing "struct scatterlist *tweaksg" we could just pass a "void
*tweaksg".

Some of the hard work on the generic scatterlist walker would be for
nothing then. :(



signature.asc
Description: Dies ist ein digital signierter Nachrichtenteil


Re: [PATCH 01/04] Adding cipher mode context information to crypto_tfm

2005-02-09 Thread Christophe Saout
Am Mittwoch, den 09.02.2005, 17:19 -0800 schrieb Andrew Morton:

  It must be
   possible to process more than 2 mappings in softirq context.
 
 Adding a few more fixmap slots wouldn't hurt anyone.  But if you want an
 arbitrarily large number of them then no, we cannot do that.
 
 Possibly one could arrange for the pages to not be in highmem at all.

In the case of the LRW use in dm-crypt only plain- and ciphertext need
to be accessible through struct page (both are addressed through BIO
vectors). The LRW tweaks could simply be kmalloced. So instead of
passing struct scatterlist *tweaksg we could just pass a void
*tweaksg.

Some of the hard work on the generic scatterlist walker would be for
nothing then. :(



signature.asc
Description: Dies ist ein digital signierter Nachrichtenteil


Re: [dm-crypt] Re: dm-crypt crypt_status reports key?

2005-02-04 Thread Christophe Saout
Am Mittwoch, den 02.02.2005, 20:05 -0800 schrieb Matt Mackall:

> Dunno here, seems that having one tool that gave the kernel a key named
> "foo" and then telling dm-crypt to use key "foo" is probably not a bad
> way to go. Then we don't have stuff like "echo  | dmsetup create"
> and the like and the key-handling smarts can all be put in one
> separate place.
> 
> Getting from here to there might be interesting though. Perhaps we can
> teach dm-crypt to understand keys of the form "keyname:"? in
> addition to raw keys to keep compatibility. Might even be possible to
> push this down into crypt_decode_key() (or a smarter variant of same).

I've hacked together a working implementation. It's just a prototype
though. The patch requires that the key retention service is activated
unconditionally. It would probably be better to introduce #ifdefs and
the possibility turn of the cleartext key way in the kernel
configuration.

I've separated out the crypto configuration from the device-mapper
configuration. Several device mappings can now share/reference the same
key. This might be useful if you want to span your mapping over several
devices (without stacking device mappings on top of each other). So
basically the crypto and the device-mapper part of the configuration are
in its own "classes" now. It makes even more sense now to rename IV
operations to crypto/key operations (which only work on the new
structure) like it's done in the LRW patch.

This crypto configuration that I separated out can be referenced by the
device-mapper target directly (old way) or indirectly by the new key
retention service. It holds the cipher, chaining mode and IV mechanisms
as well as the key itself.

A dm table configuration line then looks like "0 100 crypt key
my_key_name 0 /dev/bla 0".

And dm-crypt.c is slowly getting large. Thirty functions already. Hmm.

--- linux-2.6.11-rc3.orig/drivers/md/dm-crypt.c	2005-02-04 01:00:51.983910752 +0100
+++ linux-2.6.11-rc3/drivers/md/dm-crypt.c	2005-02-04 14:30:47.629117856 +0100
@@ -13,6 +13,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -48,14 +49,25 @@
 	int write;
 };
 
-struct crypt_config;
+/*
+ * key and associated crypto information
+ */
+struct crypt_key {
+	struct crypt_iv_operations *iv_gen_ops;
+	char *iv_mode;
+	void *iv_gen_private;
+	unsigned int iv_size;
+
+	struct crypto_tfm *tfm;
+	unsigned int key_size;
+	u8 key[0];
+};
 
 struct crypt_iv_operations {
-	int (*ctr)(struct crypt_config *cc, struct dm_target *ti,
-	   const char *opts);
-	void (*dtr)(struct crypt_config *cc);
-	const char *(*status)(struct crypt_config *cc);
-	int (*generator)(struct crypt_config *cc, u8 *iv, sector_t sector);
+	int (*ctr)(struct crypt_key *ck, const char *opts, char **error_msg);
+	void (*dtr)(struct crypt_key *ck);
+	const char *(*status)(struct crypt_key *ck);
+	int (*generator)(struct crypt_key *ck, u8 *iv, sector_t sector);
 };
 
 /*
@@ -73,18 +85,9 @@
 	mempool_t *io_pool;
 	mempool_t *page_pool;
 
-	/*
-	 * crypto related data
-	 */
-	struct crypt_iv_operations *iv_gen_ops;
-	char *iv_mode;
-	void *iv_gen_private;
 	sector_t iv_offset;
-	unsigned int iv_size;
-
-	struct crypto_tfm *tfm;
-	unsigned int key_size;
-	u8 key[0];
+	struct crypt_key *crypto;
+	struct key *key;
 };
 
 #define MIN_IOS256
@@ -121,16 +124,16 @@
  * http://article.gmane.org/gmane.linux.kernel.device-mapper.dm-crypt/454
  */
 
-static int crypt_iv_plain_gen(struct crypt_config *cc, u8 *iv, sector_t sector)
+static int crypt_iv_plain_gen(struct crypt_key *ck, u8 *iv, sector_t sector)
 {
-	memset(iv, 0, cc->iv_size);
+	memset(iv, 0, ck->iv_size);
 	*(u32 *)iv = cpu_to_le32(sector & 0x);
 
 	return 0;
 }
 
-static int crypt_iv_essiv_ctr(struct crypt_config *cc, struct dm_target *ti,
-	  const char *opts)
+static int crypt_iv_essiv_ctr(struct crypt_key *ck, const char *opts,
+  char **error_msg)
 {
 	struct crypto_tfm *essiv_tfm;
 	struct crypto_tfm *hash_tfm;
@@ -139,19 +142,19 @@
 	u8 *salt;
 
 	if (opts == NULL) {
-		ti->error = PFX "Digest algorithm missing for ESSIV mode";
+		*error_msg = PFX "Digest algorithm missing for ESSIV mode";
 		return -EINVAL;
 	}
 
 	/* Hash the cipher key with the given hash algorithm */
 	hash_tfm = crypto_alloc_tfm(opts, 0);
 	if (hash_tfm == NULL) {
-		ti->error = PFX "Error initializing ESSIV hash";
+		*error_msg = PFX "Error initializing ESSIV hash";
 		return -EINVAL;
 	}
 
 	if (crypto_tfm_alg_type(hash_tfm) != CRYPTO_ALG_TYPE_DIGEST) {
-		ti->error = PFX "Expected digest algorithm for ESSIV hash";
+		*error_msg = PFX "Expected digest algorithm for ESSIV hash";
 		crypto_free_tfm(hash_tfm);
 		return -EINVAL;
 	}
@@ -159,63 +162,63 @@
 	saltsize = crypto_tfm_alg_digestsize(hash_tfm);
 	salt = kmalloc(saltsize, GFP_KERNEL);
 	if (salt == NULL) {
-		ti->error = PFX "Error kmallocing salt storage in ESSIV";
+		*error_msg = PFX "Error kmallocing salt storage in ESSIV";
 	

Questions about the Linux key retention services (and dm-crypt)

2005-02-04 Thread Christophe Saout
Hi,

I was investigating a way to hide the dm-crypt key from device-mapper
configuration IOCTLs since the key might accidentally end up somewhere
it shouldn't (see other thread).

Then I stumbled across the new key retention service. This is exectly
what I was looking for.

The idea is to add the crypto configuration as a key and then give the
device-mapper a reference to that key. So everybody can then safely read
and copy the device-mapper configuration without risking to compromise
the key.

I just hacked something up which seems to work, just a prototype though.
(If someone wants to have a look at it:
http://www.saout.de/misc/dm-crypt-key-retention-v1.diff ).

There are some minor issues and questions:

I'd like to create the crypto tfm in the instantiation function (to
verify the validity of the cipher and key immediately). Cryptoapi might
call modprobe to load a cipher. Then it deadlocks. The reason is the
instantiation semaphore (kernel tries to create a keyring for modprobe).
Can we somehow get rid of it?

The other question I have is how should I describe references to the
key. My idea was to use its description. The userspace application adds
the key to one of its keyrings and tells dm-crypt its name. dm-crypt
calls search_process_keyrings to retrieve the key.

The alternative would be to use the key's serial id. What do you think
(as designer of the API) would be the better solution?

What I like is the key refcounting. The process creating the dm-crypt
mapping can put the key in its process keyring and when it exits the
keyring is destroyed, so that the key is "floating" (held by dm-crypt).

Once the dm-crypt mapping is removed the key is also destroy. No risk of
having unused keys hanging around.

The problem is that if an application wants to modify the dm-crypt
mapping is that it needs to get a reference to the key:

With the serial id it would be easy, it just links the key to one of its
keyrings and can then destroy the dm-crypt mapping without having to
worry that the keys gets destroyed.

But if I reference the key by its description, dm-crypt would need to
link the key to one of the caller's (which asks dm-crypt for the key)
keyrings itself so that the caller can then find the key. This sounds a
bit ugly to me. I'd prefer to not keep the keys in the root user keyring
to make sure the key gets destroyed when nobody is using it.

So which solution is better? Should I use the key description or serial
id for references? The downside of using the serial id would be that any
application could reference any key, even keys it does not own. The
problem with using the description is described above.



signature.asc
Description: Dies ist ein digital signierter Nachrichtenteil


Questions about the Linux key retention services (and dm-crypt)

2005-02-04 Thread Christophe Saout
Hi,

I was investigating a way to hide the dm-crypt key from device-mapper
configuration IOCTLs since the key might accidentally end up somewhere
it shouldn't (see other thread).

Then I stumbled across the new key retention service. This is exectly
what I was looking for.

The idea is to add the crypto configuration as a key and then give the
device-mapper a reference to that key. So everybody can then safely read
and copy the device-mapper configuration without risking to compromise
the key.

I just hacked something up which seems to work, just a prototype though.
(If someone wants to have a look at it:
http://www.saout.de/misc/dm-crypt-key-retention-v1.diff ).

There are some minor issues and questions:

I'd like to create the crypto tfm in the instantiation function (to
verify the validity of the cipher and key immediately). Cryptoapi might
call modprobe to load a cipher. Then it deadlocks. The reason is the
instantiation semaphore (kernel tries to create a keyring for modprobe).
Can we somehow get rid of it?

The other question I have is how should I describe references to the
key. My idea was to use its description. The userspace application adds
the key to one of its keyrings and tells dm-crypt its name. dm-crypt
calls search_process_keyrings to retrieve the key.

The alternative would be to use the key's serial id. What do you think
(as designer of the API) would be the better solution?

What I like is the key refcounting. The process creating the dm-crypt
mapping can put the key in its process keyring and when it exits the
keyring is destroyed, so that the key is floating (held by dm-crypt).

Once the dm-crypt mapping is removed the key is also destroy. No risk of
having unused keys hanging around.

The problem is that if an application wants to modify the dm-crypt
mapping is that it needs to get a reference to the key:

With the serial id it would be easy, it just links the key to one of its
keyrings and can then destroy the dm-crypt mapping without having to
worry that the keys gets destroyed.

But if I reference the key by its description, dm-crypt would need to
link the key to one of the caller's (which asks dm-crypt for the key)
keyrings itself so that the caller can then find the key. This sounds a
bit ugly to me. I'd prefer to not keep the keys in the root user keyring
to make sure the key gets destroyed when nobody is using it.

So which solution is better? Should I use the key description or serial
id for references? The downside of using the serial id would be that any
application could reference any key, even keys it does not own. The
problem with using the description is described above.



signature.asc
Description: Dies ist ein digital signierter Nachrichtenteil


Re: [dm-crypt] Re: dm-crypt crypt_status reports key?

2005-02-04 Thread Christophe Saout
Am Mittwoch, den 02.02.2005, 20:05 -0800 schrieb Matt Mackall:

 Dunno here, seems that having one tool that gave the kernel a key named
 foo and then telling dm-crypt to use key foo is probably not a bad
 way to go. Then we don't have stuff like echo key | dmsetup create
 and the like and the key-handling smarts can all be put in one
 separate place.
 
 Getting from here to there might be interesting though. Perhaps we can
 teach dm-crypt to understand keys of the form keyname:foo? in
 addition to raw keys to keep compatibility. Might even be possible to
 push this down into crypt_decode_key() (or a smarter variant of same).

I've hacked together a working implementation. It's just a prototype
though. The patch requires that the key retention service is activated
unconditionally. It would probably be better to introduce #ifdefs and
the possibility turn of the cleartext key way in the kernel
configuration.

I've separated out the crypto configuration from the device-mapper
configuration. Several device mappings can now share/reference the same
key. This might be useful if you want to span your mapping over several
devices (without stacking device mappings on top of each other). So
basically the crypto and the device-mapper part of the configuration are
in its own classes now. It makes even more sense now to rename IV
operations to crypto/key operations (which only work on the new
structure) like it's done in the LRW patch.

This crypto configuration that I separated out can be referenced by the
device-mapper target directly (old way) or indirectly by the new key
retention service. It holds the cipher, chaining mode and IV mechanisms
as well as the key itself.

A dm table configuration line then looks like 0 100 crypt key
my_key_name 0 /dev/bla 0.

And dm-crypt.c is slowly getting large. Thirty functions already. Hmm.

--- linux-2.6.11-rc3.orig/drivers/md/dm-crypt.c	2005-02-04 01:00:51.983910752 +0100
+++ linux-2.6.11-rc3/drivers/md/dm-crypt.c	2005-02-04 14:30:47.629117856 +0100
@@ -13,6 +13,7 @@
 #include linux/mempool.h
 #include linux/slab.h
 #include linux/crypto.h
+#include linux/key.h
 #include linux/workqueue.h
 #include asm/atomic.h
 #include asm/scatterlist.h
@@ -48,14 +49,25 @@
 	int write;
 };
 
-struct crypt_config;
+/*
+ * key and associated crypto information
+ */
+struct crypt_key {
+	struct crypt_iv_operations *iv_gen_ops;
+	char *iv_mode;
+	void *iv_gen_private;
+	unsigned int iv_size;
+
+	struct crypto_tfm *tfm;
+	unsigned int key_size;
+	u8 key[0];
+};
 
 struct crypt_iv_operations {
-	int (*ctr)(struct crypt_config *cc, struct dm_target *ti,
-	   const char *opts);
-	void (*dtr)(struct crypt_config *cc);
-	const char *(*status)(struct crypt_config *cc);
-	int (*generator)(struct crypt_config *cc, u8 *iv, sector_t sector);
+	int (*ctr)(struct crypt_key *ck, const char *opts, char **error_msg);
+	void (*dtr)(struct crypt_key *ck);
+	const char *(*status)(struct crypt_key *ck);
+	int (*generator)(struct crypt_key *ck, u8 *iv, sector_t sector);
 };
 
 /*
@@ -73,18 +85,9 @@
 	mempool_t *io_pool;
 	mempool_t *page_pool;
 
-	/*
-	 * crypto related data
-	 */
-	struct crypt_iv_operations *iv_gen_ops;
-	char *iv_mode;
-	void *iv_gen_private;
 	sector_t iv_offset;
-	unsigned int iv_size;
-
-	struct crypto_tfm *tfm;
-	unsigned int key_size;
-	u8 key[0];
+	struct crypt_key *crypto;
+	struct key *key;
 };
 
 #define MIN_IOS256
@@ -121,16 +124,16 @@
  * http://article.gmane.org/gmane.linux.kernel.device-mapper.dm-crypt/454
  */
 
-static int crypt_iv_plain_gen(struct crypt_config *cc, u8 *iv, sector_t sector)
+static int crypt_iv_plain_gen(struct crypt_key *ck, u8 *iv, sector_t sector)
 {
-	memset(iv, 0, cc-iv_size);
+	memset(iv, 0, ck-iv_size);
 	*(u32 *)iv = cpu_to_le32(sector  0x);
 
 	return 0;
 }
 
-static int crypt_iv_essiv_ctr(struct crypt_config *cc, struct dm_target *ti,
-	  const char *opts)
+static int crypt_iv_essiv_ctr(struct crypt_key *ck, const char *opts,
+  char **error_msg)
 {
 	struct crypto_tfm *essiv_tfm;
 	struct crypto_tfm *hash_tfm;
@@ -139,19 +142,19 @@
 	u8 *salt;
 
 	if (opts == NULL) {
-		ti-error = PFX Digest algorithm missing for ESSIV mode;
+		*error_msg = PFX Digest algorithm missing for ESSIV mode;
 		return -EINVAL;
 	}
 
 	/* Hash the cipher key with the given hash algorithm */
 	hash_tfm = crypto_alloc_tfm(opts, 0);
 	if (hash_tfm == NULL) {
-		ti-error = PFX Error initializing ESSIV hash;
+		*error_msg = PFX Error initializing ESSIV hash;
 		return -EINVAL;
 	}
 
 	if (crypto_tfm_alg_type(hash_tfm) != CRYPTO_ALG_TYPE_DIGEST) {
-		ti-error = PFX Expected digest algorithm for ESSIV hash;
+		*error_msg = PFX Expected digest algorithm for ESSIV hash;
 		crypto_free_tfm(hash_tfm);
 		return -EINVAL;
 	}
@@ -159,63 +162,63 @@
 	saltsize = crypto_tfm_alg_digestsize(hash_tfm);
 	salt = kmalloc(saltsize, GFP_KERNEL);
 	if (salt == NULL) {
-		ti-error = PFX Error kmallocing salt storage in ESSIV;
+		

Re: dm-crypt crypt_status reports key?

2005-02-03 Thread Christophe Saout
Am Mittwoch, den 02.02.2005, 20:05 -0800 schrieb Matt Mackall:

> On Thu, Feb 03, 2005 at 03:34:29AM +0100, Christophe Saout wrote:
> > The keyring API seems very flexible. You can define your own type of
> > keys and give them names. Well, the name is probably irrelevant here and
> > should be chosen randomly but it's less likely to collide with someone
> > else.
>  
> Dunno here, seems that having one tool that gave the kernel a key named
> "foo" and then telling dm-crypt to use key "foo" is probably not a bad
> way to go. Then we don't have stuff like "echo  | dmsetup create"
> and the like and the key-handling smarts can all be put in one
> separate place.

Yes. I could also change cryptsetup to not mlockall the whole
application just because the key is passed down to libdevmapper which
does not treat parameters with special care.

> Getting from here to there might be interesting though. Perhaps we can
> teach dm-crypt to understand keys of the form "keyname:"? in
> addition to raw keys to keep compatibility. Might even be possible to
> push this down into crypt_decode_key() (or a smarter variant of same).
> 
> Meanwhile, I'd still like to hide the raw key in crypt_status().

Well, I don't. I don't know any tools that actually use the
DM_DEVICE_TABLE command except cryptsetup. I don't like to make the
interface inconsistent just because there might be an incompetent root
sitting in front of the machine.



signature.asc
Description: Dies ist ein digital signierter Nachrichtenteil


Re: dm-crypt crypt_status reports key?

2005-02-03 Thread Christophe Saout
Am Mittwoch, den 02.02.2005, 20:05 -0800 schrieb Matt Mackall:

 On Thu, Feb 03, 2005 at 03:34:29AM +0100, Christophe Saout wrote:
  The keyring API seems very flexible. You can define your own type of
  keys and give them names. Well, the name is probably irrelevant here and
  should be chosen randomly but it's less likely to collide with someone
  else.
  
 Dunno here, seems that having one tool that gave the kernel a key named
 foo and then telling dm-crypt to use key foo is probably not a bad
 way to go. Then we don't have stuff like echo key | dmsetup create
 and the like and the key-handling smarts can all be put in one
 separate place.

Yes. I could also change cryptsetup to not mlockall the whole
application just because the key is passed down to libdevmapper which
does not treat parameters with special care.

 Getting from here to there might be interesting though. Perhaps we can
 teach dm-crypt to understand keys of the form keyname:foo? in
 addition to raw keys to keep compatibility. Might even be possible to
 push this down into crypt_decode_key() (or a smarter variant of same).
 
 Meanwhile, I'd still like to hide the raw key in crypt_status().

Well, I don't. I don't know any tools that actually use the
DM_DEVICE_TABLE command except cryptsetup. I don't like to make the
interface inconsistent just because there might be an incompetent root
sitting in front of the machine.



signature.asc
Description: Dies ist ein digital signierter Nachrichtenteil


Re: dm-crypt crypt_status reports key?

2005-02-02 Thread Christophe Saout
Am Mittwoch, den 02.02.2005, 17:52 -0800 schrieb Matt Mackall:

> > An alternativ would be to use some form of handle to point to the key
> > after it has been given to the kernel. But that would require some more
> > infrastructure.
> 
> There's been some talk about such infrastructure already. I believe
> some pieces of it may already be in place.

Yes, you are right. I didn't follow the discussion but it actually looks
very promising. The keys in the infrastructure are reference-counted.
That's good.

The keyrings can be attached to either thread, processes, sessions or
users. 

It seems that it's possible to have floating keys (not attached to any
keyring). So we would just need to figure out how to use these keyrings
to allow communication with userspace applications. Process keyrings
seem to have the advantage that the keyring is dropped when it exits so
that all keys that are not in use by the kernel are also dropped. A
keyring for the root user would have the problem that if the cryptsetup
application aborts in the middle you would end up with old keys lying
around forever.

The keyring API seems very flexible. You can define your own type of
keys and give them names. Well, the name is probably irrelevant here and
should be chosen randomly but it's less likely to collide with someone
else.



signature.asc
Description: Dies ist ein digital signierter Nachrichtenteil


Re: dm-crypt crypt_status reports key?

2005-02-02 Thread Christophe Saout
Am Mittwoch, den 02.02.2005, 13:19 -0800 schrieb Matt Mackall:

> From looking at the dm_crypt code, it appears that it can be
> interrogated to report the current key. Some quick testing shows:
> 
> # dmsetup table /dev/mapper/volume1
> 0 200 crypt aes-plain 0123456789abcdef0123456789abcdef 0 7:0 0
> 
> Obviously, root can in principle recover this password from the
> running kernel but it seems silly to make it so easy.

I already tried that. It took me about five minutes using a shell, dd
and hexdump to get the key out of the running kernel...

> Moreover, it seems this facility exists to support some form of
> automated table storage (LVM?). As we don't want anyone/anything
> accidentally storing our passwords on disk in the clear, we probably
> shouldn't facilitate this. Perhaps we can stick something here like
> "" that the dm_crypt constructor can reject.

Yes, the reason is that the device-mapper supports on-the-fly
modifications of the device. cryptsetup has a command to resize the
mapping for example. It can do that without asking for the password
again. Features like this are the reason I'm doing this. Userspace tools
should be able to assume that they can use the result of a table status
command to create a new table with this information.

An alternativ would be to use some form of handle to point to the key
after it has been given to the kernel. But that would require some more
infrastructure.

I could imagine something like this:

Some kernel infrastructure for key management. It can hold keys which
are referenced by some form of handle. The keys are refcounted and wiped
if the reference count drops to zero.

If you want to create a dm-crypt mapping (or something else that uses
some form of cryptographic key) you create a new handle and assign the
key. Then you give the handle to dm-crypt which increments reference
count. When cryptsetup exits its reference to the key is dropped but the
key still has a reference from the active dm-crypt mapping. Later on
another application could then safely do something with that handle. As
long as an application or in-kernel user references the key it won't be
dropped so it's safe for a userspace application to play around with it.

BTW: The setkey command also seems to return the keys in use for IPSEC
connections.



signature.asc
Description: Dies ist ein digital signierter Nachrichtenteil


Re: dm-crypt crypt_status reports key?

2005-02-02 Thread Christophe Saout
Am Mittwoch, den 02.02.2005, 13:19 -0800 schrieb Matt Mackall:

 From looking at the dm_crypt code, it appears that it can be
 interrogated to report the current key. Some quick testing shows:
 
 # dmsetup table /dev/mapper/volume1
 0 200 crypt aes-plain 0123456789abcdef0123456789abcdef 0 7:0 0
 
 Obviously, root can in principle recover this password from the
 running kernel but it seems silly to make it so easy.

I already tried that. It took me about five minutes using a shell, dd
and hexdump to get the key out of the running kernel...

 Moreover, it seems this facility exists to support some form of
 automated table storage (LVM?). As we don't want anyone/anything
 accidentally storing our passwords on disk in the clear, we probably
 shouldn't facilitate this. Perhaps we can stick something here like
 secret that the dm_crypt constructor can reject.

Yes, the reason is that the device-mapper supports on-the-fly
modifications of the device. cryptsetup has a command to resize the
mapping for example. It can do that without asking for the password
again. Features like this are the reason I'm doing this. Userspace tools
should be able to assume that they can use the result of a table status
command to create a new table with this information.

An alternativ would be to use some form of handle to point to the key
after it has been given to the kernel. But that would require some more
infrastructure.

I could imagine something like this:

Some kernel infrastructure for key management. It can hold keys which
are referenced by some form of handle. The keys are refcounted and wiped
if the reference count drops to zero.

If you want to create a dm-crypt mapping (or something else that uses
some form of cryptographic key) you create a new handle and assign the
key. Then you give the handle to dm-crypt which increments reference
count. When cryptsetup exits its reference to the key is dropped but the
key still has a reference from the active dm-crypt mapping. Later on
another application could then safely do something with that handle. As
long as an application or in-kernel user references the key it won't be
dropped so it's safe for a userspace application to play around with it.

BTW: The setkey command also seems to return the keys in use for IPSEC
connections.



signature.asc
Description: Dies ist ein digital signierter Nachrichtenteil


Re: dm-crypt crypt_status reports key?

2005-02-02 Thread Christophe Saout
Am Mittwoch, den 02.02.2005, 17:52 -0800 schrieb Matt Mackall:

  An alternativ would be to use some form of handle to point to the key
  after it has been given to the kernel. But that would require some more
  infrastructure.
 
 There's been some talk about such infrastructure already. I believe
 some pieces of it may already be in place.

Yes, you are right. I didn't follow the discussion but it actually looks
very promising. The keys in the infrastructure are reference-counted.
That's good.

The keyrings can be attached to either thread, processes, sessions or
users. 

It seems that it's possible to have floating keys (not attached to any
keyring). So we would just need to figure out how to use these keyrings
to allow communication with userspace applications. Process keyrings
seem to have the advantage that the keyring is dropped when it exits so
that all keys that are not in use by the kernel are also dropped. A
keyring for the root user would have the problem that if the cryptsetup
application aborts in the middle you would end up with old keys lying
around forever.

The keyring API seems very flexible. You can define your own type of
keys and give them names. Well, the name is probably irrelevant here and
should be chosen randomly but it's less likely to collide with someone
else.



signature.asc
Description: Dies ist ein digital signierter Nachrichtenteil


Re: swap on dmcrypt crashes machine

2005-01-27 Thread Christophe Saout
Am Dienstag, den 25.01.2005, 18:28 +0100 schrieb Michael Buesch:

> I set up swap on an encrypted dmcrypt device.
> While stressing swap usage with make -j200 in the
> kernel tree, the machine crashes:
> 
> Adding 1461872k swap on /dev/mapper/swap.  Priority:-2 extents:1
> [ cut here ]
> kernel BUG at drivers/block/ll_rw_blk.c:2193!
> invalid operand:  [#1]
> SMP 
> Modules linked in: ipv6 ohci_hcd tuner tvaudio msp3400 bttv video_buf 
> firmware_class btcx_risc nvidia ehci_hcd uhci_hcd usbcore intel_agp agpgart 
> evdev
> CPU:0
> EIP:0060:[]Tainted: P  VLI
> EFLAGS: 00210002   (2.6.10-ck4-defaultidle) 
> EIP is at __blk_put_request+0x87/0x91

This seems to be this BUG:

BUG_ON(!list_empty(>queuelist));

in drivers/block/ll_rw_blk.c. I don't know how this could be non-empty
since scsi_end_request calls

if (blk_rq_tagged(req))
 blk_queue_end_tag(q, req);

just before calling end_that_request_last inside the same spinlock.

I don't know anything about tagged request handling but I don't think
this is the fault of the dm-crypt driver, perhaps it's stressing the
queue in some unexpected way? I have no idea what's going on here.



signature.asc
Description: Dies ist ein digital signierter Nachrichtenteil


Re: swap on dmcrypt crashes machine

2005-01-27 Thread Christophe Saout
Am Dienstag, den 25.01.2005, 18:28 +0100 schrieb Michael Buesch:

 I set up swap on an encrypted dmcrypt device.
 While stressing swap usage with make -j200 in the
 kernel tree, the machine crashes:
 
 Adding 1461872k swap on /dev/mapper/swap.  Priority:-2 extents:1
 [ cut here ]
 kernel BUG at drivers/block/ll_rw_blk.c:2193!
 invalid operand:  [#1]
 SMP 
 Modules linked in: ipv6 ohci_hcd tuner tvaudio msp3400 bttv video_buf 
 firmware_class btcx_risc nvidia ehci_hcd uhci_hcd usbcore intel_agp agpgart 
 evdev
 CPU:0
 EIP:0060:[c0218b8a]Tainted: P  VLI
 EFLAGS: 00210002   (2.6.10-ck4-defaultidle) 
 EIP is at __blk_put_request+0x87/0x91

This seems to be this BUG:

BUG_ON(!list_empty(req-queuelist));

in drivers/block/ll_rw_blk.c. I don't know how this could be non-empty
since scsi_end_request calls

if (blk_rq_tagged(req))
 blk_queue_end_tag(q, req);

just before calling end_that_request_last inside the same spinlock.

I don't know anything about tagged request handling but I don't think
this is the fault of the dm-crypt driver, perhaps it's stressing the
queue in some unexpected way? I have no idea what's going on here.



signature.asc
Description: Dies ist ein digital signierter Nachrichtenteil