Re: [RFC PATCH 1/4] ftrace: Make recordmcount.c handle __fentry__
On 08/07/2012 12:38 PM, Steven Rostedt wrote:
> From: Steven Rostedt
>
> With gcc 4.6.0 the -mfentry feature places the function profiling call at the
> start of the function. When this is used, the call is to __fentry__ and not
> mcount.
>
> Change recordmcount.c to record both callers to __fentry__ and mcount.

[snip]

> -	    (altmcount && strcmp(altmcount, symname) == 0))
> +	    (altmcount && strcmp(altmcount, symname) == 0) ||
> +	    (strcmp(fentry, symname) == 0))

The proposed change will work as long as all the *.o use the same name.
Only one of {"__fentry__", "mcount", "_mcount", altmcount} is allowed
for all the *.o given as input to a particular run.  [Modulo the hack of
ignoring a leading '.' for 64-bit PowerPC, of course.]

If the user changes compilers (or changes CFLAGS by inserting or removing
"-mfentry") without doing a "make clean", then recordmcount will omit
some calls.  Those restrictions are easy to guess, and recovery is easy.

Therefore,
Acked-by: John Reiser
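For readers without the rest of the patch at hand, here is a hedged,
condensed sketch of the symbol-name test once the hunk above is applied.
It is a paraphrase for illustration, not the literal recordmcount code,
and the string values shown are assumptions based on the names discussed
above:

	#include <string.h>

	/* Accepted spellings of the profiling call target (illustrative values). */
	static const char *mcount = "mcount";		/* or "_mcount" on some arches */
	static const char *altmcount;			/* optional alternate; may be NULL */
	static const char *fentry = "__fentry__";	/* new with -mfentry */

	/* Does this relocation's symbol name denote a profiling call? */
	static int is_profiling_call(const char *symname)
	{
		if (symname[0] == '.')
			++symname;	/* ignore leading '.' (64-bit PowerPC hack) */
		return strcmp(mcount, symname) == 0
		    || (altmcount && strcmp(altmcount, symname) == 0)
		    || strcmp(fentry, symname) == 0;
	}

Any one of the accepted spellings matches per relocation, so the only
trouble is the stale-object case described above, where different *.o
were compiled with different spellings.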
Re: "STT_FUNC for assembler checksum and semaphore ops" in git-x86
Andi Kleen wrote:
> On Thu, Jan 10, 2008 at 04:59:52PM -0800, John Reiser wrote:
>
>> Andi Kleen wrote:
>>
>>> But actually checking the default implementation in linkage.h already
>>> implements size:
[snip]
>>
>>> Are you sure it doesn't work? Your patch should be not needed. If it's
>>> still wrong then just ENDPROCs() need to be added.
>>
>> The ENDPROCs() were not used everywhere.  Some code used just END()
>> instead, while other code used nothing.  um/sys-i386/checksum.S didn't
>> #include
>
> END() is fine too since it contains .size too:
>
> #ifndef END
> #define END(name) \
>   .size name, .-name
> #endif
>
>> diff --git a/arch/x86/lib/semaphore_32.S b/arch/x86/lib/semaphore_32.S
>> index 444fba4..e2c6e0d 100644
>> --- a/arch/x86/lib/semaphore_32.S
>> +++ b/arch/x86/lib/semaphore_32.S
>> @@ -49,7 +49,7 @@ ENTRY(__down_failed)
>>  	ENDFRAME
>>  	ret
>>  	CFI_ENDPROC
>> -	END(__down_failed)
>> +	ENDPROC(__down_failed)
>
> I don't think these changes make sense given the definition of END()
> shown above.
>
> The only change that would make sense is adding END() (or ENDPROC())
> to a function that doesn't have either of them yet.

No.  The pseudo-op ".type name, @function" appears only in ENDPROC; it
does not appear in END.  So changing END to ENDPROC *does* alter the
Elf32_Sym for 'name'.  Plain END produces STT_NOTYPE; ENDPROC produces
STT_FUNC.

A static analysis tool can get the info it wants much more easily if all
subroutines are marked as STT_FUNC.  In theory the tool could sort the
symbols, notice the disjoint coverage of the address space by the .size
intervals of consecutive symbols that are the targets of a CALL
instruction, and thus deduce that ".type foo, @function" *should* have
been specified.  But this is a heuristic, and it fails at boundaries
where assembly code is invoked via trap, interrupt, or exception
(anything other than CALL).  Instead, specify STT_FUNC for each
subroutine in the first place.  That requires .type, which means ENDPROC
(not END) from linux/linkage.h.

--
John Reiser, [EMAIL PROTECTED]
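For reference, the two macros being contrasted, as they appear in
include/linux/linkage.h at the time (END() is quoted above, and the
ENDPROC() definition also appears in the patch later in this thread):

	#ifndef END
	#define END(name) \
	  .size name, .-name
	#endif

	#ifndef ENDPROC
	#define ENDPROC(name) \
	  .type name, @function; \
	  END(name)
	#endif

Only ENDPROC() emits the .type directive, so only ENDPROC() yields an
STT_FUNC symbol; END() alone leaves the symbol as STT_NOTYPE.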
Re: "STT_FUNC for assembler checksum and semaphore ops" in git-x86
Andi Kleen wrote:
> But actually checking the default implementation in linkage.h already
> implements size:
[snip]
> Are you sure it doesn't work? Your patch should be not needed. If it's
> still wrong then just ENDPROCs() need to be added.

The ENDPROCs() were not used everywhere.  Some code used just END()
instead, while other code used nothing.  um/sys-i386/checksum.S didn't
#include .  I also got confused because gcc puts the .type near the
ENTRY, while ENDPROC puts it on the opposite end.

Here is a revised patch against linux-2.6-x86.git, including a comment
in linkage.h which explains one motivation.  PowerPC is different
("_GLOBAL" instead of "ENTRY") and foreign to me, so I left it alone.

Signed-off-by: John Reiser <[EMAIL PROTECTED]>

diff --git a/include/linux/linkage.h b/include/linux/linkage.h
index ff203dd..024cfab 100644
--- a/include/linux/linkage.h
+++ b/include/linux/linkage.h
@@ -53,6 +53,10 @@
   .size name, .-name
 #endif
 
+/* If symbol 'name' is treated as a subroutine (gets called, and returns)
+ * then please use ENDPROC to mark 'name' as STT_FUNC for the benefit of
+ * static analysis tools such as stack depth analyzer.
+ */
 #ifndef ENDPROC
 #define ENDPROC(name) \
   .type name, @function; \
diff --git a/arch/x86/lib/semaphore_32.S b/arch/x86/lib/semaphore_32.S
index 444fba4..e2c6e0d 100644
--- a/arch/x86/lib/semaphore_32.S
+++ b/arch/x86/lib/semaphore_32.S
@@ -49,7 +49,7 @@ ENTRY(__down_failed)
 	ENDFRAME
 	ret
 	CFI_ENDPROC
-	END(__down_failed)
+	ENDPROC(__down_failed)
 
 ENTRY(__down_failed_interruptible)
 	CFI_STARTPROC
@@ -70,7 +70,7 @@ ENTRY(__down_failed_interruptible)
 	ENDFRAME
 	ret
 	CFI_ENDPROC
-	END(__down_failed_interruptible)
+	ENDPROC(__down_failed_interruptible)
 
 ENTRY(__down_failed_trylock)
 	CFI_STARTPROC
@@ -91,7 +91,7 @@ ENTRY(__down_failed_trylock)
 	ENDFRAME
 	ret
 	CFI_ENDPROC
-	END(__down_failed_trylock)
+	ENDPROC(__down_failed_trylock)
 
 ENTRY(__up_wakeup)
 	CFI_STARTPROC
@@ -112,7 +112,7 @@ ENTRY(__up_wakeup)
 	ENDFRAME
 	ret
 	CFI_ENDPROC
-	END(__up_wakeup)
+	ENDPROC(__up_wakeup)
 
 /*
  * rw spinlock fallbacks
@@ -132,7 +132,7 @@ ENTRY(__write_lock_failed)
 	ENDFRAME
 	ret
 	CFI_ENDPROC
-	END(__write_lock_failed)
+	ENDPROC(__write_lock_failed)
 
 ENTRY(__read_lock_failed)
 	CFI_STARTPROC
@@ -148,7 +148,7 @@ ENTRY(__read_lock_failed)
 	ENDFRAME
 	ret
 	CFI_ENDPROC
-	END(__read_lock_failed)
+	ENDPROC(__read_lock_failed)
 
 #endif
 
@@ -170,7 +170,7 @@ ENTRY(call_rwsem_down_read_failed)
 	CFI_ADJUST_CFA_OFFSET -4
 	ret
 	CFI_ENDPROC
-	END(call_rwsem_down_read_failed)
+	ENDPROC(call_rwsem_down_read_failed)
 
 ENTRY(call_rwsem_down_write_failed)
 	CFI_STARTPROC
@@ -182,7 +182,7 @@ ENTRY(call_rwsem_down_write_failed)
 	CFI_ADJUST_CFA_OFFSET -4
 	ret
 	CFI_ENDPROC
-	END(call_rwsem_down_write_failed)
+	ENDPROC(call_rwsem_down_write_failed)
 
 ENTRY(call_rwsem_wake)
 	CFI_STARTPROC
@@ -196,7 +196,7 @@ ENTRY(call_rwsem_wake)
 	CFI_ADJUST_CFA_OFFSET -4
 1:	ret
 	CFI_ENDPROC
-	END(call_rwsem_wake)
+	ENDPROC(call_rwsem_wake)
 
 /* Fix up special calling conventions */
 ENTRY(call_rwsem_downgrade_wake)
@@ -214,6 +214,6 @@ ENTRY(call_rwsem_downgrade_wake)
 	CFI_ADJUST_CFA_OFFSET -4
 	ret
 	CFI_ENDPROC
-	END(call_rwsem_downgrade_wake)
+	ENDPROC(call_rwsem_downgrade_wake)
 
 #endif
diff --git a/arch/um/sys-i386/checksum.S b/arch/um/sys-i386/checksum.S
index 62c7e56..4f3f62b 100644
--- a/arch/um/sys-i386/checksum.S
+++ b/arch/um/sys-i386/checksum.S
@@ -26,6 +26,7 @@
  */
 
 #include
+#include
 
 /*
  * computes a partial checksum, e.g. for TCP/UDP fragments
@@ -48,7 +49,7 @@ unsigned int csum_partial(const unsigned char * buff, int len, unsigned int sum)
  * Fortunately, it is easy to convert 2-byte alignment to 4-byte
  * alignment for the unrolled loop.
  */
-csum_partial:
+ENTRY(csum_partial)
 	pushl %esi
 	pushl %ebx
 	movl 20(%esp),%eax	# Function arg: unsigned int sum
@@ -113,12 +114,13 @@ csum_partial:
 	popl %ebx
 	popl %esi
 	ret
+	ENDPROC(csum_partial)
 
 #else
 
 /* Version for PentiumII/PPro */
 
-csum_partial:
+ENTRY(csum_partial)
 	pushl %esi
 	pushl %ebx
 	movl 20(%esp),%eax	# Function arg: unsigned int sum
@@ -211,6 +213,7 @@ csum_partial:
 	popl %ebx
 	popl %esi
 	ret
+	ENDPROC(csum_partial)
 
 #endif
 
@@ -250,7 +253,7 @@ unsigned int csum_partial_copy_generic (const char *src, char *dst,
 
 #define ARGBASE 16
 #define FP	12
Re: "STT_FUNC for assembler checksum and semaphore ops" in git-x86
>> +	.type csum_partial, @function
>>  ENTRY(csum_partial)
>> +	.type csum_partial, @function
>>  ENTRY(csum_partial)
>>  	CFI_STARTPROC
>>  	pushl %esi
>> @@ -141,11 +142,13 @@ ENTRY(csum_partial)
>>  	ret
>>  	CFI_ENDPROC
>>  ENDPROC(csum_partial)
>> +	.size csum_partial, . - csum_partial
>>
>> AK [Andi Kleen]:
>> Better option would be to just add to ENTRY/ENDPROC ...
>
> John got more or less the same comment from me [Sam Ravnborg] - but
> I did not hear further.
> As it stands now it is not nice.

It does look ugly.  But .size and .type are oriented towards static
description, while the DWARF2 Call Frame Information (CFI) annotations
are oriented towards dynamic traceback and unwinding.  Forcing a
coupling between the static and dynamic annotations might lead to
trouble when the fast path to 'ret' is in the middle and the code for
the slow path appears after the 'ret', or when a recursive tail "falls
through" into the entry point.

A quick test shows that multiple .size pseudo-ops for the same symbol
are accepted (the last one wins), so a default coupling can be
overridden.  I'll test revised macros and report shortly.

--
John Reiser, [EMAIL PROTECTED]
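To illustrate the layout being worried about, here is a hedged sketch
with hypothetical function and label names (not kernel code): if the
.size end-marker were emitted wherever the unwind annotations happen to
end, for instance near an early 'ret', it could stop short of the slow
path; keeping ENDPROC at the true end lets .type and .size cover
everything.

	ENTRY(foo)
		CFI_STARTPROC
		testl	%eax, %eax
		jnz	.Lslow			# uncommon case
		ret				# fast path returns from the middle
	.Lslow:
		call	do_slow_work		# hypothetical slow path after the ret
		ret
		CFI_ENDPROC
	ENDPROC(foo)				# .type/.size still cover .Lslow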
STT_FUNC for assembler checksum and semaphore ops
 	;; r10 - src
@@ -121,4 +122,4 @@
 	addu.b	[$r10],$r12
 	ret
 	move.d	$r12, $r10
-
+	.size	csum_partial, . - csum_partial
--- ./arch/cris/arch-v32/lib/checksum.S.orig	2007-11-02 07:03:45.0 -0700
+++ ./arch/cris/arch-v32/lib/checksum.S	2008-01-07 08:45:39.0 -0800
@@ -6,6 +6,7 @@
  */
 
 	.globl	csum_partial
+	.type	csum_partial, @function
 csum_partial:
 
 	;; r10 - src
@@ -109,3 +110,4 @@
 	addu.b	[$r10],$r12
 	ret
 	move.d	$r12,$r10
+	.size	csum_partial, . - csum_partial

--
John Reiser, [EMAIL PROTECTED]
Re: slab quirks in DEBUG, ctor, and initialization
Hi Pekka,

>> In mm/slab.c, the DEBUG variant of cache_alloc_debugcheck_after
>> might call cachep->ctor(objp, cachep, 0); but the non-DEBUG
>> variant does absolutely nothing.  idr_pre_get is a routine
>> which notices the difference.
>
> How does idr_pre_get notice this?

idr_pre_get calls kmem_cache_alloc, which constructs 'struct idr_layer'
via the cachep->ctor() call from cache_alloc_debugcheck_after to
idr_cache_ctor, and not via cache_init_objs.  So if DEBUG is off, then
idr_cache_ctor does not get its chance to call memset, which leaves the
struct idr_layer potentially undefined.

>> Even when cache_alloc_debugcheck_after does invoke the ctor,
>> then it is conditional upon cachep->flags & SLAB_POISON.  This
>> assumes that the only two states are poisoned and all-zero
>> (from .bss static, or via a cleared new page frame.)
>> So if SLAB_POISON is not specified, then a ctor which
>> does anything other than memset(,0,) is out of luck.
>> Instead: if a ctor is specified then it should be called
>> for each successful allocation.
>
> Sorry, I don't understand at all what the problem is here.  For the
> common (non-poison) case, we initialize all objects *once* whenever a
> cache is grown (see cache_grow calling cache_init_objs) which is the
> whole point of having constructors.  When poisoning is enabled, we
> obviously cannot do this, which is why we call the constructor for
> every allocation.

There is a comment near the beginning of mm/slab.c:

 * This means, that your constructor is used only for newly allocated
 * slabs and you must pass objects with the same initializations to
 * kmem_cache_free.

Note that the comment should be updated to account for the constructor
also being called if SLAB_POISON is specified.

A significant implication is that calling the constructor always
establishes good state, even though the call might not be necessary,
and although it might take more time than not calling it.  In order to
avoid the allocator having to call the constructor, the caller of
kmem_cache_free must call the constructor just before freeing, or
establish the same values.  In contrast to calling the constructor
every time, the existing convention increases complexity and risk of
bugs, in exchange for faster performance when actions equivalent to the
ctor are performed in the dtor instead, or when state that is
equivalent to the ctor occurs as a natural byproduct of being done with
the object.

The real gripe is that the current code makes it cumbersome for a
dynamic checking program such as valgrind(memcheck) to record correctly
the state of the object.  Is the object initialized (and which parts of
it, and were the initializations performed into allocated space), or is
it merely allocated and uninitialized?

--
John Reiser, [EMAIL PROTECTED]
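A minimal sketch of the convention that comment describes, with
hypothetical names (this is not code from mm/slab.c or lib/idr.c; only
the three-argument ctor signature matches the cachep->ctor(objp, cachep,
0) call quoted above):

	#include <linux/list.h>
	#include <linux/slab.h>

	/* Constructed state: list initialized, refcount zero. */
	struct widget {
		struct list_head list;
		int refcount;
	};

	static void widget_ctor(void *objp, struct kmem_cache *cachep,
				unsigned long flags)
	{
		struct widget *w = objp;

		INIT_LIST_HEAD(&w->list);
		w->refcount = 0;
	}

	/* Under the convention quoted above, the ctor runs only when a new
	 * slab page is populated (or, with SLAB_POISON, on each allocation),
	 * so the user must hand each object back to kmem_cache_free()
	 * already restored to this constructed state. */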
Re: /dev/urandom uses uninit bytes, leaks user data
Theodore Tso wrote:
> On Fri, Dec 14, 2007 at 04:30:08PM -0800, John Reiser wrote:
>
>> There is a path that goes from user data into the pool.

Note particularly that the path includes data from other users.  Under
the current implementation, anyone who accesses /dev/urandom is subject
to having some bytes from their address space being captured and mixed
into the pool.

>> This path
>> is subject to manipulation by an attacker, for both reading and
>> writing.  Are you going to guarantee that in five years nobody
>> will discover a way to take advantage of it?
>
> Yep, I'm confident about making such a guarantee.

Very confident.

A direct attack (determining the specific values or partial values of
some captured bytes) is not the only way to steal secrets.  An indirect
attack, such as traffic analysis, also may be effective.  Here is one
idea.  Use output from /dev/urandom to generate a random permutation
group.  Analyze the group: determine all its subgroups, etc.  If the
structure of those groups has different properties depending on the
captured bytes, even after SHA1 and folding and twisting, then that may
be enough to help steal secrets.

Indirect attacks may be subject to "exponent doubling."  The state
modulo 2**(2n) may correspond to a system of 2**n congruences in 2**n
variables.  So a property modulo 2**n might be hoisted to a related
property modulo 2**(2n).  This might make 2**1024 seem not so big.

Also, "getting lucky" is allowed, both via initial conditions and via
other coincidences.  Running on a newly-booted, newly-installed system
might be especially advantageous.  A completely formal Goedel-numbering
proof often has a formal checker that is logarithmic in the length of
the proof.  If such a logarithmic property applies every once in a
while to /dev/urandom, then that might be enough.

The bottom line: at a cost of at most three unpredictable branches
(whether to clear the bytes in the last word with indices congruent to
1, 2, or 3 modulo 4), the code can reduce the risk from something small
but positive to zero.  This is very inexpensive insurance.

--
John Reiser, [EMAIL PROTECTED]
slab quirks in DEBUG, ctor, and initialization
In mm/slab.c, the DEBUG variant of cache_alloc_debugcheck_after might
call cachep->ctor(objp, cachep, 0); but the non-DEBUG variant does
absolutely nothing.  idr_pre_get is a routine which notices the
difference.

Even when cache_alloc_debugcheck_after does invoke the ctor, then it is
conditional upon cachep->flags & SLAB_POISON.  This assumes that the
only two states are poisoned and all-zero (from .bss static, or via a
cleared new page frame.)  So if SLAB_POISON is not specified, then a
ctor which does anything other than memset(,0,) is out of luck.
Instead: if a ctor is specified then it should be called for each
successful allocation.

Invoking the ctor from within cache_alloc_debugcheck_after makes it
awkward for a dynamic checker of allocations and initializations.
Valgrind would be happier if allocation and initialization were invoked
at the same subroutine nesting level, such as in __cache_alloc:

	cache_alloc_debugcheck_before(cachep, flags);
	local_irq_save(save_flags);
	objp = __do_cache_alloc(cachep, flags);
	/* checker marks the space as allocated */
	local_irq_restore(save_flags);
	objp = cache_alloc_debugcheck_after(cachep, flags, objp, caller);
	if (objp && cachep->ctor)
		cachep->ctor(objp, cachep, 0);
	/* checker notes initializations during ctor [above] */

--
John Reiser, [EMAIL PROTECTED]
Re: /dev/urandom uses uninit bytes, leaks user data
Theodore Tso wrote:
> On Fri, Dec 14, 2007 at 12:45:23PM -0800, John Reiser wrote:
>
>>> It's getting folded into the random number pool, where it will be
>>> impossible to recover it unless you already know what was in the
>>> pool.  And if you know what's in the pool, you've already broken into
>>> the kernel.
>>
>> The combination of capturing data from other users, plus seeding
>> the pool with your own data, just might be powerful enough to help
>> steal secrets, sometime in the next five years, from data that is
>> recorded today.
>
> Um, no.  Just seeding the pool with your own data won't help, since
> that still won't tell you the initial contents of the pool.  And if
> you know the initial contents of the pool, then you've broken root.
> And being able to steal from the pool also assumes that you've broken
> into the system; it is never, ever exported to userspace, even if
> you're root (unless you use something like /dev/kmem).  Furthermore,
> if you don't know the previous contents of the pool, you'll never be
> able to recover the information, either now or five years in the
> future, since information is XOR'ed into the pool.

There is a path that goes from user data into the pool.  This path is
subject to manipulation by an attacker, for both reading and writing.
Are you going to guarantee that in five years nobody will discover a
way to take advantage of it?  Five years ago there were no public
attacks against MD5 except brute force; now MD5 is on the "weak" list.

>>> But I'm sympathetic to making Valgrind happy. ...
>
> How about wrapping it in a #ifdef CONFIG_UML, which is the only way
> you can use Valgrind?  The memset will slow down things unnecessarily,
> and mixing in the unknown previous contents of the stack (a) doesn't
> hurt, and (b) could make life more complicated for an attacker.

If speed matters that much, then please recoup 33 cycles on x86 by
using shifts instead of three divides, such as (gcc 4.1.2):

	add_entropy_words(r, tmp, (bytes + 3) / 4);
   0x8140689:	lea    0x3(%esi),%eax
   0x814068c:	mov    $0x4,%dl
   0x814068e:	mov    %edx,%edi
   0x8140690:	cltd
   0x8140691:	idiv   %edi

--
John Reiser, [EMAIL PROTECTED]
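For concreteness, the shift form being suggested is simply the
following (equivalent to the divide only for non-negative byte counts;
the earlier patch in this thread uses the same bytes>>2 idea):

	/* round bytes up to whole 32-bit words without a divide */
	add_entropy_words(r, tmp, (bytes + 3) >> 2);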
Re: /dev/urandom uses uninit bytes, leaks user data
Matt Mackall wrote:
> Yes, we use uninitialized data.  But it's not a leak in any useful
> sense.  To the extent the previous data is secret, this actually
> improves our entropy.
>
> It's getting folded into the random number pool, where it will be
> impossible to recover it unless you already know what was in the
> pool.  And if you know what's in the pool, you've already broken into
> the kernel.

The combination of capturing data from other users, plus seeding the
pool with your own data, just might be powerful enough to help steal
secrets, sometime in the next five years, from data that is recorded
today.

> But I'm sympathetic to making Valgrind happy. ...
> [The code in John's patch is] hideous.  How about a memset instead ...
> And [John's] change is broken.. We have to add precisely the
> number of bytes returned by extract_entropy to keep the books
> balanced.

Matt is correct.  The rolled-up result follows.

Signed-off-by: [EMAIL PROTECTED]

--- ./drivers/char/random.c.orig	2007-12-14 11:06:03.0 -0800
+++ ./drivers/char/random.c	2007-12-14 12:27:23.0 -0800
@@ -708,6 +708,8 @@
 
 	bytes=extract_entropy(r->pull, tmp, bytes,
 				random_read_wakeup_thresh / 8, rsvd);
+	/* clear uninitialized bytes at the end to make valgrind happy */
+	memset((char *)tmp + bytes, 0, -bytes & 3);
 	add_entropy_words(r, tmp, (bytes + 3) / 4);
 	credit_entropy_store(r, bytes*8);
 }

--
John Reiser, [EMAIL PROTECTED]
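As a sanity check of the arithmetic in that memset (a standalone,
hypothetical test, not kernel code): -bytes & 3 is exactly the number of
pad bytes needed to reach the next multiple of 4, so the memset clears
precisely the tail bytes of the final word that extract_entropy did not
write, and (bytes + 3) / 4 whole words are then mixed in.

	#include <assert.h>

	int main(void)
	{
		/* -bytes & 3 == pad needed to round bytes up to a multiple of 4
		 * (relies on two's complement, as the kernel does) */
		for (int bytes = 0; bytes <= 64; ++bytes)
			assert((-bytes & 3) < 4 &&
			       (bytes + (-bytes & 3)) % 4 == 0);
		return 0;
	}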
/dev/urandom uses uninit bytes, leaks user data
xfer_secondary_pool() in drivers/char/random.c tells add_entropy_words()
to use uninitialized tmp[] whenever bytes is not a multiple of 4.
Besides being unfriendly to automated dynamic checkers, this is a
potential leak of user data into the output stream.  When called from
extract_entropy_user, the uninitialized tmp[] can capture leftover data
from a previous copy_from_user().

Signed-off-by: [EMAIL PROTECTED]

--- ./drivers/char/random.c.orig	2007-12-14 11:06:03.0 -0800
+++ ./drivers/char/random.c	2007-12-14 11:06:57.0 -0800
@@ -708,7 +708,19 @@
 
 	bytes=extract_entropy(r->pull, tmp, bytes,
 				random_read_wakeup_thresh / 8, rsvd);
-	add_entropy_words(r, tmp, (bytes + 3) / 4);
+	/*
+	 * 2007-12-13 (valgrind/memcheck) Do not use undefined bytes.
+	 * Avoid info leak when called from extract_entropy_user:
+	 * uninit tmp[] can have data from previous copy_from_user().
+	 * Instead: fill last word using first bytes.
+	 */
+	{
+		__u8 *src = (__u8 *)&tmp[0];
+		__u8 *dst = bytes + src;
+		for (; 0!=(3 & bytes); ++bytes)
+			*dst++ = *src++;
+	}
+	add_entropy_words(r, tmp, bytes>>2);
 	credit_entropy_store(r, bytes*8);
 }
 

--
John Reiser, [EMAIL PROTECTED]
Re: + fully-honor-vdso_enabled.patch added to -mm tree
Chuck Ebbert wrote:
> John Reiser wrote:
>
>> The value of ->sysenter_return is interpreted in user space by the
>> sysexit instruction; nobody else cares what the value is.  The kernel
>> is not required to provide a good value when vdso_enabled is zero,
>> because the kernel has not told the process that sysenter is valid
>> (by setting AT_SYSINFO.)
>
> Doesn't matter because a malicious user can still execute sysenter.
> We do have to deal with that somehow, so we have to put something
> safe in there.

All values are safe as far as the kernel is concerned.  The sysexit
instruction is the only consumer of the value.  The sysexit instruction
interprets the value as a _user_ virtual address, and jumps to it, in
_user_ mode.  If user code does not like jumping to a random address
when vdso_enabled is zero, then user code should not use sysenter when
vdso_enabled is zero.  But execution of kernel code is not affected in
any way, regardless of the value.

I'd be happy to set ->sysenter_return to 0 (or any other suggested
value) when vdso_enabled is 0, but this would be a politeness only.
There is no added security risk to the kernel by leaving it unset.

>> Correct.  Changing vdso_enabled from 0 to non-zero must be prepared
>> to lose this race if it is not prevented.  Ordinarily it won't matter
>> because the administrator will perform such changes at a "quiet" time.
>
> We have to deal with all the possibilities here, too.

Documentation is one method of dealing with it.

--
John Reiser, [EMAIL PROTECTED]
Re: + fully-honor-vdso_enabled.patch added to -mm tree
Oleg Nesterov wrote:
> Still, I don't understand why we don't pass NEW_AUX_ENT(AT_SYSINFO) when
> vdso_enabled == 0.  We don't need linux-gate.so to use __kernel_vsyscall,
> we have FIX_VDSO.  In that case we should s/PAGE_KERNEL_RO/PAGE_READONLY/
> of course.  I guess the reason is some magic in glibc.

In the past there were problems where user code could not single-step
(or even read the code of) such system calls.  Can these two operations
be performed today with what you propose, particularly if FIX_VDSO is
near 0xf000?

--
John Reiser, [EMAIL PROTECTED]
Re: + fully-honor-vdso_enabled.patch added to -mm tree
Oleg Nesterov wrote:
> John Reiser wrote:
>
>> +	switch (vdso_enabled) {
>> +	case 0: /* none */
>> +		return 0;
>
> This means we don't initialize mm->context.vdso and ->sysenter_return.
>
> Is it ok?  For example, setup_rt_frame() uses VDSO_SYM(&__kernel_rt_sigreturn),
> sysenter_past_esp pushes ->sysenter_return on stack.

Paul Mundt has commented on setup_rt_frame() and provided a patch which
bullet-proofs that area.  I will include that patch in the next
revision.

The value of ->sysenter_return is interpreted in user space by the
sysexit instruction; nobody else cares what the value is.  The kernel
is not required to provide a good value when vdso_enabled is zero,
because the kernel has not told the process that sysenter is valid (by
setting AT_SYSINFO.)  The kernel requires specific register values for
sysenter+sysexit, and these values may change at the whim of the
kernel, so correct code must follow the kernel's protocol.  glibc uses
sysenter only when AT_SYSINFO is present.  User code can screw up even
when vdso_enabled is non-zero, by overwriting or re-mapping the vdso
page (clobbering memory at the destination of sysexit.)

Both context.vdso and sysenter_return could be set to zero whenever
vdso_enabled is zero; those two values might even be defaulted.  I'll
add such a change to the next revision of the patch, if you'll defend
it against claims of "unnecessary code."

> Note also that load_elf_binary does
>
>	arch_setup_additional_pages()
>	create_elf_tables()
>
> , looks like application can crash after exec if vdso_enabled changes from 0
> to 1 in between.

Correct.  Changing vdso_enabled from 0 to non-zero must be prepared to
lose this race if it is not prevented.  Ordinarily it won't matter
because the administrator will perform such changes at a "quiet" time.

--
John Reiser, [EMAIL PROTECTED]
fully honor vdso_enabled [i386, sh; x86_64?]
Architectures such as i386, sh, x86_64 have a flag
/proc/sys/vm/vdso_enabled to choose whether the kernel should set up a
process to use vdso after execve().  Informing the user code via
AT_SYSINFO* is controlled by macro ARCH_DLINFO in fs/binfmt_elf.c and
include/asm-$ARCH/elf.h, but the vdso page is always established via
arch_setup_additional_pages() called from load_elf_binary().  If
vdso_enabled is off, then the current code wastes kernel time during
execve() and fragments the address space unnecessarily.

This patch changes arch_setup_additional_pages() to honor vdso_enabled.
For i386 it also allows the option of a fixed address to avoid
fragmenting the address space.  Compiles and runs on i386.  x86_64
[IA32 support] and sh maintainers also please comment.

For some related history, including interaction with exec-shield, see:
http://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=229304
and also 207020 and 162797.

Signed-off-by: John Reiser <[EMAIL PROTECTED]>

diff --git a/arch/i386/kernel/sysenter.c b/arch/i386/kernel/sysenter.c
index 13ca54a..f8c4d76 100644
--- a/arch/i386/kernel/sysenter.c
+++ b/arch/i386/kernel/sysenter.c
@@ -22,6 +22,8 @@
 #include
 #include
 #include
+#include
+#include
 
 /*
  * Should the kernel map a VDSO page into processes and pass its
@@ -105,10 +107,25 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int exstack)
 {
 	struct mm_struct *mm = current->mm;
 	unsigned long addr;
+	unsigned long flags;
 	int ret;
 
+	switch (vdso_enabled) {
+	case 0: /* none */
+		return 0;
+	default:
+	case 1: /* vdso in random available page */
+		addr = 0ul;
+		flags = 0ul;
+		break;
+	case 2: /* out of user's way */
+		addr = STACK_TOP;
+		flags = MAP_FIXED;
+		break;
+	}
+
 	down_write(&mm->mmap_sem);
-	addr = get_unmapped_area(NULL, 0, PAGE_SIZE, 0, 0);
+	addr = get_unmapped_area(NULL, addr, PAGE_SIZE, 0, flags);
 	if (IS_ERR_VALUE(addr)) {
 		ret = addr;
 		goto up_fail;
diff --git a/arch/sh/kernel/vsyscall/vsyscall.c b/arch/sh/kernel/vsyscall/vsyscall.c
index 7b0f66f..2b789fb 100644
--- a/arch/sh/kernel/vsyscall/vsyscall.c
+++ b/arch/sh/kernel/vsyscall/vsyscall.c
@@ -64,6 +64,9 @@ int arch_setup_additional_pages(struct linux_binprm *bprm,
 	unsigned long addr;
 	int ret;
 
+	if (!vdso_enabled)
+		return 0;
+
 	down_write(&mm->mmap_sem);
 	addr = get_unmapped_area(NULL, 0, PAGE_SIZE, 0, 0);
 	if (IS_ERR_VALUE(addr)) {
diff --git a/include/asm-i386/a.out.h b/include/asm-i386/a.out.h
index ab17bb8..9894f73 100644
--- a/include/asm-i386/a.out.h
+++ b/include/asm-i386/a.out.h
@@ -19,7 +19,7 @@ struct exec
 
 #ifdef __KERNEL__
 
-#define STACK_TOP	TASK_SIZE
+#define STACK_TOP	(TASK_SIZE - PAGE_SIZE)	/* 1 page optional for vdso */
 
 #endif
 
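For completeness, the knob this patch honors is the existing sysctl
named above; with the patch applied, an administrator would pick the
behavior roughly like this (values taken from the switch statement in
the patch):

	# echo 0 > /proc/sys/vm/vdso_enabled	# no vdso page, no AT_SYSINFO
	# echo 1 > /proc/sys/vm/vdso_enabled	# vdso at a randomly available page (default)
	# echo 2 > /proc/sys/vm/vdso_enabled	# vdso at the fixed address STACK_TOP, out of the way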
bug in load_elf_binary [PATCH]
In kernel 2.4.3 and 2.2.18, there is a bug in fs/binfmt_elf.c function
load_elf_binary().  An ET_DYN file that asks for a PT_INTERP, and whose
first PT_LOAD is not at 0, gets an AT_PHDR that is
(load_bias + 2 * p_vaddr), instead of the correct (load_bias + p_vaddr).

The patch for 2.4.3 is:

--- fs/OLDbinfmt_elf.c	Mon Mar 19 17:05:16 2001
+++ fs/binfmt_elf.c	Sat Jun  2 16:20:54 2001
@@ -632,5 +632,5 @@
 			load_bias += error -
 				     ELF_PAGESTART(load_bias + vaddr);
-			load_addr += error;
+			load_addr += load_bias;
 		}
 	}

Please cc: me if appropriate; I'm not subscribed.

To demonstrate the problem on i386:

	cat >foo.s </maps
	# I see pid_of_foo.so = 2 + pid_of_nice
	# output:
	# 88048000-88049000 r-xp 03:07 2207 foo.so	# load_bias is 0x80000000
	# 88049000-8804a000 rw-p 03:07 2207 foo.so

	gdb foo.so
	# Attach to process, and look at its memory.
	x/64x $esp
	# and continue until seeing the AT_* on the stack.  I see
	# 0xba90:	0xbfe8	0x	0x0003	0x90090034
	# 0xbaa0:	0x0004	0x0020	0x0005	0x0005
	# 0xbab0:	0x0006	0x1000	0x0007	0x4000
	# 0xbac0:	0x0008	0x	0x0009	0x88048114
	# which shows AT_PHDR at 0xba9c of 0x90090034
	# which is too big by 0x08048000 [p_vaddr of first PT_LOAD]
	# because the correct value is 0x88048034.
	# Note that AT_ENTRY at 0xbacc is 0x88048114 which is correct.

--
John Reiser, [EMAIL PROTECTED]
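Checking the arithmetic against the demonstration above (assuming the
first PT_LOAD has p_vaddr 0x08048000, the load_bias is 0x80000000, and
the program header table sits 0x34 bytes into that segment):

	buggy:    load_bias + 2*p_vaddr + 0x34 = 0x80000000 + 0x10090000 + 0x34 = 0x90090034
	correct:  load_bias +   p_vaddr + 0x34 = 0x80000000 + 0x08048000 + 0x34 = 0x88048034

which matches the wrong and right AT_PHDR values shown in the stack dump.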
Re: tighter compression for x86 kernels
> > Both source (GPLv2) and pre-compiled binary for x86 are available.
>             ^
> That's not true.  Read
> http://wildsau.idv.uni-linz.ac.at/mfx/upx-license.html

The UPX team owns all copyright in all of UPX and in each part of UPX.
Therefore, the UPX team may choose which license(s) apply, and has
chosen two.  One of them is GPLv2.  The UPX team understands, and fully
intends to abide by, its obligations under GPLv2 when any software that
is subject to GPLv2 is contributed to UPX and re-distributed by the UPX
team.

The other license is detailed in the LICENSE file, but may be
summarized as: free to use if unmodified, and if used only to invoke
the program, and sublicensable only under the same terms.  This permits
using UPX to pack a non-GPL executable.

Users of UPX (as distributed by the UPX team) may choose whether to use
UPX according to GPLv2, or according to the other license.

[I am not subscribed to this mailing list, so CC: or mail me if
appropriate.]

--
John Reiser, [EMAIL PROTECTED]
tighter compression for x86 kernels
Beta release v1.11 of the UPX executable compressor http://upx.tsx.org
offers new, tighter re-compression of compressed Linux kernels for x86.
Additional space savings of about 15% have been seen using
"upx --best vmlinuz" (example: 617431 ==> 525099, saving 92332 bytes).

Both source (GPLv2) and pre-compiled binary for x86 are available.

[I'm not subscribed to this mailing list, so CC: or mail me if
appropriate.]

--
John Reiser, [EMAIL PROTECTED]