from:"Kyle Moffett"

[NET/IPv6] Race condition with flow_cache_genid?

2008-02-02 Thread Kyle Moffett

Hi, I was poking around trying to figure out how to install the Mobile
IPv6 daemons this evening and noticed they required a kernel patch,
although upon further inspection the kernel patch seemed to already be
applied in 2.6.24.  Unfortunately the flow cache appears to be
horribly racy.  Attached below are the only uses of the variable
"flow_cache_genid" in 2.6.24.

Now, I am no expert in this particular area of the code, but the
"atomic_t flow_cache_genid" variable is ONLY ever used with
atomic_inc() and atomic_read().  There are no memory barriers or other
dec_and_test()-style functions, so that variable could just as easily
be replaced with a plain old C int.

Basically either there is some missing locking here or it does not
need to be "atomic_t".  Judging from the way it *appears* to be used
to check if cache entries are up-to-date with the latest changes in
policy, I would guess the former.

In particular that whole "flow_cache_lookup()" thing looks racy as
hell, since somebody could be in the middle of that looking at "if
(fle->genid == atomic_read(&flow_cache_genid))".  It does the
atomic_read(), which BTW is literally implemented as:
  #define atomic_read(atomicvar) ((atomicvar)->value)
on some platforms.  Immediately after the atomic read (or even before,
since there's no cache-flush or read-modify-write), somebody calls
into "selinux_xfrm_notify_policyload()" and increments the
flow_cache_genid because SELinux just loaded a security policy.  Now
we're accepting a cache entry which applies to PREVIOUS security
policy.  I can only assume that's really bad.
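
To make the window concrete, here is a minimal userspace sketch of the generation check, with C11 atomics standing in for the kernel's atomic_t. The names cache_genid, struct entry, entry_is_current(), policy_changed() and stale_use_is_possible() are illustrative assumptions and do not come from the kernel. The sketch replays the interleaving in a single thread; it only shows that the comparison is a snapshot and that nothing prevents the counter from moving right after it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for flow_cache_genid. */
static atomic_int cache_genid;

/* Hypothetical stand-in for a flow cache entry. */
struct entry {
	int genid;	/* generation the entry was created under */
	void *object;	/* cached policy object */
};

/* Snapshot check, analogous to fle->genid == atomic_read(&flow_cache_genid). */
static bool entry_is_current(const struct entry *e)
{
	assert(e != NULL);
	return e->genid == atomic_load_explicit(&cache_genid,
						memory_order_relaxed);
}

/* Analogous to atomic_inc(&flow_cache_genid) after a policy change. */
static void policy_changed(void)
{
	atomic_fetch_add_explicit(&cache_genid, 1, memory_order_relaxed);
}

/*
 * Replays the race in one thread: the lookup validates the entry, the
 * policy changes, and the lookup goes on to use the entry anyway.
 * Returns true if an out-of-date entry ended up being accepted.
 */
static bool stale_use_is_possible(void)
{
	struct entry e = { .genid = atomic_load(&cache_genid), .object = &e };
	bool accepted = entry_is_current(&e);	/* check passes */

	policy_changed();	/* "other CPU" bumps the counter */

	/* The lookup already decided; the entry is now out of date. */
	return accepted && !entry_is_current(&e);
}
```

Making the counter atomic keeps individual reads and increments from tearing, but it does nothing to tie the check to the later use of the entry, which is the gap described above.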

Even worse, there seems to be a race between SELinux loading a new
policy and calling selinux_xfrm_notify_policyload(), since we could
easily get packets and process them according to the old cache entry
on one CPU before SELinux has had a chance to update the generation ID
from the other.  Furthermore, there's no guarantee the CPU caches will
get updated in reasonable time.  Clearly SELinux needs to have some
way of atomically invalidating the flow cache of all CPUs
*simultaneously* with loading a new policy, which probably means they
both need to be under the same lock, or something.

The same problem appears to occur with updating the XFRM policy and
incrementing flow_cache_genid.  Probably the fastest solution is to
put the flow cache under the xfrm_policy_lock (which already disables
local bottom-halves), and either take that lock during SELinux policy
load or if there are lock ordering problems then add a variable
"flow_cache_ignore" and change the xfrm_notify hooks:

void selinux_xfrm_notify_policyload_pre(void)
{
	write_lock_bh(&xfrm_policy_lock);
	flow_cache_genid++;
	flow_cache_ignore = 1;
	write_unlock_bh(&xfrm_policy_lock);
}

void selinux_xfrm_notify_policyload_post(void)
{
	write_lock_bh(&xfrm_policy_lock);
	flow_cache_ignore = 0;
	write_unlock_bh(&xfrm_policy_lock);
}
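
For readers who want to experiment with the idea outside the kernel, the following is a rough userspace analogue of the two hooks. It is a sketch under stated assumptions: a pthread mutex stands in for xfrm_policy_lock (the kernel lock is a reader/writer lock taken with bottom halves disabled), and policyload_pre(), policyload_post(), cache_use_begin() and cache_use_end() are made-up names. The point it illustrates is that the validity check and the use of the entry happen under the same lock that the policy load takes.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Stand-in for xfrm_policy_lock; a mutex keeps the sketch simple. */
static pthread_mutex_t policy_lock = PTHREAD_MUTEX_INITIALIZER;
static int cache_genid;		/* plain int: only touched under policy_lock */
static bool cache_ignore;	/* true while a policy load is in progress */

/* Called before the new policy is loaded. */
static void policyload_pre(void)
{
	pthread_mutex_lock(&policy_lock);
	cache_genid++;
	cache_ignore = true;
	pthread_mutex_unlock(&policy_lock);
}

/* Called once the new policy is fully in place. */
static void policyload_post(void)
{
	pthread_mutex_lock(&policy_lock);
	cache_ignore = false;
	pthread_mutex_unlock(&policy_lock);
}

/*
 * Returns true with policy_lock held if an entry of generation
 * entry_genid may be used; the caller then calls cache_use_end().
 * Returns false with the lock already released otherwise.
 */
static bool cache_use_begin(int entry_genid)
{
	pthread_mutex_lock(&policy_lock);
	if (!cache_ignore && entry_genid == cache_genid)
		return true;
	pthread_mutex_unlock(&policy_lock);
	return false;
}

static void cache_use_end(void)
{
	pthread_mutex_unlock(&policy_lock);
}
```

Holding the lock across the use of the entry serializes lookups in this sketch; a reader/writer lock would let lookups run concurrently while still excluding a policy load.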

Cheers,
Kyle Moffett


BEGIN QUOTED CODE INVOLVING flow_cache_genid:

include/net/flow.h:94:
extern atomic_t flow_cache_genid;

net/core/flow.c:39:
atomic_t flow_cache_genid = ATOMIC_INIT(0);

net/core/flow.c:169:flow_cache_lookup():
	if (flow_hash_rnd_recalc(cpu))
		flow_new_hash_rnd(cpu);
	hash = flow_hash_code(key, cpu);

	head = &flow_table(cpu)[hash];
	for (fle = *head; fle; fle = fle->next) {
		if (fle->family == family &&
		    fle->dir == dir &&
		    flow_key_compare(key, &fle->key) == 0) {
			if (fle->genid == atomic_read(&flow_cache_genid)) {
				void *ret = fle->object;

				if (ret)
					atomic_inc(fle->object_ref);
				local_bh_enable();

				return ret;
			}
			break;
		}
	}

net/xfrm/xfrm_policy.c:1025:
int xfrm_policy_delete(struct xfrm_policy *pol, int dir)
{
	write_lock_bh(&xfrm_policy_lock);
	pol = __xfrm_policy_unlink(pol, dir);
	write_unlock_bh(&xfrm_policy_lock);
	if (pol) {
		if (dir < XFRM_POLICY_MAX)
			atomic_inc(&flow_cache_genid);
		xfrm_policy_kill(pol);
		return 0;
	}
	return -ENOENT;
}

net/ipv6/inet6_connection_sock.c:142:
static inline
void __inet6_csk_dst_store(struct sock *sk, struct dst_entry *dst,
			   struct in6_addr *daddr, struct in6_addr *saddr)
{
	__ip6_dst_store(sk, dst, daddr, saddr);

#ifdef CONFIG_XFRM
	{
		struct rt6_info *rt = (struct rt6_info *)dst;
		rt->rt6i_flow_cache_genid = atomic_read(&flow

Re: [PATCH] Allow NBD to be used locally

2008-02-02 Thread Kyle Moffett

Whoops, only hit "Reply" on the first email, sorry Jan.

On Feb 2, 2008 7:54 PM, Jan Engelhardt <[EMAIL PROTECTED]> wrote:
> On Feb 2 2008 18:31, [EMAIL PROTECTED] wrote:
> >
> >> How will that work? Fuse makes up a filesystem - not helpful
> >> if you have a raw disk without a known fs to mount.
> >
> >take zfs-fuse or ntfs-3g for example.
> >you have a blockdevice or backing-file containing data structures and fuse 
> >makes those show up as a filesystem.
> >i think vmware-mount is not different here.
>
> vmware-mount IS different, it provides the _block_ device,
> which is then mounted through the usual mount(2) mechanism
> (if there is a filesystem driver for it).

As far as I can tell, vmware-mount should be re-implemented as a
little perl script around "dmsetup" and/or "losetup", possibly with
"dm-userspace" patched into the kernel to allow you to handle
non-mapped blocks in your userspace daemon when somebody tries to
access them.  If you don't need that ability then straight dm-loop and
dm-linear will work.

Cheers,
Kyle Moffett
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[RFC] Best method to control a "transmit-only" mode on fiber NICs (specifically sky2)

2008-02-15 Thread Kyle Moffett

Hi,

The company I'm working for has an unusual fiber NIC configuration
that we use for one of our network appliances.  We connect only a
single fiber from the TX port on one NIC to the RX port on another
NIC, providing a physically-one-way path for enhanced security.
Unfortunately this doesn't work with most NIC drivers, as even with
auto-negotiation off they look for link probe pulses before they
consider the link "up" and are willing to send packets.  We have been
able to use Myricom 10GigE NICs with a custom firmware image.  More
recently we have patched the sky2 driver to turn on the FIB_FORCE_LNK
flag in the PHY control register; this seems to work on the
Marvell-chipset boards we have here.

What would be the preferred way to control this "force link" flag?
Right now we are accessing it using ethtool; we have added an
additional "duplex" mode: "DUPLEX_TXONLY", with a value of 2.  When
you specify a speed and turn off autonegotiation ("./patched-ethtool
-s eth2 speed 1000 autoneg off duplex txonly"), it will turn on the
specified bit in the PHY control register and the link will
automatically come up.  We also have one related bug-fix^Wdirty hack
for sky2 to reset the PHY a second time during netif-up after enabling
interrupts; otherwise the immediate "link up" interrupt gets lost.
Once I get approval from the company I will post the patch itself for
review.
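
As a rough illustration of what the patched ethtool invocation would hand to the driver, the sketch below fills a struct ethtool_cmd from <linux/ethtool.h> with the settings named above. DUPLEX_TXONLY with value 2 is the out-of-tree addition described in this post, not something the stock header defines, and fill_txonly_settings() is a hypothetical helper. A real tool would normally read the current settings with ETHTOOL_GSET first and then pass the modified structure to the kernel through the SIOCETHTOOL ioctl; that part is left out here.

```c
#include <assert.h>
#include <string.h>
#include <linux/ethtool.h>

/* From the out-of-tree patch described above; not in the stock header. */
#define DUPLEX_TXONLY 2

/*
 * Fill in the settings that
 *   ethtool -s ethX speed 1000 autoneg off duplex txonly
 * would request from the driver.
 */
static void fill_txonly_settings(struct ethtool_cmd *cmd)
{
	assert(cmd != NULL);
	memset(cmd, 0, sizeof(*cmd));
	cmd->cmd = ETHTOOL_SSET;
	ethtool_cmd_speed_set(cmd, SPEED_1000);
	cmd->duplex = DUPLEX_TXONLY;
	cmd->autoneg = AUTONEG_DISABLE;
}
```

Overloading the duplex field keeps the change small, at the cost of teaching every consumer of that field about a third value.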

I look forward to your comments and suggestions.

Cheers,
Kyle Moffett

Re: [PATCH] target: Update copyright ownership to 2012

2012-11-10 Thread Kyle Moffett

On Fri, Nov 9, 2012 at 3:00 PM, Nicholas A. Bellinger wrote:
> This patch to update copyright year to current for principal target core
> ownership is now being pushed into target-pending/for-next.

Pardon me, but you were just publicly accused of violating the GPL, so
your response is to send a patch removing the copyright notices of all
other organizations from the SCSI-target code?  Have you obtained
ownership of all the relevant copyrights for Linux-iSCSI.org, PyX
Technologies, Inc, and SBE, Inc?  If not, then this patch is an
attempted violation of those organizations' copyrights and of the GPL
(which requires that you preserve copyright notices).

Further, while these notices are the only ones listed in those files,
the organizations they name are not the only parties outside of
RisingTide Systems with a significant copyright interest in this code.
If your goal is to obtain exclusive copyright ownership over this code
then there are a great many other people you must contact and convince
first.

I would encourage you to talk privately with the Software Freedom
Conservancy before sending more patches of this nature.

Cheers,
Kyle Moffett

> diff --git a/drivers/target/target_core_alua.c b/drivers/target/target_core_alua.c
> - * Copyright (c) 2009-2010 Linux-iSCSI.org
> diff --git a/drivers/target/target_core_configfs.c b/drivers/target/target_core_configfs.c
> - * Copyright (c) 2008-2011 Linux-iSCSI.org
> diff --git a/drivers/target/target_core_device.c b/drivers/target/target_core_device.c
> - * Copyright (c) 2003, 2004, 2005 PyX Technologies, Inc.
> - * Copyright (c) 2005-2006 SBE, Inc.  All Rights Reserved.
> - * Copyright (c) 2008-2010 Linux-iSCSI.org
> diff --git a/drivers/target/target_core_fabric_configfs.c b/drivers/target/target_core_fabric_configfs.c
> - * Copyright (c) 2010,2011 Linux-iSCSI.org
> diff --git a/drivers/target/target_core_fabric_lib.c b/drivers/target/target_core_fabric_lib.c
> - * Copyright (c) 2010 Linux-iSCSI.org
> diff --git a/drivers/target/target_core_file.c b/drivers/target/target_core_file.c
> - * Copyright (c) 2005 PyX Technologies, Inc.
> - * Copyright (c) 2005-2006 SBE, Inc.  All Rights Reserved.
> - * Copyright (c) 2008-2010 Linux-iSCSI.org
> diff --git a/drivers/target/target_core_hba.c b/drivers/target/target_core_hba.c
> - * Copyright (c) 2003, 2004, 2005 PyX Technologies, Inc.
> - * Copyright (c) 2005, 2006, 2007 SBE, Inc.
> - * Copyright (c) 2008-2010 Linux-iSCSI.org
> diff --git a/drivers/target/target_core_iblock.c b/drivers/target/target_core_iblock.c
> - * Copyright (c) 2003, 2004, 2005 PyX Technologies, Inc.
> - * Copyright (c) 2005, 2006, 2007 SBE, Inc.
> - * Copyright (c) 2008-2010 Linux-iSCSI.org
> diff --git a/drivers/target/target_core_pr.c b/drivers/target/target_core_pr.c
> - * Copyright (c) 2009, 2010 Linux-iSCSI.org
> diff --git a/drivers/target/target_core_pscsi.c b/drivers/target/target_core_pscsi.c
> - * Copyright (c) 2003, 2004, 2005 PyX Technologies, Inc.
> - * Copyright (c) 2005, 2006, 2007 SBE, Inc.
> - * Copyright (c) 2008-2010 Linux-iSCSI.org
> diff --git a/drivers/target/target_core_rd.c b/drivers/target/target_core_rd.c
> - * Copyright (c) 2003, 2004, 2005 PyX Technologies, Inc.
> - * Copyright (c) 2005, 2006, 2007 SBE, Inc.
> - * Copyright (c) 2008-2010 Linux-iSCSI.org
> diff --git a/drivers/target/target_core_sbc.c b/drivers/target/target_core_sbc.c
> - * Copyright (c) 2002, 2003, 2004, 2005 PyX Technologies, Inc.
> - * Copyright (c) 2005, 2006, 2007 SBE, Inc.
> - * Copyright (c) 2008-2010 Linux-iSCSI.org
> diff --git a/drivers/target/target_core_spc.c b/drivers/target/target_core_spc.c
> - * Copyright (c) 2002, 2003, 2004, 2005 PyX Technologies, Inc.
> - * Copyright (c) 2005, 2006, 2007 SBE, Inc.
> - * Copyright (c) 2008-2010 Linux-iSCSI.org
> diff --git a/drivers/target/target_core_stat.c b/drivers/target/target_core_stat.c
> - * Copyright (c) 2011 Linux-iSCSI.org
> - * Copyright (c) 2006-2007 SBE, Inc.  All Rights Reserved.
> diff --git a/drivers/target/target_core_tmr.c b/drivers/target/target_core_tmr.c
> - * Copyright (c) 2009,2010 Linux-iSCSI.org
> diff --git a/drivers/target/target_core_tpg.c b/drivers/target/target_core_tpg.c
> - * Copyright (c) 2002, 2003, 2004, 2005 PyX Technologies, Inc.
> - * Copyright (c) 2005, 2006, 2007 SBE, Inc.
> - * Copyright (c) 2008-2010 Linux-iSCSI.org
> diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
> - * Copyright (c) 2002, 2003, 2004, 2005 PyX Technologies, Inc.
> - * Copyright (c) 2005, 2006, 2007 SBE, Inc.
> - * Copyright (c) 2008-2010 Linux-iSCSI.org
> diff --git a/drivers/target/target_core_ua.c b/drivers/target/target_core_ua.c
> - * Copyright (c) 2009,2010 Linux-iSCSI.org

[NET/IPv6] Race condition with flow_cache_genid?

2008-02-06 Thread Kyle Moffett

Whoops, I accidentally sent this to [EMAIL PROTECTED] instead of
[EMAIL PROTECTED]  The original email is the 2008-02-02 message above
with the same subject.

Re: Use of C99 int types

2005-04-04 Thread Kyle Moffett

On Apr 04, 2005, at 17:25, Richard B. Johnson wrote:
> I don't find stdint.h in the kernel source (up to 2.6.11). Is this
> going to be a new addition?

Uhh, no.  stdint.h is part of glibc, not the kernel.

> It would be very helpful to start using the uint(8,16,32,64)_t types
> because they are self-evident, a lot more than size_t or, my favorite
> wchar_t.

You miss the point of size_t and ssize_t/ptrdiff_t.  They are types
guaranteed to be at least as big as the pointer size.  uint8/16/32/64,
on the other hand, are specific bit-sizes, which may not be as fast or
correct as a simple size_t.  Linus has pointed out that while it
doesn't matter which of __u32, u32, uint32_t, etc you use for kernel
private interfaces, you *cannot* use anything other than __u32 in the
parts of headers that userspace will see, because __u32 is defined
only by the kernel and so there is no risk for conflicts, as opposed
to uint32_t, which is also defined by libc, resulting in collisions
in naming.
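
A small userspace sketch of the namespace point: <linux/types.h> supplies __u32 and __u64 without defining uint32_t, so it can be included next to libc's <stdint.h> with no clash. The structure example_abi_args and the helper example_abi_fill() are hypothetical and only illustrate how a header shared with userspace might spell its fields.

```c
#include <assert.h>
#include <stdint.h>		/* libc: uint32_t, uintptr_t */
#include <linux/types.h>	/* kernel: __u32, __u64 (no uint32_t here) */

/* Hypothetical argument block shared between kernel and userspace. */
struct example_abi_args {
	__u32 flags;
	__u32 count;
	__u64 buffer;	/* user pointer carried as a fixed 64-bit value */
};

/* Both names describe the same width; only their namespaces differ. */
static_assert(sizeof(__u32) == sizeof(uint32_t), "__u32 is 32 bits");
static_assert(sizeof(struct example_abi_args) == 16, "layout is fixed");

/* Userspace side: fill the block using ordinary libc types. */
static void example_abi_fill(struct example_abi_args *a, const void *buf,
			     uint32_t count)
{
	a->flags = 0;
	a->count = count;
	a->buffer = (__u64)(uintptr_t)buf;
}
```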
Cheers,
Kyle Moffett
-----BEGIN GEEK CODE BLOCK-----
Version: 3.12
GCM/CS/IT/U d- s++: a18 C>$ UB/L/X/*(+)>$ P+++()>$
L(+++) E W++(+) N+++(++) o? K? w--- O? M++ V? PS+() PE+(-) Y+
PGP+++ t+(+++) 5 X R? tv-(--) b(++) DI+ D+ G e->$ h!*()>++$ r !y?(-)
------END GEEK CODE BLOCK------


Re: Use of C99 int types

2005-04-05 Thread Kyle Moffett

On Apr 05, 2005, at 05:23, Renate Meijer wrote:
> > uint8/16/32/64, on the other hand, are specific bit-sizes, which
> > may not be as fast or correct as a simple size_t.
>
> Using specific widths may yield benefits on one platform, whilst
> proving a real bottleneck when porting something to another. A
> potential of problems easily avoided by using plain-vanilla
> integers.

The point of specific-width integers is to preserve a specific
binary format, such as a filesystem on-disk data structure, or a
kernel-userspace ABI, etc.  If you just need a number, use a
different type.
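
As a concrete case of preserving a binary format, the sketch below stores a 32-bit value in a fixed little-endian byte order, the way an on-disk field might be written. The helper names put_le32() and get_le32() are illustrative assumptions, and the byte order is chosen only for the example; the result is the same on any host regardless of its native int size or endianness.

```c
#include <assert.h>
#include <stdint.h>

/* Store v as four bytes, least significant byte first. */
static void put_le32(uint8_t out[4], uint32_t v)
{
	out[0] = (uint8_t)(v);
	out[1] = (uint8_t)(v >> 8);
	out[2] = (uint8_t)(v >> 16);
	out[3] = (uint8_t)(v >> 24);
}

/* Read back a value stored by put_le32(). */
static uint32_t get_le32(const uint8_t in[4])
{
	return (uint32_t)in[0] |
	       (uint32_t)in[1] << 8 |
	       (uint32_t)in[2] << 16 |
	       (uint32_t)in[3] << 24;
}
```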

> Strictly speaking, a definition starting with a double
> underscore is reserved for use by the compiler and associated
> libs

Well, _strictly_speaking_, it's "implementation defined", where the
"implementation" includes the kernel (due to the syscall interface).

> this such a declaration would invade implementation namespace.
> The compiler's implementation, that is.

But the C library is implicitly dependent on the kernel headers for
a wide variety of datatypes.

> In this case, the boundary is a bit vague, i see that, since a lot
> of header definitions also reside in the /usr/include hierarchy.

Some of which are produced by kernel sources: /usr/include/linux,
/usr/include/asm, etc.

> I think it would be useful to at least *agree* on a standard type
> for 8/16/32/64-bit integer types. What I see now as a result of
> grepping for 'uint32' is a lot more confusing than stdint.h

Well, Linus has supported that there is no standard, except where
ABI is concerned, there we must use __u32 so that it does not clash
with libc or user programs.

> Especially the types with leading underscores look cool, but in
> reality may cause a conflict with compiler internals and should only
> be used when defining compiler libraries.

It's "implementation" (kernel+libc+gcc) defined.  It just means that
gcc, the kernel, and libc have to be much more careful not to tread
on each other's toes.

> The '__' have explicitly been put in by ISO in order to avoid
> conflicts between user-code and the standard libraries,

The "standard libraries" includes the syscall interface here.  If
the kernel types could not be prefixed with __, then what _should_
we prefix them with?

> Furthermore, I think it's wise to convince the community that if
> not needed, integers should not be specified by any specific width.

That doesn't work for an ABI.  If you switch compilers (or from 32-bit
to 64-bit, like from x86 to x86-64), you _must_ be able to specify
certain widths for all the ABI numbers to preserve compatibility.
Cheers,
Kyle Moffett

Re: Use of C99 int types

2005-04-05 Thread Kyle Moffett

On Apr 05, 2005, at 08:18, Richard B. Johnson wrote:
> One cannot just use 'int' or 'long', in particular when interfacing
> with an operating system. For example, look at the socket interface
> code. Parameters are put into an array of longs and a pointer to
> this array is passed to the socket interface. It's a mess when
> converting this code to 64-bit world.

Exactly.

> If originally one used a structure of the correct POSIX integer
> types, and a pointer to the structure was passed, then absolutely
> nothing in the source-code would have to be changed at all when
> compiling that interface for a 64-bit machine.

But you _can't_ use the POSIX integer types.  When compiling the
kernel, if you use the types, you must define them in the kernel
headers.  On the other hand, when compiling userspace stuff, you
_can't_ have them defined in the kernel headers because libc also
defines them.  The solution is to use __{s,u}{8,16,32,64}, which
are _only_ defined by the kernel, not by libc or gcc, and can be
therefore used in the ABI.
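
The difference between the two styles can be seen in a few lines. This is a userspace sketch, so it spells the fixed-width types with the <stdint.h> names; a kernel header visible to userspace would use __s32/__u32 as described above. The type long_args, the structure fixed_args and its field names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Old style: three arguments packed into an array of longs. */
typedef unsigned long long_args[3];

/* Hypothetical fixed-width equivalent. */
struct fixed_args {
	int32_t  fd;
	uint32_t level;
	uint32_t optname;
};

/* The struct is 12 bytes whether the compiler targets 32 or 64 bits. */
static_assert(sizeof(struct fixed_args) == 12, "layout must not vary");

/* The array follows the width of long: 12 bytes on ILP32, 24 on LP64. */
static size_t long_args_size(void)
{
	return sizeof(long_args);
}
```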

> The continual short-cuts, with the continual "special-case"
> hacks is what makes porting difficult. That's what the POSIX
> types were supposed to help prevent.

Except the POSIX types themselves are not usable for the boundary
code for the reasons of double definition.  Google for Linus'
posts on this topic a couple months ago.

> That's why I think if there was a stdint.h file in the kernel,
> when people were performing maintenance or porting their code,
> they could start using those types.

The types _are_ available from the kernel headers, but only when
compiling with __KERNEL__, to avoid conflicts from the libc
definitions.
Cheers,
Kyle Moffett

Re: Use of C99 int types

2005-04-05 Thread Kyle Moffett

h with my user-defined types.

> Anything you like. 'kernel_' or simply 'k_' would be appropriate.
> As long as you do not invade compiler namespace. It is separated
> and uglified for a purpose.

But the _entire_ non _ namespace is reserved for anything user
programs want to do with it.  I think most of the kernel types in
the current headers use __kernel_, which is safe enough.

> Does not work when you are touching externally defined interfaces
> in general, including that of a CPU.  There are places for uint32_t
> and friends and even for __uint32_t and its kin, but abusing them
> will cause trouble in a world that is accommodating more than one
> register-size. This is all I am saying.

But in a world with more than one register size, you _must_ use them,
for example, the x86-64 code uses them to handle 32-bit backwards
compatibility, and the ppc64 code does likewise.  When a program
compiled as ppc32 gets run on my ppc64 box, the kernel understands
that anything pushed onto the stack as arguments is 32-bit, and must
use specifically sized types to handle that properly.
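
A minimal sketch of that widening step, assuming a made-up request layout: struct args32 is what a 32-bit program would have placed in memory, struct args_native is a hypothetical 64-bit form of the same request, and widen_args() converts one to the other. None of these names come from the kernel; they only show why both sides need explicitly sized fields.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout as a 32-bit program places it in memory. */
struct args32 {
	uint32_t buf;	/* 32-bit user pointer */
	uint32_t len;
};

/* The same request in a 64-bit kernel's native form (hypothetical). */
struct args_native {
	uint64_t buf;
	uint64_t len;
};

/* Widen each field; unsigned 32-bit values are zero-extended. */
static struct args_native widen_args(const struct args32 *in)
{
	struct args_native out = {
		.buf = in->buf,
		.len = in->len,
	};

	assert(sizeof(in->buf) == 4);	/* the 32-bit side never changes */
	return out;
}
```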
Cheers,
Kyle Moffett

Re: Use of C99 int types

2005-04-06 Thread Kyle Moffett

On Apr 06, 2005, at 07:41, Renate Meijer wrote:
> On Apr 6, 2005, at 12:11 AM, Kyle Moffett wrote:
> > Please don't remove Linux-Kernel from the CC, I think this is an
> > important discussion.

GAAH!!! Read my lips!!! Quit removing Linux-Kernel from the CC!!!

> As I see it, there are a number of issues
> - Use of double underscores invades compiler namespace (except in
>   those cases where kernel definitions end up as the basis for
>   definitions in /usr/include/*, i.e. those that actually are part
>   of the C-implementation for Linux).

It is these that I'm talking about.  This is exactly my point (the
cases where the kernel definitions are part of /usr/include).

> - Some type that does not conflict with compiler namespace to replace
>   the variety of definitions for e.g. 32-bit unsigned integers we
>   have now.

As I said, I don't care about this, so do whatever you want.

> - Removal of anything prefixed with a double underscore from
>   non-C-implementation files.

ATM, much of the stuff in include/linux and include/asm-* is considered
"C-implementation" because it is used from userspace.  If you want to
clean that up and start moving abi files to include/kernel-abi or
somesuch, feel free, but that's a lot of work.

> > Personally, I don't care what you feel like requiring for purely
> > in-kernel interfaces, but __{s,u}{8,16,32,64} must stay to avoid
> > namespace collisions with glibc in the kernel include files as used
> > by userspace.
>
> Aye, but as I have pointed out several times, these types should be
> restricted to those files and *only* those files which eventually end
> up in the compiler's includes. In every other place, they invite
> exactly the trouble they are intended to avoid.

Precisely.  So if you want to make the millions of patches, go right
ahead, be my guest. :-P  Until somebody steps forward to clean up the
huge mess, nothing will get done.

> So in every place except those files which may actually cause a
> namespace conflict or a bug because some newer version does not
> support __foobar, or changed the semantics. Since using any __foobar
> type implies relying on the compiler internals, which may change
> without prior notice, it is ipso facto undesirable.

Except the kernel wants to be optimized and work and use what features
are available.  The kernel uses __foobar stuff provided by the compiler
because it has gccX.h files specifically designed to take compiler
interfaces, provide backups when they don't exist, and use them (and
their better checking) when they do.

> > This is kinda arguing semantics, but:
> > A particular set of software (linux+libc+gcc), running in a particular
> > translation environment (userspace) under particular control options
> > (Signals, nice values, etc), that performs translation of programs for
> > (emulating missing instructions), and supports execution of functions
> > (syscalls) in, a particular execution environment (also userspace).
>
> Ok. And where exactly are linux and libc when compiling code for an
> Atmel ATmega32 (40 pin DIL) using gcc?

Where do you get Atmel ATmega32 from?  I _only_ care about what symbols
Linux can use, and as I've mentioned, when running under *Linux*, then
it just so happens that *Linux* is part of my implementation, therefore
the *Linux* sources, which by definition aren't used elsewhere, can
assume they are part of said implementation.

> The 'set of software' does
> *not* include any OS. Not Windows, not Linux, not MacOSX, since the
> whole thing might be directed at a lowly microcontroller, which DOES
> NOT HAVE ANY OPERATING SYSTEM WHATSOEVER.
>
> Nevertheless, gcc works fine.

This is unrelated and off topic.  Heck, you've even consented above that
Linux can use

> > Without the kernel userspace wouldn't have anything, because anything
> > syscall-related (which is basically everything) involves the kernel.
>
> Sure. The same goes for every other program. However, it would be
> pretty stoopid to say the kernel is an integral part of (say) the
> Gimp. More so, since the Gimp and GCC run on completely different
> architectures as well.
>
> By the same token, linux is part of XFree86 despite the fact XFree86
> does not require linux to run.

But an XFree86 binary compiled on FreeBSD, or a GIMP binary compiled on
FreeBSD, for the most part, will not run on Linux, because the compiler
uses the _Linux_ environment to build the binary, including the _Linux_
headers and such.  The built binary is nearly useless without Linux,
but not vice-versa, hence even though the binary is not a derivative
work of linux, it requires it to run.

> > Heck, the kernel and its ABI is _more_ a part of the implementation
> > than glibc is!  I can write an assembly program that doesn't link to
> > or use libc, but without using syscalls I can do nothing whatsoever.
>
> I can write entire applications using gcc without even thinking of
> using any 'syscall' or any other part of linux/bsd/whatever. Still...
> it's gcc.

Uhh, what exactly is your application going to do?  So

Re: RFC: turn kmalloc+memset(,0,) into kcalloc

2005-04-07 Thread Kyle Moffett

On Apr 06, 2005, at 11:50, Paulo Marques wrote:
> kzalloc it is, then.
>
> [...]
>
> So we gain 8kB on the uncompressed image and 1347 bytes on the
> compressed one. This was just a dumb test and actual results might be
> better due to smarter human cleanups.
>
> Not a spectacular gain per se, but the increase in code readability is
> still worth it, IMHO.

Perhaps this could eventually be modified to draw from a prezeroed block
of memory, similar to the current code for doing the same thing for
userspace.  It probably wouldn't give much performance gain, especially
since it's not used for large blocks or large numbers of small objects
(as you would use a slabcache for those), but it might help a bit.  Of
course, the code would need to fall back quickly if such an allocation
would be messy or expensive for any reason.
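
For anyone following the thread without the patch at hand, here is a userspace analogue of the cleanup under discussion, using malloc() and calloc() in place of the kernel allocators. alloc_then_clear() shows the open-coded allocate-then-memset pattern and zalloc() the single-call replacement; both names are hypothetical stand-ins and this is not the kernel's kzalloc() implementation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Before: the open-coded pattern the thread wants to replace. */
static void *alloc_then_clear(size_t size)
{
	void *p = malloc(size);

	if (p)
		memset(p, 0, size);
	return p;
}

/* After: one call that returns zeroed memory (hypothetical name). */
static void *zalloc(size_t size)
{
	return calloc(1, size);
}
```

Having a single entry point is also what would make the prezeroed-block idea above possible later, since callers would not need to change.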

Cheers,
Kyle Moffett

[Long OT] Re: non-free firmware in kernel modules, aggregation and unclear copyright notice.

2005-04-13 Thread Kyle Moffett

This thread should probably get moved off-list soon, it's like
beating the dead horse long after its flesh has decayed and its
bones disintegrated to dust.

On Apr 13, 2005, at 21:54, David Schwartz wrote:
> > On Tue, Apr 12, 2005 at 12:05:59PM -0700, David Schwartz wrote:
> > > Yes, the GPL can give you rights you wouldn't otherwise have. A
> > > EULA can take away rights you would otherwise have.
> >
> > What compels you to agree with an EULA?
>
> If you do not agree with the EULA, you cannot and do not acquire lawful
> possession of the work.

Of course, one could always assert the following:
  1) I went to a store
  2) I found a box
  3) I went to the cash register
  4) I gave money to the cashier for the box
  5) I took the box home
  6) I opened the box and took out the contents
Now, to the end user, the above is the same procedure for purchasing a
box of cereal or a piece of software, therefore the restrictions are the
same.  I'm not allowed to distribute the copyrightable materials, which
for a cereal box is the images on the box, and for a CD is the digital
data stored therein.  Other than that, I can take a hammer and smash my
CD/cereal, I can make a dozen copies of the CD/box-art and mount them
on the wall or burn them, both of which are symbolic speech.  I can make
backup copies of my cereal box-art/CD too.
At what point of the above did I agree to any license?  As far as I
know, a license (IE: contract) is not valid for a product unless made at
the point-of-sale, before exchanging money.  This is especially valid
since almost all computer retailers refuse refunds for opened software.
When you have to open the box to see the license, that's bad, but when,
as I've seen far too many times, you have to break the seal and insert
the CD to even _see_ the license, it cannot be valid.
The only real point of most EULAs is to protect the owner's
copyright, which is implicitly protected in any case.
Cheers,
Kyle Moffett
-BEGIN GEEK CODE BLOCK-
Version: 3.12
GCM/CS/IT/U d- s++: a18 C>$ UB/L/X/*(+)>$ P+++()>$
L(+++) E W++(+) N+++(++) o? K? w--- O? M++ V? PS+() PE+(-) Y+
PGP+++ t+(+++) 5 X R? tv-(--) b(++) DI+ D+ G e->$ h!*()>++$ r  
!y?(-)
--END GEEK CODE BLOCK--


SCSI opcode 0x80 and 3ware Escalade 7000 ATA RAID

2005-04-15 Thread Kyle Moffett

 "0\n", 2)  = 2
write(1, "201 Soft_Read_Error_Rate", 28) = 28
write(1, "0x000a   253   251   000Old_"..., 59) = 59
write(1, "5\n", 2)  = 2
write(1, "202 TA_Increase_Count   ", 28) = 28
write(1, "0x000a   253   252   000Old_"..., 59) = 59
write(1, "0\n", 2)  = 2
write(1, "203 Run_Out_Cancel  ", 28) = 28
write(1, "0x000b   253   252   180Pre-"..., 59) = 59
write(1, "0\n", 2)  = 2
write(1, "204 Shock_Count_Write_Opern ", 28) = 28
write(1, "0x000a   253   252   000Old_"..., 59) = 59
write(1, "0\n", 2)  = 2
write(1, "205 Shock_Rate_Write_Opern  ", 28) = 28
write(1, "0x000a   253   252   000Old_"..., 59) = 59
write(1, "0\n", 2)  = 2
write(1, "207 Spin_High_Current   ", 28) = 28
write(1, "0x002a   252   252   000Old_"..., 59) = 59
write(1, "0\n", 2)  = 2
write(1, "208 Spin_Buzz   ", 28) = 28
write(1, "0x002a   252   252   000Old_"..., 59) = 59
write(1, "0\n", 2)  = 2
write(1, "209 Offline_Seek_Performnce ", 28) = 28
write(1, "0x0024   196   191   000Old_"..., 59) = 59
write(1, "0\n", 2)  = 2
write(1, " 99 Unknown_Attribute   ", 28) = 28
write(1, "0x0004   253   253   000Old_"..., 59) = 59
write(1, "0\n", 2)  = 2
write(1, "100 Unknown_Attribute   ", 28) = 28
write(1, "0x0004   253   253   000Old_"..., 59) = 59
write(1, "0\n", 2)  = 2
write(1, "101 Unknown_Attribute   ", 28) = 28
write(1, "0x0004   253   253   000Old_"..., 59) = 59
write(1, "0\n", 2)  = 2
write(1, "\n", 1)   = 1
ioctl(3, FIBMAP, 0xbfffe290)= 0
write(1, "SMART Error Log Version: 1\n", 27) = 27
write(1, "ATA Error Count: 1\n", 19)= 19
write(1, "\tCR = Command Register [HEX]\n\tFR"..., 490) = 490
write(1, "Error 1 occurred at disk power-o"..., 77) = 77
write(1, "  When the command that caused t"..., 88) = 88
write(1, "  After command completion occur"..., 121) = 121
write(1, "  Error: UNC 3 sectors at LBA = "..., 51) = 51
write(1, "\n\n", 2) = 2
write(1, "  Commands leading to the comman"..., 194) = 194
write(1, "  25 00 08 bb bc 0c e0 00  23d+0"..., 58) = 58
write(1, "  25 00 10 83 bc 0c e0 00  23d+0"..., 58) = 58
write(1, "  25 00 08 3b bc 0c e0 00  23d+0"..., 58) = 58
write(1, "  25 00 08 33 bc 0c e0 00  23d+0"..., 58) = 58
write(1, "  25 00 08 13 bc 0c e0 00  23d+0"..., 58) = 58
write(1, "\n", 1)   = 1
ioctl(3, FIBMAP, 0xbfffe2a0)= 0
write(1, "SMART Self-test log structure re"..., 48) = 48
write(1, "Num  Test_DescriptionStatus "..., 96) = 96
write(1, "# 1  Short offline   Complet"..., 79) = 79
write(1, "# 2  Short offline   Complet"..., 79) = 79
write(1, "# 3  Short offline   Complet"..., 79) = 79
write(1, "# 4  Short offline   Complet"..., 79) = 79
write(1, "# 5  Short offline   Complet"..., 79) = 79
write(1, "# 6  Extended offlineComplet"..., 79) = 79
write(1, "# 7  Short offline   Complet"..., 79) = 79
write(1, "# 8  Short offline   Complet"..., 79) = 79
write(1, "# 9  Short offline   Complet"..., 79) = 79
write(1, "#10  Short offline   Complet"..., 79) = 79
write(1, "#11  Short offline   Complet"..., 79) = 79
write(1, "#12  Short offline   Complet"..., 79) = 79
write(1, "#13  Extended offlineComplet"..., 79) = 79
write(1, "#14  Short offline   Complet"..., 79) = 79
write(1, "#15  Short offline   Complet"..., 79) = 79
write(1, "#16  Short offline   Complet"..., 79) = 79
write(1, "#17  Short offline   Complet"..., 79) = 79
write(1, "#18  Short offline   Complet"..., 79) = 79
write(1, "#19  Short offline   Complet"..., 79) = 79
write(1, "#20  Extended offlineComplet"..., 79) = 79
write(1, "#21  Short offline   Complet"..., 79) = 79
write(1, "\n", 1)   = 1
ioctl(3, FIBMAP, 0xbfffe290)= 0
write(1, "SMART Selective self-test log da"..., 63) = 63
write(1, " SPAN  MIN_LBA  MAX_LBA  CURRENT"..., 45) = 45
write(1, "100  Not_tes"..., 37) = 37
write(1, "200  Not_tes"..., 37) = 37
write(1, "300  Not_tes"..., 37) = 37
write(1, "400  Not_tes"..., 37) = 37
write(1, "500  Not_tes"..., 37) = 37
write(1, "Selective self-test flags (0x0):"..., 33) = 33
write(1, "  After scanning selected spans,"..., 69) = 69
write(1, "If Selective self-test is pendin"..., 76) = 76
write(1, "\n", 1)   = 1
munmap(0x40018000, 4096)= 0
exit_group(64)  = ?

Cheers,
Kyle Moffett
-BEGIN GEEK CODE BLOCK-
Version: 3.12
GCM/CS/IT/U d- s++: a18 C>$ UB/L/X/*(+)>$ P+++()>$
L(+++) E W++(+) N+++(++) o? K? w--- O? M++ V? PS+() PE+(-) Y+
PGP+++ t+(+++) 5 X R? tv-(--) b(++) DI+ D+ G e->$ h!*()>++$ r  
!y?(-)
--END GEEK CODE BLOCK--


Re: SCSI opcode 0x80 and 3ware Escalade 7000 ATA RAID

2005-04-15 Thread Kyle Moffett

On Apr 15, 2005, at 18:50, adam radford wrote:
Make sure you are using the 3ware character ioctl interface at
/dev/twe0 (dynamic major, controller number minor) for your
smartmontools, not /dev/sda.
Hmm, I don't have any /dev/twe* here.  I _do_ have hotplug, udev, etc,
installed, and this is a 2.6 machine, so I'm not sure what could be wrong.
How recent was this change?

The old interface from smartmontools used SCSI_IOCTL_SEND_COMMAND
ioctls with a special passthru opcode of 0x80 that would get passed
to the driver.  This interface is deprecated in the driver and the
kernel.
Ok.  Now if only I could find it.  Is there anyplace in sysfs that I
can check manually to see what the dynamic major is?  I'd like to
try creating the device by hand if I can't get Debian hotplug to see
it.
Cheers,
Kyle Moffett
-BEGIN GEEK CODE BLOCK-
Version: 3.12
GCM/CS/IT/U d- s++: a18 C>$ UB/L/X/*(+)>$ P+++()>$
L(+++) E W++(+) N+++(++) o? K? w--- O? M++ V? PS+() PE+(-) Y+
PGP+++ t+(+++) 5 X R? tv-(--) b(++) DI+ D+ G e->$ h!*()>++$ r  
!y?(-)
--END GEEK CODE BLOCK--


Re: kernel guide to space

2005-07-19 Thread Kyle Moffett


On Jul 13, 2005, at 21:12:08, [EMAIL PROTECTED] wrote:

I don't think there's a strict 80 column rule anymore.  It's 2005...



Think again.  There are a lot of people who use 80 column windows so
that we can see two code windows side-by-side.


Agreed.  If you're having trouble with width, it's a sign that the code
needs to be refactored.

Also, my personal rule is that if a source file exceeds 1000 lines, start
looking for a way to split it.  It can go longer (indeed, there is little
reason to split the fs/nls/nls_cp9??.c files), but
(I will refrain from discussing drivers/scsi/advansys.c)


A simple set of code refactoring rules that I try to abide by:

1)  If a function is more than a few 25- or 40-line screens, it's likely
too big (unless a big switch statement or a list of initialization calls
or something).  If necessary, use static inline functions to factor out
repetitive behavior.

2)  If a file is more than 30-40 functions, it's likely too big, and you
should try to split it.  It's _ok_ to have 4 source files implementing
code for manipulating a single struct.

3)  If a normal line of code is more than 80 characters, one of the
following is probably true: you need to break the line up and use temps
for clarity, or your function is so big that you're tabbing over too
far.
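
As a rough illustration of rule 3, here is a sketch of a long formatting
call tamed with one temporary.  The structure and function names are
invented for the example and do not come from any real driver.

```c
#include <stdio.h>

/* Hypothetical per-target counters. */
struct target_stats {
	unsigned int aborts_requested;
	unsigned int aborts_attempted;
	unsigned int aborts_completed;
};

/* Format one row of the statistics table into buf.
 * Returns whatever snprintf() returns. */
int format_target_row(char *buf, size_t len, int id,
		      const struct target_stats *stats)
{
	/* One short-lived pointer replaces the repeated stats[id] prefix,
	 * so every argument fits comfortably within 80 columns. */
	const struct target_stats *ts = &stats[id];

	return snprintf(buf, len, "%2d %5u %5u %5u\n", id,
			ts->aborts_requested,
			ts->aborts_attempted,
			ts->aborts_completed);
}
```

The temporary costs nothing at runtime and keeps each argument on its own
short line.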

Cheers,
Kyle Moffett

-BEGIN GEEK CODE BLOCK-
Version: 3.12
GCM/CS/IT/U d- s++: a18 C>$ UB/L/X/*(+)>$ P+++()>$
L(+++) E W++(+) N+++(++) o? K? w--- O? M++ V? PS+() PE+(-) Y+
PGP+++ t+(+++) 5 X R? tv-(--) b(++) DI+ D+ G e->$ h!*()>++$ r  !y?(-)
--END GEEK CODE BLOCK--



Re: kernel guide to space

2005-07-21 Thread Kyle Moffett


On Jul 20, 2005, at 20:45:21, Paul Jackson wrote:

drivers/scsi/BusLogic.c:

  %2d %5d %5d %5d%5d %5d %5d   %5d %5d %5d\n",  
TargetID, TargetStatistics[TargetID].CommandAbortsRequested,  
TargetStatistics[TargetID].CommandAbortsAttempted, TargetStatistics 
[TargetID].CommandAbortsCompleted, TargetStatistics 
[TargetID].BusDeviceResetsRequested, TargetStatistics 
[TargetID].BusDeviceResetsAttempted, TargetStatistics 
[TargetID].BusDeviceResetsCompleted, TargetStatistics 
[TargetID].HostAdapterResetsRequested, TargetStatistics 
[TargetID].HostAdapterResetsAttempted, TargetStatistics 
[TargetID].HostAdapterResetsCompleted);


Ugh!!!  From CodingStyle (although this is not always followed):
The limit on the length of lines is 80 columns and this is a hard limit.

Statements longer than 80 columns will be broken into sensible chunks.
Descendants are always substantially shorter than the parent and are placed
substantially to the right.  The same applies to function headers with a long
argument list.  Long strings are as well broken into shorter strings.
[example relevant to the above code snipped]



Also:
C is a Spartan language, and so should your naming be.  Unlike Modula-2 and
Pascal programmers, C programmers do not use cute names like
ThisVariableIsATemporaryCounter.  A C programmer would call that variable
"tmp", which is much easier to write, and not the least more difficult to
understand.

[...] mixed-case names are frowned upon [...]


*cough* TargetStatistics[TargetID].HostAdapterResetsCompleted *cough*

I suspect Linus would be willing to accept a few cleanup patches for the
BusLogic.c file.  Perhaps even one that renames BusLogic.c to buslogic.c
like all the other files in the source tree, instead of using nasty
StudlyCaps all over :-D

Cheers,
Kyle Moffett

-BEGIN GEEK CODE BLOCK-
Version: 3.12
GCM/CS/IT/U d- s++: a18 C>$ UB/L/X/*(+)>$ P+++()>$
L(+++) E W++(+) N+++(++) o? K? w--- O? M++ V? PS+() PE+(-) Y+
PGP+++ t+(+++) 5 X R? tv-(--) b(++) DI+ D+ G e->$ h!*()>++$ r  !y?(-)
--END GEEK CODE BLOCK--



Re: Regression: radeonfb: No synchronisation on CRT with linux-2.6.13-rc5

2005-08-07 Thread Kyle Moffett


On Aug 7, 2005, at 03:51:07, Benjamin Herrenschmidt wrote:

On Fri, 2005-08-05 at 19:38 +0200, Bodo Eggert wrote:

On Fri, 5 Aug 2005, Benjamin Herrenschmidt wrote:


On Fri, 2005-08-05 at 00:03 +0200, Bodo Eggert wrote:
My CRT is out of sync after radeonfb from 2.6.13-rc5 is initialized.
2.6.12 does not show this behaviour.


I'm out of town at the moment, could you maybe diff radeonfb between
working & non-working and CC me the diff ? I don't have my work stuff at
hand nor my kernel images so...


There were no changes in radeonfb.c, but I could trace it to
CONFIG_PREEMPT. With _NONE, it works as expected.


Ah ! Interesting... I don't see why PREEMPT would affect radeonfb
though ... Can you try something like wrapping radeon_write_mode() with
preempt_disable()/preempt_enable() and tell me if it makes a
difference ?


I'm having a similar issue with my shiny new 17" Powerbook G4.  The
radeon chip works fine with framebuffer in 2.6.12.4 _with_ PREEMPT,
but not in 2.6.13-rc5 _with_ PREEMPT (configs are virtually identical).
I'll try your idea this afternoon when I get the chance.

I wonder if perhaps some code in radeonfb is used under the BKL, which
is now preemptable (Or maybe an ordinary spinlock changed or went
away?), because I also set PREEMPT_BKL.  I've got an LCD, and on mine
it looks like every third pixel-line gets shifted about 32-64 pixels to
the left, and they move with display refresh.  My guess is that
something is interrupting radeonfb during a critical time in display
syncing and forcing the video card to wait too far into the next line
before sending pixels.

One other data point: I've seen something like this, though not nearly
as bad, with stock Debian 2.6.8 vs. stock Debian 2.6.11 on powerpc (same
Powerbook).  The former exhibits similar but milder symptoms, whereas
2.6.11 doesn't.  In that case, neither has PREEMPT.
I'll run more tests this afternoon/evening, to try to track it down.

Cheers,
Kyle Moffett

--
There are two ways of constructing a software design. One way is to make it so
simple that there are obviously no deficiencies. And the other way is to make
it so complicated that there are no obvious deficiencies.  The first method is
far more difficult.
  -- C.A.R. Hoare



Re: Regression: radeonfb: No synchronisation on CRT with linux-2.6.13-rc5

2005-08-07 Thread Kyle Moffett


On Aug 7, 2005, at 12:13:38, Benjamin Herrenschmidt wrote:

I've got an LCD, and on mine
it looks like every third pixel-line gets shifted about 32-64 pixels to
the left, and they move with display refresh.  My guess is that
something is interrupting radeonfb during a critical time in display
syncing and forcing the video card to wait too far into the next line
before sending pixels.


radeonfb is mostly inactive after it has set up the framebuffer and
unless you actually draw something, in which case, accel code is called.

_However_ there is an unrelated problem with some panels, including some
of the 17": The panel doesn't always "sync" properly. This seems to be
related to some subtle timing issue in the LVDS code but I don't know
exactly what yet. You can usually get it back by repeatedly turning the
backlight all the way down (which shuts the panel off) and back up until
it "catches".


Hmm.  This doesn't really fit as my issues are very reproducible.  The
behaviour under stock Debian 2.6.8 is identical during reboots and after
fblevel 0 ; sleep X ; fblevel 15.  Likewise, stock 2.6.11, 2.6.12.4, and
2.6.13-rc5, although I'm just getting back to testing things.

Cheers,
Kyle Moffett

--
There are two ways of constructing a software design. One way is to make it so
simple that there are obviously no deficiencies. And the other way is to make
it so complicated that there are no obvious deficiencies.  The first method is
far more difficult.
  -- C.A.R. Hoare



Re: [PATCH] i386 No-Idle-Hz aka Dynamic-Ticks 5

2005-08-07 Thread Kyle Moffett


On Aug 7, 2005, at 19:51:25, Con Kolivas wrote:

On Mon, 8 Aug 2005 02:58, Srivatsa Vaddagiri wrote:

Con,
I am afraid until SMP correctness is resolved, then this is not
in a position to go in (unless you want to enable it only for UP, which
I think should not be our target). I am working on making this work
correctly on SMP systems. Hopefully I will post a patch soon.


Great! I wasn't sure what time frame you meant when you last posted. I won't
do anything more, leaving this patch as it is, and pass the baton to you.


I'm curious what has happened to the PPC side of the patch.  IIRC, someone
was working on such a port, but it seems to have been lost along the way
at some point.  Is there any additional information on that patch?

Cheers,
Kyle Moffett

--
Unix was not designed to stop people from doing stupid things, because that
would also stop them from doing clever things.
  -- Doug Gwyn



Re: Regression: radeonfb: No synchronisation on CRT with linux-2.6.13-rc5

2005-08-07 Thread Kyle Moffett


On Aug 7, 2005, at 21:13:54, Kyle Moffett wrote:

On Aug 7, 2005, at 12:13:38, Benjamin Herrenschmidt wrote:
_However_ there is an unrelated problem with some panels, including some
of the 17": The panel doesn't always "sync" properly. This seems to be
related to some subtle timing issue in the LVDS code but I don't know
exactly what yet. You can usually get it back by repeatedly turning the
backlight all the way down (which shuts the panel off) and back up until
it "catches".


Hmm.  This doesn't really fit as my issues are very reproducible.  The
behaviour under stock Debian 2.6.8 is identical during reboots and after
fblevel 0 ; sleep X ; fblevel 15.  Likewise, stock 2.6.11, 2.6.12.4, and
2.6.13-rc5, although I'm just getting back to testing things.


Damn.  As soon as I say this, I go back and am completely unable to make
2.6.13-rc5 reproduce the issue.  *grumble* black magic *grumble* :-D.

Cheers,
Kyle Moffett

--
There are two ways of constructing a software design. One way is to make it so
simple that there are obviously no deficiencies. And the other way is to make
it so complicated that there are no obvious deficiencies.  The first method is
far more difficult.
  -- C.A.R. Hoare



Re: Wireless support

2005-08-09 Thread Kyle Moffett


On Aug 9, 2005, at 05:09:55, Jochen Friedrich wrote:
Third, both ndiswrapper and binary-only drivers only work on one platform.

E.g. Broadcom has a binary-only driver for their WLAN card on Linux, but
only for mipsel (wrt54g).

On Alpha or PowerPC, most WLAN equipment doesn't work under Linux, at all.


Definitely.  I want my Airport Extreme to work!  Many users of the BCM4301
chip can get it to work (kinda) with Linux via ndiswrapper, but that means
they are much less likely to participate in any kind of reverse engineering
effort, even if it's just testing a new driver.

Cheers,
Kyle Moffett

-BEGIN GEEK CODE BLOCK-
Version: 3.12
GCM/CS/IT/U d- s++: a18 C>$ UB/L/X/*(+)>$ P+++()>$
L(+++) E W++(+) N+++(++) o? K? w--- O? M++ V? PS+() PE+(-) Y+
PGP+++ t+(+++) 5 X R? tv-(--) b(++) DI+ D+ G e->$ h!*()>++$ r  !y?(-)
--END GEEK CODE BLOCK--



Re: understanding Linux capabilities brokenness

2005-08-09 Thread Kyle Moffett


On Aug 9, 2005, at 11:16:33, Christopher Warner wrote:

In my observer pragmatic view; yes. On many occasions, I've come to CAP
calls only to be frustrated with the sheer disconnect of it all. It
simply doesn't work. If it means having to break POSIX conformance for a
working implementation, then so be it.

On Tue, 2005-08-09 at 00:46 -0400, James Morris wrote:


Let me play the Devil's advocate here.

Should we be thinking about deprecating and removing capabilities from
Linux?


One brief suggestion:

A key/token interface was recently introduced that might be useful to allow
a simple new inheritance model for "capabilities", "roles", "rootperms" or
whatever other abstraction you create.

Cheers,
Kyle Moffett

--
There are two ways of constructing a software design. One way is to make it so
simple that there are obviously no deficiencies. And the other way is to make
it so complicated that there are no obvious deficiencies.  The first method is
far more difficult.
  -- C.A.R. Hoare



Re: [RFC/PATCH] Add pci_walk_bus function to PCI core

2005-08-09 Thread Kyle Moffett


On Aug 10, 2005, at 02:10:49, Arjan van de Ven wrote:

On Wed, 2005-08-10 at 11:36 +1000, Paul Mackerras wrote:


Greg,

Any comments on this patch?  Would you be amenable to it going in post
2.6.13?

The PCI error recovery infrastructure needs to be able to contact all
the drivers affected by a PCI error event, which may mean traversing
all the devices under a given PCI-PCI bridge.  This patch adds a
function to the PCI core that traverses all the PCI devices on a PCI
bus and under any PCI-PCI bridges on that bus (recursively), calling a
given function for each device.


is there a way to avoid the recursion somehow? Recursion is "not fun"
stack usage wise, esp if you have really deep hierarchies


Hmm, it looks like PCI error recovery wants breadth-first recursion, so
you should be able to do some sort of tail-recursion or something.  If
only one error-recovery action on a given subtree can be going at a time,
you should be able to add an "error_recovery" linked-list to the device
structure and do something like this:

void recover(struct some_device_type *root, [...]) {
	struct list_head recovery_list = LIST_HEAD_INIT(recovery_list);
	list_add(&root->error_recovery, &recovery_list);

	while (!list_empty(&recovery_list)) {
		/* Always work on the first pending device. */
		struct some_device_type *dev =
			list_entry(recovery_list.next, struct some_device_type,
				   error_recovery);

		dev->some_recovery_function(dev, [...]);

		list_del(&dev->error_recovery);
	}
}

Then each PCI-PCI bridge's some_recovery_function could do this:

void some_recovery_function(struct some_device_type *dev, [...]) {
	struct some_device_type *child;

	actually_do_my_recovery();

	list_for_each_entry(child, &dev->some_pci_subdev_list, some_pci_list) {
		/* dev is still the first pending entry, so this queues
		 * the child directly ahead of it. */
		if (needs_recovery(child))
			list_add_tail(&child->error_recovery, &dev->error_recovery);
	}
}

With such an arrangement, the callstack is as shallow as possible:

recover
    some_recovery_function
        actually_do_my_recovery
        needs_recovery
    childs_recovery_function
        [...]
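
For comparison, here is a self-contained userspace sketch of the same
idea with a strictly breadth-first queue.  None of this is PCI code:
struct node, MAX_CHILDREN and walk_iterative() are invented for the
example, and recording the id stands in for the recovery callback.

```c
#include <stddef.h>

#define MAX_CHILDREN 4

/* Hypothetical device-tree node. */
struct node {
	int id;
	struct node *children[MAX_CHILDREN];
	size_t nr_children;
	struct node *next_pending;	/* link in the pending queue */
};

/*
 * Visit root and all of its descendants breadth-first without
 * recursion.  The visit order is written to out[] (at most max
 * entries); returns the number of nodes visited.
 */
size_t walk_iterative(struct node *root, int *out, size_t max)
{
	struct node *head = root, *tail = root;
	size_t n = 0;

	if (!root)
		return 0;
	root->next_pending = NULL;

	while (head && n < max) {
		struct node *cur = head;
		size_t i;

		out[n++] = cur->id;	/* stand-in for the recovery call */

		/* Queue the children behind everything already pending. */
		for (i = 0; i < cur->nr_children; i++) {
			struct node *child = cur->children[i];

			child->next_pending = NULL;
			tail->next_pending = child;
			tail = child;
		}
		head = cur->next_pending;
	}
	return n;
}
```

However deep the tree is, the stack never grows beyond walk_iterative()
plus one callback; the cost is one extra pointer per node.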

If you can have multiple simultaneous error-recovery actions per subtree,
that wouldn't properly work unless they were exclusive-blocking, IE:
an error recovery action triggers an error on a subtree which must
recover itself.  In that case, with some extra state saved in the recover
function and passed to the "some_recovery_function", you could allow the
other recovery to continue before resuming.

If you can have two CPUs recovering the same device tree, I'd be inclined
to wonder what kind of strange errors you're causing on the PCI bus :-D,
and I'd be interested in an example of how that could work in any sane way.


Cheers,
Kyle Moffett

--
Premature optimization is the root of all evil in programming
  -- C.A.R. Hoare




Re: [Linux-cluster] Re: [PATCH 00/14] GFS

2005-08-10 Thread Kyle Moffett


On Aug 10, 2005, at 09:26:26, AJ Lewis wrote:

On Wed, Aug 10, 2005 at 12:11:10PM +0100, Christoph Hellwig wrote:


On Wed, Aug 10, 2005 at 01:09:17PM +0200, Lars Marowsky-Bree wrote:

So for every directory hierarchy on a shared filesystem, each user needs
to have the complete list of bindmounts needed, and automatically resync
that across all nodes when a new one is added or removed? And then have
that executed by root, because a regular user can't?


Do it in an initscript and let users simply not do it, they shouldn't
even know what kind of filesystem they are on.


I'm just thinking of a 100-node cluster that has different mounts on different
nodes, and trying to update the bind mounts in a sane and efficient manner
without clobbering the various mount setups.  Ouch.


How about something like the following:
cpslink()      => Create a Context Dependent Symlink
readcpslink()  => Return the Context Dependent path data
readlink()     => Return the path of the Context Dependent Symlink as it
                  would be evaluated in the current context, basically as a
                  normal symlink.
lstat()        => Return information on the Context Dependent Symlink in
                  the same format as a regular symlink.
unlink()       => Delete the Context Dependent Symlink.

You would need an extra userspace tool that understands cpslink/readcpslink to
create and get information on the links for now, but ls and ln could eventually
be updated, and until then they would provide sane behavior.  Perhaps this
should be extended into a new API for some of the strange things several
filesystems want to do in the VFS:
extlink()      => Create an extended filesystem link (with type specified)
readextlink()  => Return the path (and type) for the link

The filesystem could define how each type of link acts with respect to other
syscalls.  OpenAFS could use extlink() instead of their symlink magic for
adjusting the AFS volume hierarchy.  The new in-kernel AFS client could use it
in similar fashion (it has no method to adjust hierarchy, because it's still
read-only).  GFS could use it for their Context Dependent Symlinks.  Since it
would pass the type in as well, it would be possible to use it for different
kinds of links on the same filesystem.

Cheers,
Kyle Moffett

--
Simple things should be simple and complex things should be possible
  -- Alan Kay




Re: CCITT-CRC16 in kernel

2005-08-11 Thread Kyle Moffett


On Aug 11, 2005, at 11:19:59, linux-os (Dick Johnson) wrote:

On Thu, 11 Aug 2005 [EMAIL PROTECTED] wrote:

You're wrong in two ways:
1) You've got CRC-16 and CRC-CCITT mixed up, and
2) You've got the bit ordering backwards.  Remember, I said very clearly,
   the lsbit is the first bit, and the first bit is the highest power
   of x.  You can reverse the convention and still have a CRC, but that's
   not the way it's usually done and it's more awkward in software.

CRC-CCITT = X^16 + X^12 + X^5 + X^0 = 0x8408, and NOT 0x1021
CRC-16 =  X^16 + X^15 + X^2 + X^0 = 0xa001, and NOT 0x8005


Thank you very much for your time, but what you say is completely
different than anything else I have found on the net.

Do the math:

 2^ 16 = 65536
 2^ 12 =  4096
 2^  5 =    32
 2^  0 =     1
--------------
         69665 = 0x11021


No, it's like this: first, the 16 term is ignored, then:

2^ ( 15 - 12 ) = 2^  3 = 8 = 0x0008
2^ ( 15 -  5 ) = 2^ 10 =  1024 = 0x0400
2^ ( 15 -  0 ) = 2^ 15 = 32768 = 0x8000
---
   = 0x8408

This relies on two conventions:
1) The least-significant bit is the first bit
2) The first bit is the _highest_ power of X.
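
The arithmetic above is just a 16-bit bit reversal of the "textbook"
polynomial value.  A small sketch, with made-up function names, that
reproduces the numbers and does one byte of an lsbit-first update with
the reversed polynomial (it is not the kernel's table-driven code):

```c
#include <stdint.h>

/* Reverse the bit order of a 16-bit value (bit 15 <-> bit 0, etc.). */
uint16_t reverse16(uint16_t v)
{
	uint16_t r = 0;
	int i;

	for (i = 0; i < 16; i++) {
		r = (uint16_t)((r << 1) | (v & 1));
		v >>= 1;
	}
	return r;
}

/* Feed one byte into an lsbit-first CRC, using the bit-reversed
 * polynomial (0x8408 for CRC-CCITT, 0xa001 for CRC-16). */
uint16_t crc16_lsb_first(uint16_t crc, uint8_t byte, uint16_t poly_reversed)
{
	int i;

	crc ^= byte;
	for (i = 0; i < 8; i++) {
		if (crc & 1)
			crc = (uint16_t)((crc >> 1) ^ poly_reversed);
		else
			crc = (uint16_t)(crc >> 1);
	}
	return crc;
}
```

reverse16(0x1021) gives 0x8408 and reverse16(0x8005) gives 0xa001, which
is why both pairs of constants describe the same two polynomials.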

Cheers,
Kyle Moffett

-BEGIN GEEK CODE BLOCK-
Version: 3.12
GCM/CS/IT/U d- s++: a18 C>$ UB/L/X/*(+)>$ P+++()>$
L(+++) E W++(+) N+++(++) o? K? w--- O? M++ V? PS+() PE+(-) Y+
PGP+++ t+(+++) 5 X R? tv-(--) b(++) DI+ D+ G e->$ h!*()>++$ r  !y?(-)
--END GEEK CODE BLOCK--



Re: CCITT-CRC16 in kernel

2005-08-11 Thread Kyle Moffett


On Aug 11, 2005, at 13:08:56, linux-os (Dick Johnson) wrote:

Okay. Thanks. This means that hardware somehow swapped bits
before doing a CRC. I wasn't aware that this was even possible
as it would require additional storage, well I guess anything
is now possible in a FPGA.

The "Bible" has been:
 http://www.joegeluso.com/software/articles/ccitt.htm

Note that on the very first page, reference, is made to
the 0x1021 poly. Then there is source-code that is entirely
incompatible with anything in the kernel, but is supposed to
work (it does work on my hardware).

I have spent over a week grabbing everything on the Web that
could help decipher the CCITT CRC and they all show this
same kind of code and same kind of organization. Nothing
I could find on the Web is like the linux kernel ccitt_crc.
Go figure.

Do you suppose it was bit-swapped to bypass a patent?


It could be that, or it could be some kernel genius figured
out that one method is faster or better or more magical than
the other on most platforms.  Since the code works well, I
would be disinclined to tinker with it. :-D.

Cheers,
Kyle Moffett

--
Q: Why do programmers confuse Halloween and Christmas?
A: Because OCT 31 == DEC 25.




Re: Wireless support

2005-08-11 Thread Kyle Moffett


On Aug 11, 2005, at 23:17:07, Lee Revell wrote:

On Fri, 2005-08-12 at 12:59 +1000, roucaries bastien wrote:


They post on this list 1 year and a half ago no answer.


I guess everyone on LKML has day jobs now, no one has time for fun stuff
like reverse engineering drivers anymore... :-(


Much as I would love to help, I'm usually buried under schoolwork.  In any
case, I really have to admire the people behind the project.  Translating
tens of thousands of MIPS assembly instructions to C, documenting the C,
then giving the documentation to somebody else to write the driver even
though by that point you could write it backwards in a blindfold: that has
_got_ to be hard and frustrating work.

Cheers,
Kyle Moffett

--
Someone asked me why I work on this free (http://www.fsf.org/philosophy/)
software stuff and not get a real job. Charles Schulz had the best answer:

"Why do musicians compose symphonies and poets write poems? They do it because
life wouldn't have any meaning for them if they didn't. That's why I draw
cartoons. It's my life."
  -- Charles Schulz



Re: [Patch] Support UTF-8 scripts

2005-08-13 Thread Kyle Moffett


On Aug 13, 2005, at 20:57:45, Alan Cox wrote:

   I have "setxkbmap -symbols 'en_US(pc102)+gb'" in my ~/.xsession,
and « and » are available as AltGr-z and AltGr-x respectively.


Most keyboards don't have an AltGr key.


You must be an American. Most of the world's keyboards have an AltGr
key. You'll find that US keyboards have two alt keys to avoid confusing
people (like one button mice ;)) but the right one is understood by the
X bindings to be "AltGr". Even though the US keyboard is apparently
lacking functionality it's purely a text label issue.


And those of us who are Mac OS X oriented have patched our console and X
keycodes to match the mac way of generating symbols:

Alt-\        = «
Alt-Shift-\  = »
Alt-Shift-+  = ±

If only someone could come up with a good character palette like exists
on that OS, something that could generate a wide variety of keysyms,
preferably all of UTF-8, and send them to the topmost window.

Cheers,
Kyle Moffett

--
Unix was not designed to stop people from doing stupid things, because that
would also stop them from doing clever things.
  -- Doug Gwyn



Re: [ck] [PATCH] dynamic-tick patch modified for SMP

2005-08-13 Thread Kyle Moffett


On Aug 13, 2005, at 20:18:28, Con Kolivas wrote:

It does seem there are some timing issues
with this patch, although it is also quite stable (up for 10 hours now).

I've had a few interesting messages in my syslog suggesting problems:
Hangcheck: hangcheck value past margin!

and then later on a few of:
set_rtc_mmss: can't update from 0 to 59


It may be a good idea to rebase this patch off the new generic timekeeping
subsystem that John Stultz is working on.  He's cleaned up much of the code
relating to system time processing, which may make it easier to get it
right when skipping ticks (i.e., you probably don't need to do anything
special to replay missed ticks; the new timer code handles it automatically
for you).  There is an excellent LWN article on his project here:
http://lwn.net/Articles/120850/

Cheers,
Kyle Moffett

--
Simple things should be simple and complex things should be possible
  -- Alan Kay




Re: [Patch] Support UTF-8 scripts

2005-08-14 Thread Kyle Moffett


On Aug 14, 2005, at 02:18:13, Jason L Tibbitts III wrote:

"LR" == Lee Revell <[EMAIL PROTECTED]> writes:

LR> Is Larry smoking crack?

From the Perl6-Bible:
http://search.cpan.org/dist/Perl6-Bible/lib/Perl6/Bible/S03.pod:


I think this confirms that the answer is yes.  See the following at the
above URL:
Note that ?^ is functionally identical to !.?| differs from || in that ?|
always returns a standard boolean value (either 1 or 0), whereas || returns
the actual value of the first of its arguments that is true.


Since when is the string "!.?|" an operator???  Or "?^", "+|", "~|", "?|",
etc.  I think Larry's gone off the deep end on this one.  It may be an
incredibly powerful and expressive language, but it seems _really_ strange,
and probably will produce the best Obfuscated-code contest the world has
ever seen. (Better even than the Perl5 one).

Cheers,
Kyle Moffett

--
Simple things should be simple and complex things should be possible
  -- Alan Kay




Re: [PATCH] Fix PPC signal handling of NODEFER, should not affect sa_mask

2005-08-14 Thread Kyle Moffett


On Aug 12, 2005, at 17:53:53, Steven Rostedt wrote:

Two more systems that are different from Linux.

So far, Linux is the odd ball out.


Make that three more systems (Mac OS X has the same behavior as the BSDs):


zeus:~ kyle$ uname -a
Darwin zeus.moffetthome.net 8.2.0 Darwin Kernel Version 8.2.0: Fri Jun 24 17:46:54 PDT 2005; root:xnu-792.2.4.obj~3/RELEASE_PPC Power Macintosh powerpc

zeus:~ kyle$ ./test_signal
sa_mask blocks other signals
SA_NODEFER does not block other signals
SA_NODEFER does not affect sa_mask
SA_NODEFER and sa_mask blocks sig
!SA_NODEFER blocks sig
SA_NODEFER does not block sig
sa_mask blocks sig
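
For anyone who wants to try one of these checks on another system, here is
a minimal C sketch of the "SA_NODEFER does not affect sa_mask" case.  It is
not the test_signal program used above; the function and variable names are
made up for illustration.  It installs a SIGUSR1 handler with SA_NODEFER and
SIGUSR2 in sa_mask, then records which of the two signals are blocked while
the handler runs.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <signal.h>
#include <string.h>

/* Filled in by the handler: 1 = blocked, 0 = not blocked, -1 = not run. */
static volatile sig_atomic_t usr1_blocked_in_handler = -1;
static volatile sig_atomic_t usr2_blocked_in_handler = -1;

static void on_usr1(int sig)
{
    sigset_t cur;

    (void)sig;
    /* A NULL "set" argument only queries the current signal mask. */
    if (sigprocmask(SIG_BLOCK, NULL, &cur) == 0) {
        usr1_blocked_in_handler = sigismember(&cur, SIGUSR1);
        usr2_blocked_in_handler = sigismember(&cur, SIGUSR2);
    }
}

/*
 * Install the handler, deliver SIGUSR1 once, and let the handler record
 * the mask it saw.  Returns 0 on success, -1 on error.
 */
int probe_nodefer_mask(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_usr1;
    sa.sa_flags = SA_NODEFER;           /* do not block SIGUSR1 itself */
    sigemptyset(&sa.sa_mask);
    sigaddset(&sa.sa_mask, SIGUSR2);    /* but do block SIGUSR2 */
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;
    return raise(SIGUSR1) == 0 ? 0 : -1;
}
```

On a system that behaves like the BSDs, I would expect SIGUSR2 to be
reported as blocked inside the handler and SIGUSR1 as not blocked.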

Cheers,
Kyle Moffett

--
Q: Why do programmers confuse Halloween and Christmas?
A: Because OCT 31 == DEC 25.




Re: [RFC][PATCH 2.6.13-rc6] add Dell Systems Management Base Driver (dcdbas) with sysfs support

2005-08-15 Thread Kyle Moffett


On Aug 15, 2005, at 16:05:22, Doug Warzecha wrote:
This patch adds the Dell Systems Management Base Driver with sysfs support.



+On some Dell systems, systems management software must access certain
+management information via a system management interrupt (SMI).  The SMI data
+buffer must reside in 32-bit address space, and the physical address of the
+buffer is required for the SMI.  The driver maintains the memory required for
+the SMI and provides a way for the application to generate the SMI.
+The driver creates the following sysfs entries for systems management
+software to perform these system management interrupts:


Why can't you just implement the system management actions in the kernel
driver?  This is tantamount to a binary SMI hook to userspace.  What
functionality does this provide on a dell system from an administrator's
point of view?


+Host Control Action
+
+Dell OpenManage supports a host control feature that allows the administrator
+to perform a power cycle or power off of the system after the OS has finished
+shutting down.  On some Dell systems, this host control feature requires that
+a driver perform a SMI after the OS has finished shutting down.
+
+The driver creates the following sysfs entries for systems management software
+to schedule the driver to perform a power cycle or power off host control
+action after the system has finished shutting down:
+
+/sys/devices/platform/dcdbas/host_control_action
+/sys/devices/platform/dcdbas/host_control_smi_type
+/sys/devices/platform/dcdbas/host_control_on_shutdown


How is this different from shutdown() or reboot()?  What exactly is smi_type
used for?  Please provide better documentation on how to use this and what
it does.

If this is supposed to be used with the RBU code to trigger a BIOS update,
then why not integrate it into one kernel driver that receives firmware,
loads it into the BIOS, and properly resets the machine at powerdown?  I
think PowerPC does a similar thing with OpenFirmware flash memory.  When I
change the default boot device or other firmware environment, I get a
message from the kernel upon shutdown:
Erasing  flash bank 1...
Writing  flash bank 1...

Would not a similar system work for Dell?  It would be far simpler to use
than the current mess of patches you've proposed.  If done properly, I could
even do this:

cat firmware-with-checksum.img >/sys/devices/platform/dellbios/firmware_upgrade

Then an ordinary system reboot or shutdown would automatically use the SMI
and host-control-action to upgrade the firmware and shutdown or reboot,
instead of the normal ACPI shutdown and reboot code.

Cheers,
Kyle Moffett

--
Someone asked me why I work on this free (http://www.fsf.org/philosophy/)
software stuff and not get a real job. Charles Schulz had the best answer:

"Why do musicians compose symphonies and poets write poems? They do it
because life wouldn't have any meaning for them if they didn't. That's why
I draw cartoons. It's my life."
  -- Charles Schulz



Re: [RFC][PATCH 2.6.13-rc6] add Dell Systems Management Base Driver (dcdbas) with sysfs support

2005-08-15 Thread Kyle Moffett


On Aug 15, 2005, at 18:58:56, [EMAIL PROTECTED] wrote:

Why can't you just implement the system management actions in
the kernel driver?  This is tantamount to a binary SMI hook to
userspace.  What functionality does this provide on a dell
system from an administrator's point of view?


Kyle,
I'm sure that not everybody agrees with the whole concept of SMI
calls. Nevertheless, these calls exist, and in order to have a complete
systems-management solution, we have to provide a way to do SMI calls.
Now, we have developed a way to do these SMI calls from userspace
without kernel support, but we are trying to be community-friendly and
show our hooks in the open, rather than trying to sneak them in under
the covers.




You might not like the concept of a generic hook for SMI calls
in the kernel, but the alternatives are hardly better. One alternative
is the already-mentioned method that we do things under the covers in
userspace. Another alternative is that we write separate kernel code
for each and every SMI call that exists in the Dell BIOS.



The second alternative is not entirely feasible. We have over 60
SMI functions, and we would have to write a kernel-mode wrapper for
each and every one. I hope you agree that code that doesn't exist is
less buggy than code that is, and that code that is in userspace is a
whole lot less likely to cause a kernel crash than code that is in
the kernel.


I think the second alternative is actually feasible and preferable. The
point of the kernel is to provide safe and secure access to two things:
  1) Hardware through an abstraction layer
  2) Software services (like IP stack) that are not feasible to do in
 userspace.

A system that just provides a hunk of DMA RAM and the ability to generate
interrupts is definitely not 2, and does not really follow the ideal
behind 1 either.  I gave the firmware example earlier.  There are several
devices that provide access to update firmware by reading and writing a
firmware file directly in sysfs, then updating it on reboot if necessary.


We are trying to keep our kernel bloat down. We don't really think that
customers of IBM or HP really want their Red Hat kernels loaded down with
a bunch of Dell-only code.


That's what kconfig is for.  My G4 Powerbook doesn't have support for
hardware found in my G4 desktop any more than an IBM box should be forced
to have support for Dell hardware, yet all platforms work fine from the
same kernel tree.


Additionally, we are releasing an open source library (GPL/OSL
dual  license) that can use these hooks to perform many systems
management functions in userspace. See
http://linux.dell.com/libsmbios/main/. We should have code in libsmbios to
do SMI using this driver within about two weeks.  We are currently writing
the SMI hooks in libsmbios using this posted version of the driver. I am the
maintainer of this project, and it is my goal to have code in libsmbios
for every Dell SMI call.


That's a nice project.  I applaud Dell for its openness, but that's not the
only issue here; the kernel needs good engineering too.

I would suggest that you try to implement as much as is possible in a kernel
driver.  Firmware loading support, for example, or hardware sensors, should
integrate well into sysfs and be accessible through existing tools if
possible.  Doug also mentions fan status and control in his mail.  Could you
provide such access through existing fan status/control interfaces so that
existing tools work as well?


We would welcome feedback on a better way to implement this
driver in the kernel, but the fact remains that we have to have a way to do
this, and we are open-sourcing all of the code necessary to get this done.


Thank you for your effort.  You guys have made significant progress, but
IMHO, you've still got a ways to go.  Keep up the good work, though!

Cheers,
Kyle Moffett

--
Unix was not designed to stop people from doing stupid things, because that
would also stop them from doing clever things.
  -- Doug Gwyn



Re: [RFC][PATCH 2.6.13-rc6] add Dell Systems Management Base Driver (dcdbas) with sysfs support

2005-08-15 Thread Kyle Moffett


On Aug 15, 2005, at 19:38:49, Doug Warzecha wrote:

On Mon, Aug 15, 2005 at 04:23:37PM -0400, Kyle Moffett wrote:
Why can't you just implement the system management actions in the kernel
driver?


We want to minimize the amount of code in the kernel and avoid having to
update the driver each time a new system management command is added.


One of the recent trends in kernel driver development is to make as much
as possible accessible through standard tools (like with echo and cat via
sysfs).

The libsmbios project is being updated to use this code.
http://linux.dell.com/libsmbios/main/.  Using the libsmbios code, you
will be able to set all of the options in BIOS F2 screen from Linux
userspace.  Also, libsmbios is looking at implementing a few other things
like fan status.  Libsmbios is 100% open-source (OSL/GPL dual license).


From my point of view, this driver could use sysfs almost entirely and put
all of the hardware-manipulation code completely in kernel space, along
with the hardware detection code.  You could have plain-text files in
/sys/bus/platform/dellbios/ that have all of the BIOS F2 options accessible
to the admin from the command line, without special tools.  (You could
always add an extra program that presents a BIOS-like interface)


The power cycle feature of the system powers off the system for a few
seconds and then powers the system back on without user intervention.
shutdown() and reboot() don't provide that feature.


Please ensure that the code is only run on reboot (and maybe halt), but
definitely not in the poweroff code.

What exactly is smi_type used for?  Please provide better documentation
on how to use this and what it does.


The method of generating a host control SMI is not exactly the same for
each PowerEdge system listed in dcdbas.txt.  host_control_smi_type tells
the driver how to generate the host control SMI for the system in use.
I'll update dcdbas.txt with the SMI type value associated with the systems
listed in that file.


This is an _excellent_ reason why more of this should be in the kernel.
What happens if the wrong SMI is used?  Shouldn't it be relatively easy
for the kernel to determine the correct SMI itself?

Thanks for your hard work!

Cheers,
Kyle Moffett

--
Unix was not designed to stop people from doing stupid things, because that
would also stop them from doing clever things.
  -- Doug Gwyn



Re: [RFC][PATCH 2.6.13-rc6] add Dell Systems Management Base Driver (dcdbas) with sysfs support

2005-08-15 Thread Kyle Moffett


On Aug 16, 2005, at 00:34:51, Chris Wedgwood wrote:

On Mon, Aug 15, 2005 at 04:23:37PM -0400, Kyle Moffett wrote:

Why can't you just implement the system management actions in the
kernel driver?


Why put things in the kernel unless it's really needed?

I'm not thrilled about the lack of userspace support for this driver
but that still doesn't mean we need to shovel wads of crap into the
kernel.


I'm worried that it might be more of a mess in userspace than it could be
if done properly in the kernel.  Hardware drivers, especially for something
as critical as the BIOS, should probably be done in-kernel.  Look at the
mess that X has become: it mmaps /dev/mem and pokes at the PCI busses
directly.  I just don't want an SMI driver to become another /dev/mem.

Cheers,
Kyle Moffett

-BEGIN GEEK CODE BLOCK-
Version: 3.12
GCM/CS/IT/U d- s++: a18 C>$ UB/L/X/*(+)>$ P+++()>$
L(+++) E W++(+) N+++(++) o? K? w--- O? M++ V? PS+() PE+(-) Y+
PGP+++ t+(+++) 5 X R? tv-(--) b(++) DI+ D+ G e->$ h!*()>++$ r  !y?(-)
--END GEEK CODE BLOCK--



Re: [PATCH 2.6.13-rc6 1/2] New Syscall: get rlimits of any process

2005-08-16 Thread Kyle Moffett


On Aug 16, 2005, at 13:34:34, Wieland Gmeiner wrote:

On Sat, 2005-08-13 at 15:11 -0700, Greg KH wrote:

On Fri, Aug 12, 2005 at 07:48:22PM +0200, Wieland Gmeiner wrote:


@@ -294,3 +294,4 @@ ENTRY(sys_call_table)
 .long sys_inotify_init
 .long sys_inotify_add_watch
 .long sys_inotify_rm_watch
+.long sys_getprlimit



Please follow the proper kernel coding style when writing new kernel
code...


Hm, Documentation/CodingStyle suggests using descriptive names, so
something like getrlimit(...)/getrlimit_per_process(pid_t pid, ...)
would be more appropriate?


I think he was commenting more on the code indentation and braces placement
than any naming issue.  There was also a good guide to kernel whitespace
posted to the LKML a week or so ago, please check the archives and review
that as well.

I have one small comment on something you stated in your original mail:

Otherwise some checking on the validity of the given pid is
done and if the given process is found access is granted if

- the calling process holds the CAP_SYS_RESOURCE capability or
- the calling process uid equals the uid of the process whose rlimit
  is being read or
- the calling process uid equals the suid of the process whose rlimit
  is being read or
- the calling process euid equals the uid of the process whose rlimit
  is being read or
- the calling process euid equals the suid of the process whose
  rlimit is being read


I suggest that you revise this list to the following:
If the calling process can ptrace the target process, then allow rlimits to
be read and written such that the hard limits may not be raised unless one
of the two processes possesses the CAP_SYS_RESOURCE capability.


ptrace implies the ability to execute arbitrary code in the given process,
which means that even without this new function the calling process
theoretically could obtain and set rlimits for that process anyways, subject
to its own CAP_SYS_RESOURCE capability.  Such a situation would guarantee
that there are no new security holes, and would limit the number of
inter-process access rules which kernel developers need to understand.  I
believe some simple Googling and grepping through the kernel code should
reveal the necessary ptrace-related process checks.
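
The rule suggested above can be sketched as a pair of pure decision
functions.  This is only an illustration: struct rlim_pair, the function
names, and the boolean inputs are hypothetical stand-ins for the real ptrace
and capability checks, and the "soft may not exceed hard" test is the usual
setrlimit() rule rather than anything new.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's struct rlimit. */
struct rlim_pair {
    unsigned long soft;   /* current (soft) limit */
    unsigned long hard;   /* maximum (hard) limit */
};

/* Reading needs nothing more than ptrace-level access to the target. */
bool may_get_rlimit(bool can_ptrace)
{
    return can_ptrace;
}

/*
 * Writing needs ptrace-level access, and raising the hard limit also
 * needs CAP_SYS_RESOURCE in one of the two processes (has_cap).
 */
bool may_set_rlimit(bool can_ptrace, bool has_cap,
                    const struct rlim_pair *old_lim,
                    const struct rlim_pair *new_lim)
{
    if (!can_ptrace)
        return false;                       /* no access at all */
    if (new_lim->soft > new_lim->hard)
        return false;                       /* invalid request */
    if (new_lim->hard > old_lim->hard && !has_cap)
        return false;                       /* hard limit may not grow */
    return true;
}
```

The appeal of this shape is that there is exactly one inter-process access
rule to audit (ptrace) plus one capability, instead of four uid comparisons.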

Cheers,
Kyle Moffett

--
There are two ways of constructing a software design. One way is to make it
so simple that there are obviously no deficiencies. And the other way is to
make it so complicated that there are no obvious deficiencies.  The first
method is far more difficult.
  -- C.A.R. Hoare



Re: Bug#321442: kernel-source-2.6.8: fails to compile on powerpc (drivers/ide/ppc/pmac.c)

2005-08-16 Thread Kyle Moffett


On Aug 13, 2005, at 18:54:30, LT-P wrote:

On Mon 08 Aug 2005 17:57:04 CEST, Horms <[EMAIL PROTECTED]> wrote:
Can you please enable BLK_DEV_IDEDMA_PCI and see if that resolves your
problem. If it does, then the following patch should fix Kconfig
so that BLK_DEV_IDEDMA_PCI needs to be enabled for BLK_DEV_IDE_PMAC
to be enabled. It should patch cleanly against Debian's 2.6.8 and
Linus' current Git tree.

It seems to solve the problem, thanks.
Sometimes, I feel like I am the only person in the world to compile the
kernel on powerpc... :)


Actually, I ran into this same bug a day or so ago when updating to
2.6.13-rc6.  I noticed the error, fixed my config, recompiled, and then
forgot about it completely until now :-D.  Thanks for the bug report, though!

Cheers,
Kyle Moffett

--
I have yet to see any problem, however complicated, which, when you looked
at it in the right way, did not become still more complicated.
  -- Poul Anderson




Re: [RFC][PATCH 2.6.13-rc6] add Dell Systems Management Base Driver (dcdbas) with sysfs support

2005-08-17 Thread Kyle Moffett


On Aug 17, 2005, at 01:33:00, Matt Domsch wrote:

This is conceptually similar to how SCSI Generic (either
/dev/sg or ioctl(SG_IO)) works (userspace passes in preformatted SCSI
CDBs and gets back the resultant CDBs and extended sense data).  The
sg driver doesn't look at the data being passed down to any great
extent.  It doesn't validate that the command will make sense to the
end device.


This is not true anymore.  Recently the SG driver obtained a basic
form of SCSI command checking to prohibit vendor commands from those
processes without CAP_SYS_RAWIO, even if said process had full access
to the device node itself.


Cheers,
Kyle Moffett

--
Simple things should be simple and complex things should be possible
  -- Alan Kay




Re: Why Ext2/3 needs immutable attribute?

2005-04-17 Thread Kyle Moffett

On Apr 17, 2005, at 12:12, Xin Zhao wrote:
Thanks for your reply.
Yes. I know,  with immutable,  even root cannot modify sensitive
files. What I am curious is if an intruder has root access, he may
have many ways to turn off the immutable protection and modify files.
So immutable is designed just to prevent a valid root from making
silly mistakes?
Xin

But without the proper capability, root _can't_ change the immutable
bit.  Of course, that applies to DAC checks too.  Personally, I
find the immutable bit most useful at preventing accidents.  I have
several scripts designed specifically to access the same file, and I
want to prevent one of my admins from accidentally editing that file
by hand.  The best way is with a big comment in the file itself and
the immutable bit.
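
For reference, a minimal C sketch of flipping the immutable bit from a
program, roughly what chattr +i does.  It assumes a kernel and headers that
provide the generic FS_IOC_GETFLAGS/FS_IOC_SETFLAGS ioctls and
FS_IMMUTABLE_FL in <linux/fs.h>; the helper names are made up.  Without
CAP_LINUX_IMMUTABLE the set call is expected to fail, which is the point
made above.

```c
#include <fcntl.h>
#include <linux/fs.h>   /* FS_IOC_GETFLAGS, FS_IOC_SETFLAGS, FS_IMMUTABLE_FL */
#include <sys/ioctl.h>
#include <unistd.h>

/* Return the flag word with the immutable bit set (on != 0) or cleared. */
int with_immutable(int flags, int on)
{
    return on ? (flags | FS_IMMUTABLE_FL) : (flags & ~FS_IMMUTABLE_FL);
}

/*
 * Set or clear the immutable bit on "path".
 * Returns 0 on success, -1 on error with errno set by the failing call.
 */
int set_immutable(const char *path, int on)
{
    int flags;
    int rc = -1;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    if (ioctl(fd, FS_IOC_GETFLAGS, &flags) == 0) {
        flags = with_immutable(flags, on);
        rc = ioctl(fd, FS_IOC_SETFLAGS, &flags);
    }
    close(fd);
    return rc;
}
```

A script that legitimately owns the file would clear the bit, rewrite the
file, and set the bit again; anyone editing by hand gets an error first.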
Cheers,
Kyle Moffett
-BEGIN GEEK CODE BLOCK-
Version: 3.12
GCM/CS/IT/U d- s++: a18 C>$ UB/L/X/*(+)>$ P+++()>$
L(+++) E W++(+) N+++(++) o? K? w--- O? M++ V? PS+() PE+(-) Y+
PGP+++ t+(+++) 5 X R? tv-(--) b(++) DI+ D+ G e->$ h!*()>++$ r  !y?(-)
--END GEEK CODE BLOCK--


Re: More performance for the TCP stack by using additional hardware chip on NIC

2005-04-17 Thread Kyle Moffett

On Apr 17, 2005, at 19:37, Horst von Brand wrote:
Andreas Hartmann <[EMAIL PROTECTED]> said:
Alacritech developed a new chip for NIC's
(http://www.alacritech.com/html/tech_review.html), which makes it possible
to take away the TCP stack from the host CPU. Therefore, the host CPU has
more performance for the applications according to Alacritech.

This sounds interesting.

This idea has been discussed around here a couple of times, and the
consensus is that it is a bad idea: IP (and upper protocol) processing
is not expensive, if done right, so this really doesn't buy much; this
forces a particular interface to networking into the kernel, losing
flexibility that way is always bad; there is no access to futzing
around in between (for example, for firewalling and such); and if the
"hardware implementation" has bugs, you are screwed.

What I think would be _much_ more useful is a generic low-power multi-proc
MIPS/PPC system on a PCI card with a certain amount of RAM, etc. that could
be programmed at runtime by the master CPU.  Then you lose none of the
flexibility, it can be run in the same endian-mode as the host CPU, and it
would allow you to program it for much more complicated DMA.  You could do
anything from linux software RAID, audio processing, encryption, TCP/IP
stack acceleration, extra scatter-gather for your disk controller, etc.
If it was low-cost, i.e. cheaper than adding extra full-speed CPUs to the
system, and using a decent bi-endian, vector-capable CPU (like PPC), you
might find that people will buy them for the flexibility.  Such a thing
might also be useful for the prezero folks, it could be used (when not
otherwise occupied) for zeroing unused pages.

Personally, I think I'd buy one or two just to tinker with them :-D.
Cheers,
Kyle Moffett
-BEGIN GEEK CODE BLOCK-
Version: 3.12
GCM/CS/IT/U d- s++: a18 C>$ UB/L/X/*(+)>$ P+++()>$
L(+++) E W++(+) N+++(++) o? K? w--- O? M++ V? PS+() PE+(-) Y+
PGP+++ t+(+++) 5 X R? tv-(--) b(++) DI+ D+ G e->$ h!*()>++$ r  !y?(-)
--END GEEK CODE BLOCK--


Re: Power consumption HZ100, HZ250, HZ1000: new numbers

2005-07-31 Thread Kyle Moffett


On Jul 31, 2005, at 18:32:47, Pavel Machek wrote:


and cpufreq is useful to keep your desktop cold, too.



But I don't want my desktop cold!!!  That would ruin its usefulness as a
400W dorm space-heater!!! :-D

*starts boinc client running in the background*


Cheers,
Kyle Moffett

--
There are two ways of constructing a software design. One way is to make it
so simple that there are obviously no deficiencies. And the other way is to
make it so complicated that there are no obvious deficiencies.  The first
method is far more difficult.
  -- C.A.R. Hoare




Re: [PATCH 00/14] GFS

2005-08-02 Thread Kyle Moffett


On Aug 2, 2005, at 21:00:02, Hans Reiser wrote:

Arjan van de Ven wrote:

because reiser got merged before jbd. Next question.

That is the wrong reason.  We use our own journaling layer for the
reason that Vivaldi used his own melody.

I don't know anything about GFS, but expecting a filesystem author to
use a journaling layer he does not want to is a bit arrogant.  Now, if
you got into details, and said jbd does X, Y and Z, and GFS does the
same X and Y, and does not do Z as well as jbd, that would be a more
serious comment.  He might want to look at how reiser4 does wandering
logs instead of using jbd. but I would never claim that for sure
some other author should be expected to use it.  and something like
changing one's journaling system is not something to do just before a
merge.


I don't want to start another big reiser4 flamewar, but...

"I don't know anything about Reiser4, but expecting a filesystem author
to use a VFS layer he does not want to is a bit arrogant.  Now, if you
got into details, and said the linux VFS does X, Y, and Z, and Reiser4
does..."

Do you see my point here?  If every person who added new kernel code
just wrote their own thing without checking to see if it had already
been done before, then there would be a lot of poorly maintained code
in the kernel.  If a journalling layer already exists, _new_ journaled
filesystems should either (A) use the layer as is, or (B) fix the layer
so it has sufficient functionality for them to use, and submit patches.
That way if somebody later says, "Ah, crap, there's a bug in the kernel
journalling layer", and fixes it, there are not eight other filesystems
with their own open-coded layers that need to be audited for similar
mistakes.

This is similar to why some kernel developers did not like the Reiser4
code, because it implemented some private layers that looked kinda like
stuff the VFS should be doing  (Again, I don't want to get into that
argument again, I'm just bringing up the similarities to clarify _this_
particular point, as that one has been beaten to death enough already).

Now the question for GFS is still a valid one; there might be reasons to
not use it (which is fair enough) but if there's no real reason then
using jbd sounds a lot better given its maturity (and it is used by 2
filesystems in -mm already).


Personally, I am of the opinion that if GFS cannot use jbd, the developers
ought to clarify why it isn't usable, and possibly submit fixes to make
it useful, so that others can share the benefits.

Cheers,
Kyle Moffett

--
I lost interest in "blade servers" when I found they didn't throw knives at
people who weren't supposed to be in your machine room.
  -- Anthony de Boer



Re: Calling suspend() in halt/restart/shutdown -> not a good idea

2005-08-03 Thread Kyle Moffett


On Aug 3, 2005, at 07:40:54, Benjamin Herrenschmidt wrote:

I'd like to get rid of shutdown callback. Having two copies of code
(one in callback, one in suspend) is ugly.


Well, it's obviously not a good time for this. First, suspend and
shutdown don't necessarily do the same thing, then it just doesn't work
in practice. So either do it right completely or not at all, but 2.6.13
isn't the place for a half-assed hack that looks like a solution to
you.


One possible way to proceed might be to add a new callback that takes a
pm_message_t: powerdown().  If it exists, it would be called in both the
suspend and shutdown paths, before the suspend() and shutdown() calls to
that driver are made.  As drivers are fixed to clean up and combine that
code, they could put the merged result into the powerdown() function,
and remove their suspend() and shutdown() functions.
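
A rough userspace sketch of that dispatch order, just to show the shape of
the idea.  Everything here is hypothetical: pm_msg_t stands in for
pm_message_t, struct drv_ops for the real driver structure, and
demo_powerdown() for a converted driver that counts how often it is called.

```c
#include <stddef.h>

/* Hypothetical stand-in for pm_message_t. */
typedef struct { int event; } pm_msg_t;

/* Hypothetical per-driver callbacks; any of them may be NULL. */
struct drv_ops {
    int  (*powerdown)(void *dev, pm_msg_t msg);  /* proposed shared hook */
    int  (*suspend)(void *dev, pm_msg_t msg);    /* legacy */
    void (*shutdown)(void *dev);                 /* legacy */
};

/* Suspend path: powerdown() first, then suspend() if the driver has one. */
int core_suspend(const struct drv_ops *ops, void *dev, pm_msg_t msg)
{
    int err = 0;

    if (ops->powerdown)
        err = ops->powerdown(dev, msg);
    if (!err && ops->suspend)
        err = ops->suspend(dev, msg);
    return err;
}

/* Shutdown path: powerdown() first, then shutdown() if the driver has one. */
void core_shutdown(const struct drv_ops *ops, void *dev, pm_msg_t msg)
{
    if (ops->powerdown)
        (void)ops->powerdown(dev, msg);
    if (ops->shutdown)
        ops->shutdown(dev);
}

/* Example of a converted driver: one merged routine, counted via dev. */
int demo_powerdown(void *dev, pm_msg_t msg)
{
    (void)msg;
    ++*(int *)dev;
    return 0;
}
```

A driver converted this way fills in only powerdown and leaves the other two
NULL, so the same code runs on both paths while unconverted drivers keep
working unchanged.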

Cheers,
Kyle Moffett

--
I lost interest in "blade servers" when I found they didn't throw knives at
people who weren't supposed to be in your machine room.
  -- Anthony de Boer



Why is kmem_bufctl_t different across platforms?

2005-08-28 Thread Kyle Moffett


While exploring the asm-*/types.h files, I discovered that the
type "kmem_bufctl_t" is differently defined across each platform,
sometimes as a short, and sometimes as an int.  The only file
where it's used is mm/slab.c, and as far as I can tell, that file
doesn't care at all, aside from preferring it to be a small-sized
type.  I found this comment:


/*
 * kmem_bufctl_t:
 *
 * Bufctl's are used for linking objs within a slab
 * linked offsets.
 *
 * This implementation relies on "struct page" for locating the cache &
 * slab an object belongs to.
 * This allows the bufctl structure to be small (one int), but limits
 * the number of objects a slab (not a cache) can contain when off-slab
 * bufctls are used. The limit is the size of the largest general cache
 * that does not use off-slab slabs.
 * For 32bit archs with 4 kB pages, is this 56.
 * This is not serious, as it is only for large objects, when it is unwise
 * to have too many per slab.
 * Note: This limit can be raised by introducing a general cache whose size
 * is less than 512 (PAGE_SIZE<<3), but greater than 256.
 */
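
For readers who have not looked at mm/slab.c, the comment is describing
free objects chained together by small integer indexes instead of
pointers.  The following is only a simplified userspace sketch of that
idea, not the slab.c code itself; the slab_sketch names, the fixed
object count, and BUFCTL_END_SKETCH are all made up for illustration:

```c
#include <limits.h>

/* The unified type suggested in this mail. */
typedef unsigned short kmem_bufctl_t;

/* Hypothetical end-of-list marker and slab size for this sketch. */
#define BUFCTL_END_SKETCH ((kmem_bufctl_t)USHRT_MAX)
#define SLAB_SKETCH_OBJS  8

struct slab_sketch {
    kmem_bufctl_t free;                     /* index of first free object */
    kmem_bufctl_t next[SLAB_SKETCH_OBJS];   /* per-object link to next free */
};

/* Chain every object onto the free list: 0 -> 1 -> ... -> END. */
void slab_sketch_init(struct slab_sketch *s)
{
    int i;

    for (i = 0; i < SLAB_SKETCH_OBJS - 1; i++)
        s->next[i] = (kmem_bufctl_t)(i + 1);
    s->next[SLAB_SKETCH_OBJS - 1] = BUFCTL_END_SKETCH;
    s->free = 0;
}

/* Pop the first free object; returns its index, or -1 if the slab is full. */
int slab_sketch_alloc(struct slab_sketch *s)
{
    kmem_bufctl_t idx = s->free;

    if (idx == BUFCTL_END_SKETCH)
        return -1;
    s->free = s->next[idx];
    return idx;
}

/* Push object idx back onto the head of the free list. */
void slab_sketch_free(struct slab_sketch *s, int idx)
{
    s->next[idx] = s->free;
    s->free = (kmem_bufctl_t)idx;
}
```

Since the links only ever hold object indexes within one slab, the type
just needs to be wide enough for the largest per-slab object count.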


It appears to state that the max kmem_bufctl_t value is ~56 on most
setups, although it could be higher with 64-bit or bigger pages.
Since this value is never used by anything except that kernel-internal
file, should it be unified across all architectures?  If so, I'll send
a patch to remove the various typedefs and introduce a single
"typedef unsigned short kmem_bufctl_t" in include/linux/types.h.

Cheers,
Kyle Moffett

--
Premature optimization is the root of all evil in programming
  -- C.A.R. Hoare




Re: Why is kmem_bufctl_t different across platforms?

2005-08-28 Thread Kyle Moffett


On Aug 28, 2005, at 19:37:16, Adrian Bunk wrote:

On Sun, Aug 28, 2005 at 02:55:03PM -0700, Andrew Morton wrote:

Kyle Moffett <[EMAIL PROTECTED]> wrote:

While exploring the asm-*/types.h files, I discovered that the
 type "kmem_bufctl_t" is differently defined across each platform,
 sometimes as a short, and sometimes as an int.  The only file
 where it's used is mm/slab.c, and as far as I can tell, that file
 doesn't care at all, aside from preferring it to be a small-sized
 type.


I don't think there's any good reason for this.  -mm's
slab-leak-detector.patch switches them all to unsigned long.


What about moving it to include/linux/types.h ?


Or, since it's _only_ used in mm/slab.c, why not put it in there?
Here is a really simple patch that does just that:


kmem_bufctl_t-consolidation.patch
Description: Binary data


Cheers,
Kyle Moffett

--
Q: Why do programmers confuse Halloween and Christmas?
A: Because OCT 31 == DEC 25.

Re: Is cdrecord dependent on some kind of bus type?

2005-08-29 Thread Kyle Moffett


On Aug 29, 2005, at 07:46:04, jeff shia wrote:

Hello,

Is cdrecord dependent on some kind of bus type,such as pci or usb?
And the older version such as cdrecord-1.2?
can cdrecord-1.2 run on kernel-2.4.18?


Please ask these kinds of questions of the cdrecord mailing-list or the
cdrecord author Jörg Schilling, instead of on this list (this is a
kernel development list, as opposed to a linux-users list).  Also, you
sent duplicate copies of your message only hours apart.  Please don't
do this.  Yes, we did get your message, but nobody replied to it because
it was off-topic and indicated a complete lack of RTFM and STFW.  Please
go read the associated documentation before asking questions, and then
ask them on the appropriate forum (if you still have questions).

Here is a good document about asking good questions:
http://www.catb.org/~esr/faqs/smart-questions.html

Cheers,
Kyle Moffett

--
Premature optimization is the root of all evil in programming
  -- C.A.R. Hoare




Re: inotify and IN_UNMOUNT-events

2005-08-30 Thread Kyle Moffett


On Aug 30, 2005, at 23:33:27, Robert Love wrote:

On Tue, 2005-08-30 at 21:46 +0200, Juergen Quade wrote:


Playing around with inotify I have some problems
to generate/receive IN_UNMOUNT-events (using
a self written application and inotify_utils-0.25;
kernel 2.6.13).

Doing:
- mount /dev/hda1 /mnt
- add a watch to the path /mnt/ ("./inotify_test /mnt")
- umount /mnt

results in two events:
1. IN_DELETE_SELF (mask=0x0400)
2. IN_IGNORED (mask=0x8000)

Any ideas?


"/mnt" is not unmounted, stuff inside of it is.

Watch, say, "/mnt/foo/bar" and when /dev/hda1 is unmounted, you will get
an IN_UNMOUNT on the watch.


I think this might work as well:
# mount /dev/hda1 /mnt
# ./inotify_test /mnt/. &
# umount /mnt

That should get the effect you are looking for.
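
If you want to try this without inotify_utils, a minimal watcher might
look roughly like the sketch below.  It uses the glibc <sys/inotify.h>
wrappers (inotify_init, inotify_add_watch); the function names
event_name() and watch_path() are made up here, and whether watching
"/mnt/." really yields IN_UNMOUNT is still just my guess from above:

```c
#include <stdint.h>
#include <stdio.h>
#include <sys/inotify.h>
#include <unistd.h>

/* Name the event bits discussed in this thread. */
const char *event_name(uint32_t mask)
{
    if (mask & IN_UNMOUNT)
        return "IN_UNMOUNT";
    if (mask & IN_DELETE_SELF)
        return "IN_DELETE_SELF";
    if (mask & IN_IGNORED)
        return "IN_IGNORED";
    return "other";
}

/*
 * Watch path and print every event until the kernel drops the watch
 * (IN_IGNORED).  Returns 0 once the watch is gone, -1 on any error.
 */
int watch_path(const char *path)
{
    char buf[4096]
        __attribute__((aligned(__alignof__(struct inotify_event))));
    int fd = inotify_init();

    if (fd < 0)
        return -1;
    if (inotify_add_watch(fd, path, IN_ALL_EVENTS) < 0) {
        close(fd);
        return -1;
    }

    for (;;) {
        ssize_t len = read(fd, buf, sizeof(buf));
        char *p = buf;

        if (len <= 0)
            break;
        /* One read() may return several variable-length events. */
        while (p < buf + len) {
            const struct inotify_event *ev =
                (const struct inotify_event *)p;

            printf("mask=0x%04x %s\n", (unsigned)ev->mask,
                   event_name(ev->mask));
            if (ev->mask & IN_IGNORED) {
                close(fd);
                return 0;
            }
            p += sizeof(*ev) + ev->len;
        }
    }
    close(fd);
    return -1;
}
```

IN_UNMOUNT and IN_IGNORED are reported even though they are not part of
IN_ALL_EVENTS, so the watch mask does not need to list them.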

Cheers,
Kyle Moffett

--
I have yet to see any problem, however complicated, which, when you looked at
it in the right way, did not become still more complicated.
  -- Poul Anderson




Re: APs from the Kernel Summit run Linux

2005-08-31 Thread Kyle Moffett


On Aug 31, 2005, at 16:32:11, Vojtech Pavlik wrote:

On Wed, Aug 31, 2005 at 08:53:19PM +0100, Russell King wrote:


On Wed, Aug 31, 2005 at 12:55:12PM -0400, Mark Lord wrote:


I'll try loading the works into another ARM
system I have here, and see (1) if it runs as-is,
and (2) what the disassembly shows.



You can identify ARM code quite readily - look for a large number of
32-bit words naturally aligned and grouped together whose top nibble
is 14 - ie 0xE...
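
For anyone wanting to automate that check, a small sketch follows.  It
assumes the image holds little-endian 32-bit words (the usual ARM
configuration, but an assumption about this particular firmware), and
the function name is made up:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Count naturally aligned 32-bit little-endian words whose top nibble
 * is 0xE, i.e. the ARM "always" condition code.  Trailing bytes that do
 * not fill a whole word are ignored.
 */
size_t count_arm_always_words(const unsigned char *buf, size_t len)
{
    size_t count = 0;
    size_t i;

    for (i = 0; i + 4 <= len; i += 4) {
        uint32_t word = (uint32_t)buf[i]
                      | ((uint32_t)buf[i + 1] << 8)
                      | ((uint32_t)buf[i + 2] << 16)
                      | ((uint32_t)buf[i + 3] << 24);

        if ((word >> 28) == 0xE)
            count++;
    }
    return count;
}
```

A region where most words count is a good candidate for ARM code; a
region where roughly one in sixteen does is probably data or another
instruction set.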

The top nibble is the conditional execution field, and 14 is "always".


Didn't find that. Anyway:

The first and third parts contain a repeating 7-byte sequence

81 40 20 10 08 04 02

near the beginning, while part 2 is padded with zeroes in the same
place.


That sequence is altered in the first and last repetitions, like this:

88 4020 1008 0402
81 4020 1008 0402
[...]
81 4020 1008 0402
81 4020 1008 04c2

The 4020 and 0402 look oddly symmetrical to me, but that could just
be my imagination.

I wrote a quick perl script to find the number of occurrences of 8-bit
aligned sequences of 16-bits, for all 16-bit values.  It has some
interesting (and potentially useful) results.

The script:
http://zeus.moffetthome.net/~kyle/hexfreq

The output:
http://zeus.moffetthome.net/~kyle/dwl.hexmult

Reprocessed output by frequency:
http://zeus.moffetthome.net/~kyle/dwl.hexfreq

Reprocessing command:
dwl.hexfreq
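
In case the links above go stale, the counting step is easy to redo.
This is not the perl script itself, just a C sketch of the same idea as
I described it: slide over the file one byte at a time and count each
16-bit value, taking the bytes in the order they appear in a hex dump.
The function name is made up:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Count every 16-bit value found at each byte offset of buf.
 * counts must have room for 65536 entries, one per possible value;
 * it is cleared first.  The byte at the lower offset forms the high
 * half of the value, matching how the pair reads in a hex dump.
 */
void count_byte_pairs(const unsigned char *buf, size_t len, uint32_t *counts)
{
    size_t i;

    memset(counts, 0, 65536 * sizeof(counts[0]));
    for (i = 0; i + 1 < len; i++) {
        unsigned value = ((unsigned)buf[i] << 8) | buf[i + 1];

        counts[value]++;
    }
}
```

Sorting the resulting table by count gives the "by frequency" view.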


Cheers,
Kyle Moffett

--
Someone asked me why I work on this free (http://www.fsf.org/philosophy/)
software stuff and not get a real job. Charles Shultz had the best answer:

"Why do musicians compose symphonies and poets write poems? They do it because
life wouldn't have any meaning for them if they didn't. That's why I draw
cartoons. It's my life."
  -- Charles Shultz



Re: [RFC] A more general timeout specification

2005-09-01 Thread Kyle Moffett


On Sep 1, 2005, at 11:18:52, Roman Zippel wrote:

On Thu, 1 Sep 2005, Joe Korty wrote:

On Thu, Sep 01, 2005 at 11:19:51AM +0200, Roman Zippel wrote:

You still didn't explain what's the point in choosing
different clock sources for a _timeout_.


Well, if CLOCK_REALTIME is set forward by a minute,
timers & timeout specified against that clock will expire
a minute earlier than expected.


That just rather suggests that the pthread API is broken as usual.
(No other possible user was mentioned so far.)


How about a hypothetical time-based event daemon.  I want to run
some jobs every 10 minutes that the system is running (not off or
suspended), I want to run other jobs every hour in real time, and
if one such timer expires while suspended, I want to run it
immediately to catch up.  The first suggests CLOCK_MONOTONIC, and
the second works better with CLOCK_REALTIME.
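
In today's userspace terms the two kinds of job would look something
like the sketch below.  The helper names are made up, and it relies on
the Linux behaviour that CLOCK_MONOTONIC does not advance while the
machine is suspended, which is an assumption about the platform rather
than something POSIX promises:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Absolute deadline `seconds` from now, measured on clock clk. */
struct timespec deadline_in(clockid_t clk, time_t seconds)
{
    struct timespec ts = { 0, 0 };

    assert(seconds >= 0);
    clock_gettime(clk, &ts);
    ts.tv_sec += seconds;
    return ts;
}

/* Nonzero once clock clk has reached or passed the deadline. */
int deadline_expired(clockid_t clk, const struct timespec *deadline)
{
    struct timespec now = { 0, 0 };

    clock_gettime(clk, &now);
    if (now.tv_sec != deadline->tv_sec)
        return now.tv_sec > deadline->tv_sec;
    return now.tv_nsec >= deadline->tv_nsec;
}
```

The "every 10 minutes of uptime" job would use
deadline_in(CLOCK_MONOTONIC, 600), while the hourly wall-clock job would
use deadline_in(CLOCK_REALTIME, 3600) and so shows up as expired right
after a resume if the hour passed during suspend.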


So in practice it's easier to advance CLOCK_MONOTONIC/CLOCK_REALTIME
equally and only apply time jumps to CLOCK_REALTIME.


I thought that's what he said, but maybe I'm just confused :-D.

Cheers,
Kyle Moffett

--
Premature optimization is the root of all evil in programming
  -- C.A.R. Hoare




[RFC] Splitting out kernel<=>userspace ABI headers

2005-09-01 Thread Kyle Moffett


A while ago there was a big discussion about splitting out the
userspace-accessible portions of the kernel headers into a separate
directory, "kabi", "kernel-abi", "linux-abi", or a half-dozen other
suggestions.  Linus sprinkled a bit of holy-penguin-pee on the idea,
but nothing ever really happened after that.  I have some available
time at the moment, and I would be willing to undertake the task,
but I would like a bit of guidance first, both from Linus/akpm/etc,
and from the list in general, about a few initial issues I see from
my initial attempts to sort through the mess:

  1)  There are a couple header files upon which almost everything
else depends, among them {asm,linux}/{posix_,}types.h, which have
some significant duplications.  Many of the archs have weird sizes
for those types to preserve some backwards-compatibility ABI, but
nowhere does it explain if there are any type-size restrictions in
general.  I would propose that those headers be reorganized so that
there are sane defaults for all the types in kabi/types.h, and
archs that require different would #define exceptions in their
kabi/arch-foo/types.h.  This would allow new archs to start with a
sane standard ABI before it becomes set in stone.

  2)  There is a bunch of stuff that would be _really_ useful in
userspace programs as well, even though not kernel ABI, such as
list.h, atomic.h (with a few archs modified due to privilege
restrictions), etc.  If there is interest, I would attempt to split
off those headers into a kcore/kerncore/linuxcore/whatever inline
header collection included in the linux distribution and installed
as part of the kernel headers.

  3)  What names are preferable for the above?  My personal
preferences are "kabi" and "kcore", because those save the most
typing for the sucker trying to do all this (IE: me), although if
someone has good reasons otherwise, I'll listen.
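
As a rough sketch of what I mean in point 1, the generic header could
supply defaults that an arch header overrides only where its existing
ABI forces it to.  All of the names here (the __KABI_ARCH_HAS_* guards,
__kabi_mode_t, __kabi_pid_t, arch-foo) are placeholders, and both
headers are shown in one file so the example compiles on its own:

```c
/* ---- kabi/arch-foo/types.h (hypothetical arch with a legacy ABI) ---- */

/* This arch is stuck with a 16-bit mode type, so it says so. */
#define __KABI_ARCH_HAS_MODE_T
typedef unsigned short __kabi_mode_t;

/* ---- kabi/types.h (hypothetical generic header) ---- */

/* Sane defaults, used unless the arch header claimed the type above. */
#ifndef __KABI_ARCH_HAS_MODE_T
typedef unsigned int __kabi_mode_t;
#endif

#ifndef __KABI_ARCH_HAS_PID_T
typedef int __kabi_pid_t;
#endif
```

A new arch would define none of the guards and simply inherit the
defaults.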

I realize this project is only slightly short of massive, however I
do have a bunch of time and am willing to do the grunt work if
enlightened as to the community desires.  I have a few different
semi-patches almost ready, and I can probably finish up a couple
this weekend if I can figure out which way people want to go.  One
of the major challenges is that kernel files have historically kind
of indiscriminately included asm/foo.h when they really meant
linux/foo.h (See the types.h example), only to have it magically
work because some other header already included linux/types.h
anyways.  If arch/driver/etc maintainers are willing to take patches
to clean that up, I'll start with that and eventually get a decent
set of kabi/* headers.


Cheers,
Kyle Moffett

--
Unix was not designed to stop people from doing stupid things, because that
would also stop them from doing clever things.
  -- Doug Gwyn



Re: [RFC] Splitting out kernel<=>userspace ABI headers

2005-09-02 Thread Kyle Moffett


On Sep 2, 2005, at 09:41:09, Erik Andersen wrote:

On Thu Sep 01, 2005 at 11:00:16PM -0400, Kyle Moffett wrote:

A while ago there was a big discussion about splitting out the
userspace-accessible portions of the kernel headers into a separate
directory, "kabi", "kernel-abi", "linux-abi", or a half-dozen other
suggestions.  Linus sprinkled a bit of holy-penguin-pee on the idea,
but nothing ever really happened after that.


Have you seen the linux-libc-headers:
http://ep09.pld-linux.org/~mmazur/linux-libc-headers/
which, while not an official part of the kernel, do a pretty
good job...


Well, the eventual goal of this project would be to eliminate the
need for linux-libc-headers by making that task trivial (IE: Just copy
the kcore/ and kabi/ (or whatever they get called) directories into
/usr/include).  There would probably be some compatibility headers
installed into /usr/include/linux until 2.8 is released or 2.7 is
forked for some major internal modification, but other than that, the
stuff shared by userspace and kernelspace would be only in kcore and
kabi, and eventually the linux/* stuff could remove all the __KERNEL__
ifdefs contained therein.  Right now linux-libc-headers is maintained
by one person at each kernel revision.  It would be much better if
that maintenance load could be undertaken instead by those who create
the code that uses those headers, the kernel developers themselves,
because they surely understand it better and are likely to be able to
do it more easily and accurately.

Cheers,
Kyle Moffett

--
Unix was not designed to stop people from doing stupid things, because that
would also stop them from doing clever things.
  -- Doug Gwyn



Re: [RFC] Splitting out kernel<=>userspace ABI headers

2005-09-02 Thread Kyle Moffett


On Sep 2, 2005, at 17:55:54, H. Peter Anvin wrote:

UML really needs something like this, both 1 and 2.  See
http://groups.google.com/group/fa.linux.kernel/browse_thread/thread/34d3c02372861a5c/71816a3c7863ea2b?lnk=st&q=%22jeff+dike%22&rnum=27&hl=en#71816a3c7863ea2b

for my take on system.h and ptrace.h when a change in the host
architecture broke the UML build.

UML takes most of its headers from the underlying arch.  It simplifies
things since most of the definitions are usable in UML.  I don't have
to clone and maintain my versions of all the other arch headers.

OTOH, there are things in those headers which UML can't use, and these
are eliminated in various ways (undefining them after the include of
the host arch header, redefining them before the include).  But this
is a pain.

It has long been my opinion that splitting headers into userspace
usable and userspace unusable pieces is the right thing for UML.  Less
clear for the host arch.

Your post seems to indicate that there is a non-UML demand for exactly
this.


There definitely is.  The kernel needs to export its ABI in a way that
userspace (UML, various libcs, etc) can import in a sane manner.  In
addition, the Linux kernel contains a fair bit of
architecture-specific support which go well beyond what one can
typically find in userspace, and it would be nice to have those.

The current linux-libc-headers aren't it, because they have a fair bit
of glibc-centric assumptions in those headers.  That's part of why
klibc doesn't use them.


What I would try to do is package up as much architecture/abi knowledge
in one place as possible, the former in kcore/kern-core/whatever, the
latter in kabi/kern-abi/linux-abi/whatever.  I would also try (as much
as possible), to make everything in those directories use some kind of
prefix guaranteed not to clash with other stuff, so list_add() for
example would become _kcore_list_add().  The linux kernel headers in
such a modified kernel would then just do this to make the kernel code
happy:
#ifdef __KERNEL__
# define list_add(x,y) _kcore_list_add(x,y)
/**/
#endif
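
Spelled out a little further, the prefixed version of the list code and
the kernel-side mapping might look like the sketch below.  This is a
cut-down illustration rather than the contents of linux/list.h, and the
_kcore_ names are only the placeholder prefix from above:

```c
/* ---- kcore/list.h (hypothetical) ---- */

struct _kcore_list_head {
    struct _kcore_list_head *next, *prev;
};

/* Make head an empty circular list pointing at itself. */
static inline void _kcore_list_init(struct _kcore_list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert entry right after head, as the kernel's list_add() does. */
static inline void _kcore_list_add(struct _kcore_list_head *entry,
                                   struct _kcore_list_head *head)
{
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

/* ---- linux/list.h (hypothetical): short names for kernel code only ---- */

#ifdef __KERNEL__
# define list_head        _kcore_list_head
# define list_add(x, y)   _kcore_list_add(x, y)
#endif
```

Userspace that includes the kcore header sees only the prefixed names,
so it remains free to define its own list_add() without a clash.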

My far-into-the-future ideal for this is to have a generic vDSO-type
library that is compiled into the kernel that provides a collection of
architecture-optimized routines available in both kernelspace and
userspace by mapping it into each process' address space.  Such a
library could effectively automatically provide correct and optimized
assembly routines for the currently booted CPU/arch/subarch/etc, so
that userspace tools could be compiled once and run on an entire
family of CPUs without modification.  On the other hand, for those
applications that need every last ounce of speed (Including parts of
the kernel), you could pass appropriate options to the compiler to
tell it to inline the assembly routines (alternative) for a single
CPU make/model.

Possibly some of the generic-arch stuff should be pushed back
upstream to GCC, maybe have __builtin_{s,u,i,f}{8,16,32,64,128} types,
etc, provided directly by GCC, so we don't have to mess with that
so much.


We should probably also consider the licensing of headers that are
meant to be included into userspace.  Userspace still includes a fair
bit of GPL headers, which is technically not kosher.


I think that this is mostly a nonissue.  The copyright holders of the
headers/inline assembly/etc should look at perhaps licensing those
as LGPL or providing an exception to allow glibc, klibc, etc to link
with them.  On the other hand, were glibc to use the optimized
routines to provide the Standard C Library, programs using said
Standard C Library would not be infringing, because just like with
the "userspace <=syscall=> kernelspace" boundary, that does not imply
that the code is a derived work.  IANAL, however, so if you know one
who is willing to contribute some time, this might be an interesting
issue.  (Also:  What procedure might be required to get some of the
stuff relicensed as LGPL?  How do we find all significant copyright
holders/contributors from whom we need permission?)

Thanks for the encouraging posts!  It's good to hear that others are
interested in the project, because maybe I won't need to do it _all_
myself :-D.  I'll take a look at the patches mentioned, to get more
of an idea on the various technical issues.

Cheers,
Kyle Moffett

--
Simple things should be simple and complex things should be possible
  -- Alan Kay




Re: [RFC] Splitting out kernel<=>userspace ABI headers

2005-09-02 Thread Kyle Moffett


On Sep 2, 2005, at 19:24:22, H. Peter Anvin wrote:

Kyle Moffett wrote:

My far-into-the-future ideal for this is to have a generic vDSO-type
library that is compiled into the kernel that provides a collection of
architecture-optimized routines available in both kernelspace and
userspace by mapping it into each process' address space.  Such a
library could effectively automatically provide correct and optimized
assembly routines for the currently booted CPU/arch/subarch/etc, so
that userspace tools could be compiled once and run on an entire
family of CPUs without modification.  On the other hand, for those
applications that need every last ounce of speed (Including parts of
the kernel), you could pass appropriate options to the compiler to
tell it to inline the assembly routines (alternative) for a single
CPU make/model.


I don't see why this should be compiled into the kernel.


The kernel already needs those same optimized routines for its own
operation (EX: all the ASM alternative() statements).  Since userspace
wants some of those as well, it would make sense to share them between
kernel and userspace and reduce the number of libraries you would need
to optimize when adding a new arch.  I don't think that we should add
optimized assembly for things that _aren't_ needed in the kernel, but
it should share what code it does have.

A side benefit of the vDSO method is that you would be able to take a
standard distro install and have the kernel automatically select the
correct vDSO image at runtime, simultaneously optimizing itself and
chunks of userspace.

Cheers,
Kyle Moffett

--
Unix was not designed to stop people from doing stupid things, because that
would also stop them from doing clever things.
  -- Doug Gwyn



Re: [RFC] Splitting out kernel<=>userspace ABI headers

2005-09-02 Thread Kyle Moffett


On Sep 2, 2005, at 20:07:58, H. Peter Anvin wrote:

Followup to:  <[EMAIL PROTECTED]>
By author:Erik Andersen <[EMAIL PROTECTED]>
In newsgroup: linux.dev.kernel


That would be wonderful.


It would be especially nice if everything targeting user space
were to use only all the nice standard ISO C99 types as defined
in include/stdint.h such as uint32_t and friends...


Absolutely not.  This would be a POSIX namespace violation; they
*must* use double-underscore types.


I would actually be more inclined to provide and use types like
_kabi_{s,u}{8,16,32,64}, etc.  Then the glibc/klibc/etc authors would
have the option of just doing "typedef _kabi_u32 uint32_t;" in their
header files.

Cheers,
Kyle Moffett

-BEGIN GEEK CODE BLOCK-
Version: 3.12
GCM/CS/IT/U d- s++: a18 C>$ UB/L/X/*(+)>$ P+++()>$ L(+ 
++) E
W++(+) N+++(++) o? K? w--- O? M++ V? PS+() PE+(-) Y+ PGP+++ t+(+++) 5  
X R?

tv-(--) b(++) DI+ D+ G e->$ h!*()>++$ r  !y?(-)
--END GEEK CODE BLOCK--



Re: [RFC] Splitting out kernel<=>userspace ABI headers

2005-09-02 Thread Kyle Moffett


On Sep 2, 2005, at 20:34:11, H. Peter Anvin wrote:

Kyle Moffett wrote:

I would actually be more inclined to provide and use types like
_kabi_{s,u}{8,16,32,64}, etc.  Then the glibc/klibc/etc authors would
have the option of just doing "typedef _kabi_u32 uint32_t;" in their
header files.


They have to be *double-underscore*.

We have that.  They're called __[su]{8,16,32,64}.


I realize this completely.  The point of moving to kabi/* and kcore/*
would be to remove the dependence of userspace-accessible headers on
kernel-internal stuff.  As I see it, part of that means exporting a
reasonably clean and straightforward API from kabi/kcore, including a
decent namespace prefix.  The goal would be something that the kernel
headers could map to types useable in kernel code, that various *libc
in userspace could map to POSIX types, and that would have a nice
prefix to be namespace clean and avoid the risk of contamination.
Given this set of goals, I think that something like the below would
probably work and satisfy the needs of both *libc and the kernel:



/* kcore/types.h */
typedef unsigned char __kabi_u8;
typedef   signed char __kabi_s8;
typedef [...]



/* linux/types.h */
#include <kcore/types.h>

#ifndef __KERNEL__
# warning "Insert some kind of deprecation warning here"
#endif

  /* These for compatibility only.  When the last ABI headers move
 to kcore or kabi, these should go in __KERNEL__ */
typedef __kabi_u8 __u8;
typedef __kabi_s8 __s8;
[...]

#ifdef __KERNEL__
typedef __kabi_u8 u8;
typedef __kabi_s8 s8;
#endif



/* stdint.h */
#include <kcore/types.h>
typedef __kabi_u8 uint8_t;
typedef __kabi_s8 int8_t;
[...]



Cheers,
Kyle Moffett

--
There is no way to make Linux robust with unreliable memory subsystems,
sorry.  It would be like trying to make a human more robust with an
unreliable O2 supply. Memory just has to work.
  -- Andi Kleen



Re: [RFC] Splitting out kernel<=>userspace ABI headers

2005-09-02 Thread Kyle Moffett


On Sep 3, 2005, at 00:28:59, Erik Andersen wrote:

Absolutely not.  This would be a POSIX namespace violation; they
*must* use double-underscore types.


I assume you are worried about the stuff under asm that ends up
being included by nearly every header file in the world.  Of
course asm must use double-underscore types.  But the thing is,
the vast majority of the kernel headers live under
linux/include/linux/ and do not use double-underscore types, they
use kernel specific, non-underscored types such as s8, u32, etc.
My copy of IEEE 1003.1 and my copy of ISO/IEC 9899:1999 both fail
to prohibit using the shiny new ISO C99 type for the various
#include  header files, which is what I was suggesting.


Anything in linux/* that is included by userspace should not
presume that stdint.h has already been included or include it on
its own, because the userspace program may have already made its
own definitions of uint32_t, or it may not want them defined at
all.


The world would be so much nicer a place if user space were free
to #include linux/* header files rather than keeping a
per-project private copy of all kernel structs of interest.


Exactly!  This is why I want to create kcore/* and kabi/* that
define the appropriate types, then both userspace and the kernel
could use whatever types fit their fancy, defined in terms of the
__kcore_ and __kabi_ types, which could be _depended_ on to exist
because they are guaranteed not to conflict with other namespaces

Cheers,
Kyle Moffett

--
I have yet to see any problem, however complicated, which, when you looked at
it in the right way, did not become still more complicated.
  -- Poul Anderson




Re: [RFC] Splitting out kernel<=>userspace ABI headers

2005-09-02 Thread Kyle Moffett


On Sep 3, 2005, at 01:57:26, H. Peter Anvin wrote:

Kyle Moffett wrote:


The world would be so much nicer a place if user space were free
to #include linux/* header files rather than keeping a
per-project private copy of all kernel structs of interest.

Exactly!  This is why I want to create kcore/* and kabi/* that
define the appropriate types, then both userspace and the kernel
could use whatever types fit their fancy, defined in terms of the
__kcore_ and __kabi_ types, which could be _depended_ on to exist
because they are guaranteed not to conflict with other namespaces


Agreed.  We should use well-defined namespaces that won't conflict.
However, I think the __[us][0-9]+ namespace can be considered
well-established.


True, however, IMNSHO it would be much better if the kcore/kabi stuff had
a _consistent_ namespace as well.  If every macro begins with "__KABI_"
and every type and function with "__kabi_" (with a few function-like macro
exceptions, of course), then it is trivial to see where it originally came
from and provides a standard naming scheme that external parties can kind
of rely upon.  It also means there are fewer exceptions to remember when
coding.  My thought for the __[us][0-9]+ types is that they should still
be defined in linux/types.h for compatibility (outside of __KERNEL__) and
based off the __kabi_* types.

Cheers,
Kyle Moffett

--
There are two ways of constructing a software design. One way is to make it so
simple that there are obviously no deficiencies. And the other way is to make
it so complicated that there are no obvious deficiencies.  The first method is
far more difficult.
  -- C.A.R. Hoare



Re: [RFC] Splitting out kernel<=>userspace ABI headers

2005-09-03 Thread Kyle Moffett


On Sep 3, 2005, at 11:36:22, Denis Vlasenko wrote:

Is this an exercise in academia? Userspace app which defines
uint32_t to anything different than 'typedef '
deserves the punishment, and one which does have such typedef
instead of #include stdint.h will not notice.


That's not the issue.  Say I do this (which is perfectly valid on
most platforms):

typedef unsigned int uint32_t;
#include <linux/loop.h>

What exactly should happen?  If linux/loop.h includes stdint.h to get
uint32_t, then I'll get duplicate definition errors.  If it omits
stdint.h, then uint16_t won't be defined (because the userspace app
doesn't think that it needs it) and I'll get undefined type errors.
Either way, depending on the existence or nonexistence of the POSIX
types in userspace-accessible kernel headers is not viable.


All these u32, uint32_t, __u32 end up typedef-ing to same
integer type anyway...


The point is to provide a type that _isn't_ in some standard so that
_we_ can define its inclusion rules.  If the standards had gone and
defined "Userspace must include stdint.h or define _all_ types
appropriately", then we would not have had this issue, but many apps
in userspace would cease to compile on standards compliant platforms.

Cheers,
Kyle Moffett

--
Someone asked me why I work on this free (http://www.fsf.org/philosophy/)
software stuff and not get a real job. Charles Shultz had the best answer:

"Why do musicians compose symphonies and poets write poems? They do it because
life wouldn't have any meaning for them if they didn't. That's why I draw
cartoons. It's my life."
  -- Charles Shultz



Re: [RFC] Splitting out kernel<=>userspace ABI headers

2005-09-03 Thread Kyle Moffett


On Sep 3, 2005, at 11:19:17, H. Peter Anvin wrote:
Thus, an ABIzed  or whatever it's called might export
"struct __kabi_stat" and "struct __kabi_stat64" with the expectation that
the caller would "#define __kabi_stat64 stat" if that is the version they
want.  A typedef isn't good enough for C, since you can't typedef struct
tags.


Didn't you mean "#define stat __kabi_stat64"?  Also, I can see that would
pose other issues as well.  Say my app does "struct stat stat;".  Any error
messages would refer to a variable "__kabi_stat64" instead of the expected
"stat":

A userspace program:
struct stat stat;
stat.invalid = 1;

Preprocesses into:
struct __kabi_stat64 __kabi_stat64;
__kabi_stat64.invalid = 1;

And gives an error something like this for that line, confusing the
programmer:
Invalid member "invalid" for "__kabi_stat64"


As far as I can tell, this is not a solvable issue unless GCC can come up
with a way to either:
typedef struct foo struct bar;
or
struct bar { unnamed struct foo; };
the former being much nicer.  On the other hand, I think the following
should work, because the st_* names are within the C namespace and should
be much easier to redefine, although misuse of one of those names might
be a bit more catastrophic for the user app.

struct stat {
struct __kabi_stat64 __stat64;
};
#define st_dev __stat64.st_dev
#define st_ino __stat64.st_ino
[...]

Then the userspace program could do this:
struct stat foo;
foo.st_ino = 0;

And it would be preprocessed into:
struct stat foo;
foo.__stat64.st_ino = 0;
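
Here is a compilable version of that idea, as a sketch only.  The
two-member __kabi_stat64 is a made-up stand-in for the real layout, and
the wrapper is called sketch_stat instead of stat so the example can be
built next to the system headers.  (glibc already plays a similar trick
for the timestamps, where st_atime is a macro for st_atim.tv_sec.)

```c
/* Hypothetical kernel-exported layout; the real struct has more members. */
struct __kabi_stat64 {
    unsigned long long st_dev;
    unsigned long long st_ino;
};

/* What libc would present to applications as "struct stat". */
struct sketch_stat {
    struct __kabi_stat64 __stat64;
};

/*
 * Defined before the macros below, so the member names used here are
 * not rewritten by the preprocessor.
 */
static inline unsigned long long sketch_raw_ino(const struct sketch_stat *s)
{
    return s->__stat64.st_ino;
}

/* From here on, every st_dev or st_ino token is redirected. */
#define st_dev __stat64.st_dev
#define st_ino __stat64.st_ino
```

The "catastrophic" case I mentioned is visible here too: once the macros
are defined, writing foo.__stat64.st_ino by hand expands to
foo.__stat64.__stat64.st_ino and fails to compile, and so does any local
variable that happens to be named st_ino.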

Cheers,
Kyle Moffett

--
Someone asked me why I work on this free (http://www.fsf.org/philosophy/)
software stuff and not get a real job. Charles Shultz had the best answer:

"Why do musicians compose symphonies and poets write poems? They do it because
life wouldn't have any meaning for them if they didn't. That's why I draw
cartoons. It's my life."
  -- Charles Shultz



Re: RFC: i386: kill !4KSTACKS

2005-09-04 Thread Kyle Moffett


On Sep 4, 2005, at 23:41:58, Alex Davis wrote:

--- Sean <[EMAIL PROTECTED]> wrote:

It's not a philosophical issue, it's what Linux _is_: an open source
operating system! That's what the developers are working on; not your
half-baked vision.

Um, ever hear of 'compromise'?? All I'm saying is let people use what
currently works until we can get an open-source solution.  Ndiswrapper's
existence is not stopping you (or anyone else) from pestering manufacturers
for specs and writing drivers. I look at ndiswrapper as a stop-gap solution.
Hey, even Linus himself has said 'better a sub-optimal solution than no solution'.


In any case, this discussion is moot because the kernel API is changing
for the better and there is a clearly defined fix for ndiswrapper that
will allow it to continue to work even with the new interface:  allocate
a separate ndiswrapper stack (IE: Not the kernel stacks).  The kernel is
under no obligation not to break out-of-tree drivers, etc, even
semi-non-binary-only ones such as ndiswrapper.  Figure out how to fix it and
move on!

Cheers,
Kyle Moffett

--
Q: Why do programmers confuse Halloween and Christmas?
A: Because OCT 31 == DEC 25.




Re: RFC: i386: kill !4KSTACKS

2005-09-05 Thread Kyle Moffett


On Sep 5, 2005, at 18:32:32, Thorild Selen wrote:

Adrian Bunk <[EMAIL PROTECTED]> writes:

Please name situations where 8K stacks may be preferred that do not
involve binary-only modules.


How about NFS-exporting a filesystem on LVM atop md?  I believe it has
been mentioned before in discussions that 8k stacks are strongly
recommended in this case.  Are those issues solved?


I think the worst overflow case anyone found was nfs=>xfs=>lvm=>dm=>scsi; if
someone has such a configuration, please retest with current -mm or similar.
I think there are several patches in there to resolve the excessive stack
usage and a few to do some sort of bio chaining (instead of recursive calls).
I don't remember what underlying hardware was behind the SCSI, but I suspect
something like iSCSI or USB would push some extra stack in there for stress
testing.

Cheers,
Kyle Moffett

--
I have yet to see any problem, however complicated, which, when you looked at
it in the right way, did not become still more complicated.
  -- Poul Anderson




Re: [RFC] Splitting out kernel<=>userspace ABI headers

2005-09-05 Thread Kyle Moffett


On Sep 5, 2005, at 12:35:42, H. Peter Anvin wrote:

Followup to:  <[EMAIL PROTECTED]>
By author:    Kyle Moffett <[EMAIL PROTECTED]>
In newsgroup: linux.dev.kernel


Didn't you mean "#define stat __kabi_stat64"?  Also, I can see that
would pose other issues as well say my app does "struct stat stat;"
Any error messages would refer to a variable "__kabi_stat64" instead
of the expected "stat":


No, I didn't.  That's *exactly* why I didn't mean that.

#define __kabi_stat64 stat
#include 

That being said, I would personally like to see it possible to typedef
struct, union and enum tags.


_OH_!!! Forgive me for missing the point entirely!  I can see how that would
work very well.  Nice trick, BTW!  Very sneaky, needs significant explanatory
comments in whatever header file it ends up in lest others get confused in
the same fashion as I.  With all of that mess out of the way, I'll work on
getting a few initial RFC patches out the door, and then we can revisit this
discussion once there is something tangible to talk about.
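
For anyone else who trips over the direction of the #define, here is a small sketch of how the trick plays out.  It is an illustration under assumptions, not the eventual header: the structure body stands in for whatever the ABI header would really contain, "demo_stat" is used instead of "stat" so the example does not collide with the C library, and demo_size() is a hypothetical helper.

```c
/* The caller chooses the name it wants *before* the ABI header is seen. */
#define __kabi_stat64 demo_stat

/* --- contents of a hypothetical ABI header start here --- */
struct __kabi_stat64 {          /* preprocesses to: struct demo_stat { ... } */
    unsigned long long st_ino;
    long long st_size;
};
/* --- contents of the hypothetical ABI header end here --- */

/*
 * The program only ever spells the tag as "demo_stat", so compiler
 * diagnostics mention the name the programmer actually wrote.
 */
static long long demo_size(const struct demo_stat *s)
{
    return s->st_size;
}
```

Because the macro rewrites the header's tag into the application's name (rather than the application's name into the header's tag), "struct demo_stat x;" in user code is left untouched by the preprocessor.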

Cheers,
Kyle Moffett

--
Someone asked me why I work on this free (http://www.fsf.org/philosophy/)
software stuff and not get a real job. Charles Shultz had the best answer:

"Why do musicians compose symphonies and poets write poems? They do it because
life wouldn't have any meaning for them if they didn't. That's why I draw
cartoons. It's my life."
  -- Charles Shultz



[RFC][MEGAPATCH] Change ASSEMBLY to ASSEMBLER (defined by GCC from 2.95 to current CVS)

2005-09-05 Thread Kyle Moffett


On Sep 5, 2005, at 19:28:07, Kyle Moffett wrote:
With all of that mess out of the way, I'll work on getting a few initial RFC
patches out the door, and then we can revisit this discussion once there is
something tangible to talk about.


Ugh.  Step one for my cleanup is to rename __ASSEMBLY__ to something defined
automatically by GCC (IE: __ASSEMBLER__).  And yes, I checked, __ASSEMBLER__
is defined by everything from old 2.95 to 4.0, even though it wasn't really
documented in anything older than 3.4.  This megapatch is basically a search
and replace of __ASSEMBLY__ with __ASSEMBLER__ over the whole kernel source,
except in Makefiles, where I just delete the -D__ASSEMBLY__ argument.  If
this is generally acceptable, I'll break it up into small digestible pieces
and send to individual maintainers, unless someone wants to pass the whole
monster through their tree in one big lump.  This is a lot of code churn,
but it's a valid cleanup and will help me out as I try to make more of the
kernel headers easily digestible for userspace.
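
As a rough picture of what the renamed guard looks like in a header shared between C and assembly sources, consider the sketch below.  Every name in it (DEMO_FLAG_ENABLE, struct demo_regs, demo_enabled) is made up for illustration and does not come from the patch; the only thing taken from the discussion is that GCC itself defines __ASSEMBLER__ when preprocessing assembly.

```c
/* Constants like this are usable from both C and assembly sources. */
#define DEMO_FLAG_ENABLE 0x1

#ifndef __ASSEMBLER__
/*
 * C-only declarations.  GCC defines __ASSEMBLER__ by itself when it
 * preprocesses a .S file, so no -D flag is needed in the Makefile.
 */
struct demo_regs {
    unsigned long flags;
};

static inline int demo_enabled(const struct demo_regs *r)
{
    return (r->flags & DEMO_FLAG_ENABLE) != 0;
}
#endif /* !__ASSEMBLER__ */
```

When the same file is pulled into a .S source, only the #define survives preprocessing, which is the behaviour the hand-maintained -D__ASSEMBLY__ flag provides today.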

Ok, the patch itself is temporarily located here (Please be nice to my
desktop, it has a 650MB/day upload limit imposed by Virginia Tech that I'd
rather not go over) [patch is 308k]:

http://zeus.moffetthome.net/~kyle/rename-__ASSEMBLY__-to-__ASSEMBLER__.patch

And here's the diffstat [27k]:

http://zeus.moffetthome.net/~kyle/rename-__ASSEMBLY__-to-__ASSEMBLER__.diffstat


Cheers,
Kyle Moffett

--
Unix was not designed to stop people from doing stupid things, because that
would also stop them from doing clever things.
  -- Doug Gwyn




Re: Modifying Cryptography code

2005-09-06 Thread Kyle Moffett


On Sep 6, 2005, at 08:38:48, Alaa Dalghan wrote:

What I am looking for is the portion of the C code in the kernel where
the Decryption function is called to decrypt a received packet. When I
find this statement, maybe I can make it conditional such as:  If the
destination is me then Decrypt  else DO NOT!


You can't make this work.  First of all, the other WinXP clients would
be completely unable to decrypt your packets, because they don't have
the right key.  Secondly, the kernel cannot know what the destination
is until *after* it has decrypted the packet, because the real target
address is encrypted along with the rest of the data for security.  If
your OpenSwan box is too slow, get a faster OpenSwan box, don't try to
break the encryption to make it faster.  You cannot remove enough
encryption features to get the required extra speed without disabling
the encryption entirely.

Cheers,
Kyle Moffett

-----BEGIN GEEK CODE BLOCK-----
Version: 3.12
GCM/CS/IT/U d- s++: a18 C>$ UB/L/X/*(+)>$ P+++()>$ L(+++) E
W++(+) N+++(++) o? K? w--- O? M++ V? PS+() PE+(-) Y+ PGP+++ t+(+++) 5 X R?
tv-(--) b(++) DI+ D+ G e->$ h!*()>++$ r  !y?(-)
------END GEEK CODE BLOCK------



Re: [ham] Re: Gracefully killing kswapd, or any kernel thread

2005-09-07 Thread Kyle Moffett


On Sep 7, 2005, at 17:07:12, Kristis Makris wrote:
To kill a kernel thread, you need to make __it__ call exit(). It must be


There must be another way to do it. Perhaps one could have another
process effectively issue the contents of do_exit for the kswapd
task_struct ?


Umm, so then the kernel does what, exactly?  You have a process in some
indeterminate state, possibly holding semaphores, definitely pinning
memory/resources/etc, and you just stop it, turn it off, and expect
things to continue working?  This is similar in nature to that thread
a while ago about kernel error recovery and killing uninterruptible
user processes.  To extend this to kernel threads, unless the kernel
thread has been _specifically_ coded to be interruptible, it isn't, and
furthermore, *can't* be.


CODED to do that! You can't do it externally although you can send


I'm clearly asking for the case where the thread wasn't coded to do
that.


You can't.  This is flatly impossible.  Go see the thread a while back
about a hot-patch system call for several reasons why that is a bad
idea.  In particular, look at the post that discusses phone switches,
the one with the quote "'So why don't you just reboot the affected
switches?' [...] 'That assumes the switches had ever been booted in
the first place'".


it a signal, after which it will spin forever


kflushd and keventd don't seem to spin forever. I still haven't
determined what makes kswapd spin forever after it receives the signal.


Probably a while(1) loop that isn't intended to stop until the machine
physically powers off.  If you want to patch one specific kernel thread,
you might be able to do that, but you can't just expect to hot-patch
random parts of the kernel at runtime and have things work.

Cheers,
Kyle Moffett

-----BEGIN GEEK CODE BLOCK-----
Version: 3.12
GCM/CS/IT/U d- s++: a18 C>$ UB/L/X/*(+)>$ P+++()>$ L(+++) E
W++(+) N+++(++) o? K? w--- O? M++ V? PS+() PE+(-) Y+ PGP+++ t+(+++) 5 X R?
tv-(--) b(++) DI+ D+ G e->$ h!*()>++$ r  !y?(-)
------END GEEK CODE BLOCK------



Re: freeze vs freezer

2008-01-04 Thread Kyle Moffett


On Jan 04, 2008, at 15:54:06, Oliver Neukum wrote:

Am Donnerstag, 3. Januar 2008 23:06:07 schrieb Nigel Cunningham:

Hi.

a) mount fuse on /tmp/first
b) mount fuse on /tmp/second

Then the server task for (a) does "ls /tmp/second". So it will be  
frozen, right? How do you then freeze (a)? And keep in mind that  
the server task may have forked.


I guess I should first ask, is this a real life problem or a  
hypothetical twisted web? I don't see why you would want to make  
two filesystems interdependent - it sounds like the way to create  
livelock and deadlocks in normal use, before we even begin to  
think about hibernating.


Good questions. I personally don't use fuse, but I do care about
power management. The problem I see is that an unprivileged user
could make that dependency, even inadvertently.


I don't think it makes sense for the kernel to try to keep track of  
hard data dependencies for FUSE filesystems, or to even *attempt* to  
auto-suspend them.  You should instead allow a privileged program to  
initiate a "freeze-and-flush" operation on a particular FUSE  
filesystem and optionally wait for it to finish.  Then your userspace  
would be configured with the appropriate data dependencies and would  
stop FUSE filesystems in the appropriate order.


In addition, the kernel would automatically understand
ext3=>loopback=>fuse, and when asked to freeze the "fuse" part, it
would first freeze the "ext3" and the "loopback" parts using mechanisms
similar to those device-mapper currently uses when you do "dmsetup
suspend mydev" followed by "echo 0 $SIZE snapshot /dev/mapper/mydev-base
/dev/mapper/mydev-snap-back p 8 | dmsetup load mydev"  (IE: when
you create a snapshot of a given device).


Naturally userspace could deadlock itself (although not the kernel)  
by freezing a block device and then attempting to access it, but  
since the "freeze" operation is limited to root this is not a big  
issue.  The way to freeze all filesystems safely would be to clone a  
new mount namespace, mlockall(), mount a tmpfs, pivot_root() into the  
tmpfs, bind-mount the filesystems you want to freeze directly onto  
subdirectories of the tmpfs, and then freeze them in an appropriate  
order.


Besides which the worst-case is a pretty straightforward non-critical  
failure; you might fail to fully sync a FUSE filesystem because its  
daemon is asleep waiting on something (possibly even just sitting in  
a "sleep(1)" call with all signals masked).  You simply need to  
make sure that all tasks are asleep outside of driver critical  
sections so that you can properly suspend your device tree.


Cheers,
Kyle Moffett

Re: The ext3 way of journalling

2008-01-08 Thread Kyle Moffett


On Jan 08, 2008, at 15:51:53, Andi Kleen wrote:

Theodore Tso <[EMAIL PROTECTED]> writes:
Now, there are good reasons for doing periodic checks every N  
mounts and after M months.  And it has to do with PC class  
hardware.  (Ted's aphorism: "PC class hardware is cr*p").


If these reasons are good ones (some skepticism here) then the  
correct way to really handle this would be to do regular background  
scrubbing during runtime; ideally with metadata checksums so that  
you can actually detect all corruption.


Poor man's background scrubbing:

(A)  Use LVM like virtually all modern distros offer
(B)  Leave some extra space in your LVM volume group (enough for 1  
snapshot over the time it takes to do an FSCK).

(C)  Periodically run the following scriptlet:

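set -e
# VG and VOLUME name the volume group and logical volume to check;
# SNAPSIZE is the copy-on-write space reserved in step (B).
START="$(date +'%Y%m%d%H%M%S')"
lvcreate -s -L "${SNAPSIZE}" -n "${VOLUME}-snap" "${VG}/${VOLUME}"
# Check the snapshot at the lowest scheduling priority.
if nice -n 19 fsck -fy "/dev/${VG}/${VOLUME}-snap"; then
    echo 'Background scrubbing succeeded!'
    # Set "last checked" to the time the snapshot was taken.
    tune2fs -T "${START}" "/dev/${VG}/${VOLUME}"
else
    echo 'Background scrubbing failed!  Reboot to fsck soon!'
    # Force a full check at the next boot.
    tune2fs -C 16383 -T "19000101" "/dev/${VG}/${VOLUME}"
fi
lvremove -f "${VG}/${VOLUME}-snap"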

Basically you can fsck the offline snapshot in the background.  If it  
succeeds you can adjust the "last checked" date to the time when the  
snapshot was taken and if it fails you can schedule an FSCK at next  
reboot (and possibly remount the filesystem read-only or reboot  
immediately).


You can do the same thing for your /boot volume, although you  
probably have to manually use dmsetup since most bootloaders can't  
interpret LVM volumes.


I've always been surprised that distros like RedHat which
automatically use LVM don't stuff this in their weekly or monthly
checks on desktop systems.  User experience could also be
dramatically improved with automated smartd configuration and
user-interactive logging and warning messages.



But since fsck is so slow and disks are so big this whole thing is  
a ticking time bomb now. e.g. it is not uncommon to require tens of  
minutes or even hours of fsck time and some server that reboots  
only every few months will eat that when it happens to reboot. This  
means you get a quite long downtime.


My servers all have an "interval-between-checks" of 2-6 weeks and are  
configured to run nice +20 background "fsck" checks during off-hours  
between once every few days and once every few weeks.  I also have  
the "max mount count" numbers set to primes between 7 and 37  
(depending on the filesystem) so that troubled or frequently-rebooted  
systems are more frequently verified.  The end result is that I  
almost never have the dreaded 4-hour-fsck-on-boot problem.  A drive  
has certainly been fscked within the last few weeks of operation, and  
multiple large filesystems will all be fscked at the same boot only very
rarely (once per least common multiple of their max-mount-counts).
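
To put a number on how rarely the forced checks coincide: assuming every filesystem is mounted exactly once per boot and the counters start together, two filesystems with max-mount-counts N and M hit their forced check on the same boot once every lcm(N, M) boots, which for distinct primes is simply N*M.  The helper names below are made up for this sketch.

```c
/* Greatest common divisor by Euclid's algorithm. */
static unsigned long demo_gcd(unsigned long a, unsigned long b)
{
    while (b != 0) {
        unsigned long t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Least common multiple; divide first to keep the intermediate value small. */
static unsigned long demo_lcm(unsigned long a, unsigned long b)
{
    return a / demo_gcd(a, b) * b;
}
```

With counts of 7 and 37, demo_lcm(7, 37) is 259, so both checks land on the same boot only once in 259 reboots.  Non-coprime counts such as 6 and 8 would coincide every 24 boots instead of every 48, which is why primes are the convenient choice.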


Cheers,
Kyle Moffett


Re: yield API

2007-12-12 Thread Kyle Moffett


On Dec 12, 2007, at 17:39:15, Jesper Juhl wrote:

On 02/10/2007, Ingo Molnar <[EMAIL PROTECTED]> wrote:
sched_yield() has been around for a decade (about three times  
longer than futexes were around), so if it's useful, it sure  
should have grown some 'crown jewel' app that uses it and shows  
off its advantages, compared to other locking approaches, right?


I have one example of sched_yield() use in a real app.  
Unfortunately it's proprietary so I can't show you the source, but  
I can tell you how it's used.


The case is this:  Process A forks process B. Process B does some  
work that takes approximately between 50 and 1000ms to complete
(varies), then it creates a file and continues to do other work.   
Process A needs to wait for the file B creates before it can  
continue. Process A *could* immediately go into some kind of "check  
for file; sleep n ms" loop, but instead it starts off by calling  
sched_yield() to give process B a chance to run and hopefully get  
to the point where it has created the file before process A is  
again scheduled and starts to look for it - after the single sched  
yield call, process A does indeed go into a "check for file; sleep  
250ms;" loop, but most of the time the initial sched_yield() call  
actually results in the file being present without having to loop  
like that.


That is a *terrible* disgusting way to use yield.  Better options:
  (1) inotify/dnotify
  (2) create a "foo.lock" file and put the mutex in that
  (3) just start with the check-file-and-sleep loop.


Now is this the best way to handle this situation? No.  Does it  
work better than just doing the wait loop from the start? Yes.


It works better than doing the wait-loop from the start?  What
evidence do you provide to support this assertion?  Specifically, in
the first case you tell the kernel "I'm waiting for something but I
don't know what it is or how long it will take", while in the second
case you tell the kernel "I'm waiting for something that will take
exactly X milliseconds, even though I don't know what it is".  If you
really want something similar to the old behavior then just replace
the "sched_yield()" call with a proper sleep for the estimated time
it will take the program to create the file.
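
A minimal sketch of that replacement, assuming a POSIX system: sleep once for the estimated creation time, then fall back to the usual check-and-sleep loop.  The function names, the parameters, and the choice of access() for the existence check are all illustrative assumptions, not taken from the application being discussed.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <time.h>
#include <unistd.h>

/* Sleep for the given number of milliseconds using nanosleep(). */
static void demo_sleep_ms(long ms)
{
    struct timespec ts;

    ts.tv_sec = ms / 1000;
    ts.tv_nsec = (ms % 1000) * 1000000L;
    nanosleep(&ts, NULL);
}

/*
 * Hypothetical helper: wait first_ms (the estimate), then poll every
 * poll_ms up to max_polls times.  Returns 0 once the file exists and
 * -1 if it never appeared.
 */
static int demo_wait_for_file(const char *path, long first_ms,
                              long poll_ms, int max_polls)
{
    int i;

    demo_sleep_ms(first_ms);
    for (i = 0; i < max_polls; i++) {
        if (access(path, F_OK) == 0)
            return 0;
        demo_sleep_ms(poll_ms);
    }
    return -1;
}
```

Unlike sched_yield(), the initial sleep tells the scheduler how long the caller expects to be idle, so it does not depend on whatever yield happens to mean under the current scheduler.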



Is this a good way to use sched_yield()? Maybe, maybe not.  But it  
*is* an actual use of the API in a real app.


We weren't looking for "actual uses", especially not in binary-only  
apps.  What we are looking for is optimal uses of sched_yield(); ones  
where that is the best alternative.  This... certainly isn't.


Cheers,
Kyle Moffett


Re: [PATCH 5/9] bfs: move function prototype to the proper header file

2008-01-24 Thread Kyle Moffett


On Jan 24, 2008, at 18:13, Dmitri Vorobiev wrote:

Heikki Orsila wrote:

On Fri, Jan 25, 2008 at 01:32:04AM +0300, Dmitri Vorobiev wrote:

+/* inode.c */
+extern void dump_imap(const char *, struct super_block *);
+


Functions should not be externed, remove extern keyword.


Care to explain why?

Following is an explanation why the contrary is probably true:

1) We have lots of precedents in existing code:

[EMAIL PROTECTED]:~/Projects/misc/linux$ git-grep 'extern void' include | wc -l
5523
[EMAIL PROTECTED]:~/Projects/misc/linux$



The "extern" keyword on functions is *completely* redundant.

For C variables:
  Declaration:  extern int foo;
  Definition:   int foo;
  File-scoped:  static int foo;

For C functions:
  Declaration:  void foo(int x);
  Definition:   void foo(int x) { /*...body...*/ }
  File-scoped:  static void foo(int x) { /*...body...*/ }

The compiler will *allow* you to use "extern" on the function
prototype, but the presence or absence of a function body already
tells it whether the prototype is a declaration or a definition, so
the "extern" keyword is not required and is therefore redundant.


For maximum readability and cleanliness I recommend that you leave off  
the "extern" on the function declarations; it makes the lines much  
longer without obvious gain.
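
The tables above can be checked with a few lines of standard C.  The identifiers are invented for the example; the point is only that the two function declarations are interchangeable while the two variable lines are not.

```c
/* Two declarations of the same function: with and without "extern". */
extern int demo_add_one(int x);
int demo_add_one(int x);

/* The definition: the body is what makes it a definition. */
int demo_add_one(int x)
{
    return x + 1;
}

/* For variables the keyword matters: this line only declares the object... */
extern int demo_counter;

/* ...while this one defines (and here initializes) it. */
int demo_counter = 5;
```

The compiler accepts both function declarations as compatible redeclarations of the same external function, so dropping "extern" from a prototype changes nothing about linkage.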


Cheers,
Kyle Moffett



Re: [PATCH] Version 3 (2.6.23-rc8) Smack: Simplified Mandatory Access Control Kernel

2007-10-04 Thread Kyle Moffett

g there.  Perhaps next time  
I'm bored.


I think a fair amount of what we need is already done in SELinux, and  
efforts would be better spent in figuring out what seems too  
complicated in SELinux and making it simpler.  Probably a fair amount  
of that just means better tools.


Cheers,
Kyle Moffett


Re: [PATCH] Version 3 (2.6.23-rc8) Smack: Simplified Mandatory Access Control Kernel

2007-10-04 Thread Kyle Moffett


On Oct 05, 2007, at 00:45:17, Eric W. Biederman wrote:

Kyle Moffett <[EMAIL PROTECTED]> writes:


On Oct 04, 2007, at 21:44:02, Eric W. Biederman wrote:
SElinux is not all encompassing or it is generally  
incomprehensible I don't know which.  Or someone long ago would  
have said a better  way to implement containers was with a  
selinux ruleset, here is a  selinux ruleset that does that.   
Although it is completely possible  to implement all of the  
isolation with the existing LSM hooks as  Serge showed.


The difference between SELinux and containers is that SELinux (and  
LSM as a whole) returns -EPERM to operations outside the scope of  
the  subject, whereas containers return -ENOENT (because it's not  
even in  the same namespace).


Yes.  However if you look at what the first implementations were.   
Especially something like linux-vserver.  All they provided was  
isolation.  So perhaps you would not see every process ps but they  
all had unique pid values.


I'm pretty certain Serge at least prototyped a simplified version
of that using the LSM hooks.  Is there something I'm not remembering
in those hooks that allows hiding of information like processes?


Yes. Currently with containers we are taking that one step farther  
as that solves a wider set of problems.


IMHO, containers have a subtly different purpose from LSM even though  
both are about information hiding.  Basically a container is  
information hiding primarily for administrative reasons; either as a  
convenience to help prevent errors or as a way of describing  
administrative boundaries.  For example, even in an environment where  
all sysadmins are trusted employees, a few head-honcho sysadmins  
would get root container access, and all others would get access to  
specific containers as a way of preventing "oops" errors.  Basically  
a container is about "full access inside this box and no access  
outside".


By contrast, LSM is more strictly about providing *limited* access to
resources.  For an accounting business all client records would be
grouped and associated together; however, those which have passed this
year's review are read-only except by specific staff, and others may
have information restricted to some subset of the employees.


So containers are exclusive subsets of "the system" while LSM should  
be about non-exclusive information restriction.



We also have in the kernel another parallel security mechanism  
(for what is generally a different class of operations) that has  
been  quite successful, and different groups get along quite  
well, and  ordinary mortals can understand it.   The linux  
firewalling code.


Well, I wouldn't go so far as the "ordinary mortals can understand  
it" part; it's still pretty high on the obtuse-o-meter.


True.  Probably a more accurate statement is: unix command line
power users can and do handle it after reading the docs.  That's
not quite ordinary mortals but it feels like it some days.  It
might all be perception...


I have seen more *wrong* iptables firewalls than I've seen correct  
ones.  Securing TCP/IP traffic properly requires either a lot of  
training/experience or a good out-of-the-box system like Shorewall  
which structures the necessary restrictions for you based on an  
abstract description of the desired functionality.  For instance what  
percentage of admins do you think could correctly set up their  
netfilter firewalls to log christmas-tree packets, smurfs, etc  
without the help of some external tool?  Hell, I don't trust myself  
to reliably do it without a lot of reading of docs and testing, and  
I've been doing netfilter firewalls for a while.


The bottom line is that with iptables it is *CRITICAL* to have a good  
set of interface tools to take the users' "My system is set up  
like..." description in some form and turn it into the necessary set  
of efficient security rules.  The *exact* same issue applies to  
SELinux, with 2 major additional problems:


1)  Half the tools are still somewhat beta-ish and under heavy  
development.  Furthermore the semi-official reference policy is  
nowhere near comprehensive and pretty ugly to read (go back to the  
point about the tools being beta-ish).


2)  If you break your system description or translation tools then  
instead of just your network dying your entire *system* dies.



The linux firewalling code has hooks all throughout the
networking stack, just like the LSM has hooks all throughout the  
rest of linux  kernel.  There is a difference however.  The linux  
firewalling code in addition to hooks has tables behind those  
hooks that it  consults. There is generic code to walk those  
tables and consult with different kernel modules to decide if we  
should drop a packet.  Each of those kernel modules provides a  
different capability that can be used to genera

Re: [PATCH] Replace __attribute_pure with pure

2007-10-06 Thread Kyle Moffett


Trimmed the CC list a bit

On Oct 05, 2007, at 20:51:21, H. Peter Anvin wrote:

Ralf Baechle wrote:
To be consistent with the use of attributes in the rest of the  
kernel replace all use of __attribute_pure__ with __pure and  
delete the definition of __attribute_pure__.


Concern: __attribute_pure__ is very similar to __attribute_const__,  
which is almost completely, but not totally unlike the keyword  
"const"...


Yes, there's also the fact that __pure is a reserved GCC keyword.   
Essentially according to GCC docs all of the GCC-specific keywords  
are equivalently defined as "keyword", "__keyword", and  
"__keyword__", with only the latter two defined in strict-ANSI mode.   
The following is valid according to GCC docs:


static int __attribute__((__pure)) my_strlen(const char *str);

With the proposed definition of __pure, that becomes a noticeably
invalid __attribute__((__attribute__((__pure__)))).



Cheers,
Kyle Moffett


Re: idio{,ma}tic typos (was Re: + fix-vm_can_nonlinear-check-in-sys_remap_file_pages.patch added to -mm tree)

2007-10-11 Thread Kyle Moffett


On Oct 11, 2007, at 03:35:37, Alexey Dobriyan wrote:

Sadly, yes.

[PATCH] smctr: fix "|| 0x" typo

IBM_PASS_SOURCE_ADDR is 1, so logically ORing it with status bits is
pretty useless. Do bitwise OR, instead.

Signed-off-by: Alexey Dobriyan <[EMAIL PROTECTED]>
---

 drivers/net/tokenring/smctr.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/net/tokenring/smctr.c
+++ b/drivers/net/tokenring/smctr.c
@@ -3413,7 +3413,7 @@ static int smctr_make_tx_status_code(struct net_device *dev,

 tsv->svi = TRANSMIT_STATUS_CODE;
 tsv->svl = S_TRANSMIT_STATUS_CODE;

-tsv->svv[0] = ((tx_fstatus & 0x0100 >> 6) || IBM_PASS_SOURCE_ADDR);
+tsv->svv[0] = ((tx_fstatus & 0x0100 >> 6) | IBM_PASS_SOURCE_ADDR);


 /* Stripped frame status of Transmitted Frame */
 tsv->svv[1] = tx_fstatus & 0xff;


Hmm, here's a question for you:  The old code was equivalent to
"tsv->svv[0] = 1;".  What's your proof that we don't rely on this "bug"
elsewhere in the code?  In other words, this is a significant
behavior change (albeit fixing an apparent bug) from what we've done
for a while.  You might want to do a git-blame on this bit of code to
see who the last person to modify it was and ask them to test or
confirm the patch first.  The same general questions apply to the
other logical-op bugs.
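
To make the size of the behavior change concrete, the two expressions can be compared outside the driver.  This is a standalone sketch: DEMO_PASS_SOURCE_ADDR stands in for IBM_PASS_SOURCE_ADDR (stated to be 1 in the patch description), and the demo_* function names are invented.

```c
#define DEMO_PASS_SOURCE_ADDR 1   /* stands in for IBM_PASS_SOURCE_ADDR */

/* Old expression: "||" yields 0 or 1, and the right operand is nonzero. */
static unsigned demo_old(unsigned tx_fstatus)
{
    return ((tx_fstatus & 0x0100 >> 6) || DEMO_PASS_SOURCE_ADDR);
}

/* Patched expression: bitwise OR keeps whatever bit the mask selected. */
static unsigned demo_new(unsigned tx_fstatus)
{
    return ((tx_fstatus & 0x0100 >> 6) | DEMO_PASS_SOURCE_ADDR);
}
```

demo_old() returns 1 for every input, while demo_new() can also return 5.  Note too that in C the shift binds tighter than "&", so as written both versions test "tx_fstatus & 0x4" rather than "(tx_fstatus & 0x0100) >> 6"; whether that is what the driver intends is something only testing on the hardware or the original author could confirm.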


Cheers,
Kyle Moffett


Re: "mount --bind" with user/group/mode definition?

2007-10-11 Thread Kyle Moffett


On Oct 11, 2007, at 04:35:37, Ph. Marek wrote:
is there some way to duplicate a directory somewhere else (like  
with "mount --bind"), but having different owner/group/mode bits?


I'd like to mount a directory I have no control over (think NFS, or  
floppy, ...) with clearly defined rights - like root:,  
mode 0550 for all directories, and 0440 for all files. (Here I want  
to have full *read* control, regardless of the original permissions).
[ I know that this special case can be (mostly) done by a read-only  
binding mount; the part that is missing is eg. files with a  
different owner being 0700. ]


I know that something like this is possible for eg. VFAT, which has  
no right descriptors for itself; but I'd need that for arbitrary  
directory trees, who  themselves *have* permissions set.


Is there some way to achieve that?


Not at the moment, unfortunately.  I suspect that with the recent  
developments in user container support and/or overlay mounting it  
will become possible to either write a UID/GID-translation overlay  
filesystem or grant cross-UID-container keys to achieve what you  
want.  On the other hand that probably won't fully happen for up to a  
year or so.


Cheers,
Kyle Moffett

Re: [PATCH] Version 3 (2.6.23-rc8) Smack: Simplified Mandatory Access Control Kernel

2007-10-11 Thread Kyle Moffett

Ok, finally getting some time to work on this stuff once again (life  
gets really crazy sometimes).  I would like to postulate that you can  
restate any SMACK policy as a functionally equivalent SELinux policy  
(with a few slight technical differences, see below).  I've been  
working on a script to do this but keep getting stuck tracking down  
minor bugs and then get dragged off on other things I need to do.   
Here is the method I am presently trying to implement:


First divide the SELinux access vectors into 7 groups based on which  
ones SMACK wishes to influence:

(R) Requires "read" permissions (the 'r' bit)
(W) Requires "write" permissions (the 'w' bit)
(X) Requires "execute" permissions (the 'x' bit)
(A) Requires "append" OR "write" permissions (the 'a' bit)
(P) Requires CAP_MAC_OVERRIDE
(K) May not be performed by a non-CAP_MAC_OVERRIDE process on a
    CAP_MAC_OVERRIDE process
(N) Does not require any special permissions

The letters in front indicate the names I will use in the rest of  
this document to describe the sets of access vectors.


Next define a single SELinux user "smack", and two independent roles,
"priv" and "nopriv".  We create the set of SMACK equivalence-classes
defined as various SELinux types with substitutions for "*", "^",  
"_", and "?", and then completely omit the MLS portions of the  
SELinux policy.


The next step is to establish the fundamental constraints of the  
policy.  To prevent processes from gaining CAP_MAC_OVERRIDE we  
iterate over the access vectors in (K) and add the following  
constraint for each vector:

constrain $OBJECT_CLASS $ACCESS_VECTOR ((r1 == r2) || (r1 == priv))

This also includes:
constrain process transition ((r1 == r2) || (r1 == priv))

Then we require privilege to access the (P) vectors; for each vector  
in (P) we add a constraint:

constrain $OBJECT_CLASS $ACCESS_VECTOR (r1 == priv)

At this point the only rules left to add are the between-type rules.   
Here it gets mildly complicated because SMACK is a linear-lookup  
system (each rule must be matched in order) whereas SELinux is a  
globally-unique-lookup system (all rules are mutually exclusive and  
matched simultaneously).  Essentially for each SMACK rule:

$SOURCE $DEST $PERM_BITS

We iterate over all of the classes represented in the access vector  
lists in $PERM_BITS and create rules for each one:

allow { $SOURCE } { $DEST }:$PERM_CLASS { $PERM_VECTORS };

If you need SMACK to allow subtractive permissions then you need to
expand that further; however, I believe that as an initial cut this is
sufficient.
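
As a rough illustration of the per-rule expansion described above, the following C sketch formats SELinux "allow" statements from one SMACK rule.  The names perm_map, file_map and smack_rule_to_allow() are invented for this example, and the tiny table mapping SMACK bits to access vectors is only an assumption; a real table would be built from the (R), (W), (X) and (A) vector groups for every object class.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical mapping entry: one SELinux object class plus the access
 * vectors in that class which a given SMACK permission bit controls. */
struct perm_map {
	char bit;               /* SMACK bit: 'r', 'w', 'x' or 'a' */
	const char *obj_class;  /* SELinux object class */
	const char *vectors;    /* space-separated access vectors */
};

/* Illustrative table only; it covers a single class. */
static const struct perm_map file_map[] = {
	{ 'r', "file", "read getattr" },
	{ 'w', "file", "write append" },
	{ 'x', "file", "execute" },
	{ 'a', "file", "append" },
};

/*
 * Emit one "allow" line per SMACK permission bit that has a mapping.
 * Returns the number of bytes written (excluding the NUL), or -1 if
 * the output buffer is too small.
 */
int smack_rule_to_allow(const char *src, const char *dst,
			const char *bits, char *out, size_t len)
{
	size_t used = 0;
	size_t i;

	if (len == 0)
		return -1;
	out[0] = '\0';
	for (; *bits; bits++) {
		for (i = 0; i < sizeof(file_map) / sizeof(file_map[0]); i++) {
			int n;

			if (file_map[i].bit != *bits)
				continue;
			n = snprintf(out + used, len - used,
				     "allow { %s } { %s }:%s { %s };\n",
				     src, dst, file_map[i].obj_class,
				     file_map[i].vectors);
			if (n < 0 || (size_t)n >= len - used)
				return -1;
			used += (size_t)n;
		}
	}
	return (int)used;
}
```

Calling smack_rule_to_allow("SomeLabel", "AnotherLabel", "rx", buf, sizeof(buf)) fills buf with two allow lines, one for the read vectors and one for execute.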


The only other task is to prepend the auto-generated object-class and  
access-vector lists to the policy and append the initial SIDs that  
smack wants various objects to have, as well as allowing the "smack"  
user the "priv" and "nopriv" roles and allowing those two roles entry  
into all of the SMACK types.  The resulting SELinux-ified SMACK  
labels would go from:


SomeLabel (with CAP_MAC_OVERRIDE)
AnotherLabel
YetAnotherLabel

to:

smack:priv:SomeLabel
smack:nopriv:AnotherLabel
smack:nopriv:YetAnotherLabel


Casey, hopefully this gives you some ideas about how I think you  
could modify the SELinux code to compile out the "user" field and  
simplify the "role" field as needed.  I'm still not seeing anything  
which SELinux cannot directly implement without additional code, even  
the "CAP_MAC_OVERRIDE" bit.  If the semantics don't seem quite right,  
please provide details about how you think the models differ and I  
will try to address the concerns.


Cheers,
Kyle Moffett


Re: [PATCH] Version 3 (2.6.23-rc8) Smack: Simplified Mandatory Access Control Kernel

2007-10-11 Thread Kyle Moffett


On Oct 11, 2007, at 11:41:34, Casey Schaufler wrote:

--- Kyle Moffett <[EMAIL PROTECTED]> wrote:

[snipped]


I'm still waiting to see the proposed SELinux policy that does what  
Smack does.


That *is* the SELinux policy which does what Smack does.  I keep  
having bugs in the perl-script I'm writing on account of not having  
the time to really get around to fixing it, but that is exactly the  
procedure for generating an SELinux policy from a SMACK policy.


I can accept that you don't see anything that can't be implemented  
thus, but that's not the point. You've provided some really clear  
design notes, and that's great, but it ain't the code. You said  
that you could write a 500 line perl script that would do the whole  
thing, and that left some people with an impression that Smack is a  
subset of SELinux.  Well, I'm already finding myself digging out  
from under that misunderstanding, and with people who are assuming
that your policy has been done, "proving" the point.


I'd love to have time to finish the script but unfortunately real  
life keeps interfering and I'm going to have to go back to lurking on  
this thread.


Cheers,
Kyle Moffett


Re: [PATCH] Reserve N process to root

2007-10-11 Thread Kyle Moffett


Please don't trim CC lists

On Oct 11, 2007, at 17:02:37, Al Boldi wrote:

David Newall wrote:

[EMAIL PROTECTED] wrote:
What David meant was that "root will always have a slot" doesn't  
*actually* help unless you *also* have a way to actually *spawn*  
such a process.  In order to do the ps, kill, and so on that you  
need to recover, you need to already have either a root shell  
available, or a way to *get* a root shell that doesn't rely on a  
non-root process (so /bin/su doesn't help here).


That's right, although it's worse than that.  You need to have a  
process with CAP_SYS_ADMIN.  If root processes normally have that  
capability then the reserved slots may well disappear before you  
notice a problem.  If root processes normally don't have it, then  
you need to guarantee that one is already running.


I once posted a patch to handle this DoS, but, as usual, it wasn't  
accepted.  Go figure...


This isn't really necessary any more with the new CFS scheduler.  If  
you want to prevent excess memory usage then you limit memory usage,  
not process count, so just set the system max process count to  
something absurdly high and leave the user counts down at the maximum  
a user might run.  Then as long as the sum of the user processes is  
less than the max number of processes (which you just set absurdly  
high or unlimited), you may still log in.  With the per-user  
scheduling enabled CFS allows you to run an optimistically-real-time  
game as one user and several thousand busy-loops as another user and  
get almost picture perfect 50% CPU distribution between the users.   
To me that seems a much better DoS-prevention system than limits  
which don't scale based on how many people are requesting resources.


Cheers,
Kyle Moffett


Re: [PATCH] Reserve N process to root

2007-10-11 Thread Kyle Moffett


On Oct 12, 2007, at 01:37:23, Al Boldi wrote:

Kyle Moffett wrote:
This isn't really necessary any more with the new CFS scheduler.   
If you want to prevent excess memory usage then you limit memory  
usage, not process count, so just set the system max process count  
to something absurdly high and leave the user counts down at the  
maximum a user might run.  Then as long as the sum of the user  
processes is less than the max number of processes (which you just  
set absurdly high or unlimited), you may still log in.  With the  
per-user scheduling enabled CFS allows you to run an  
optimistically-real-time game as one user and several thousand  
busy-loops as another user and get almost picture perfect 50% CPU  
distribution between the users. To me that seems a much better
DoS-prevention system than limits which don't scale based on how many
people are requesting resources.


You have a point, and resource-controllers can probably control DoS  
a lot better, but they also incur more overhead.  Think of this
"lockout prevention" patch as a near zero overhead safety valve.


But why do you need to add "lockout prevention" if it already
exists?  With CFS' extremely efficient per-user-scheduling (hopefully
soon to be the default) there are only two forms of lockout by
non-root processes:  (1) Running out of PIDs in the box's PID-space
(think tens or hundreds of thousands of processes), or (2)
Swap-storming the box to death.  To put it bluntly, trying to reserve
free PID slots is attacking the wrong end of the problem, and your
so-called "lockout prevention" could very easily ensure that 10 PIDs
are available even if the user has swapstormed the box with the PIDs
he does have.


Cheers,
Kyle Moffett

Re: Get physical MAC address

2008-01-01 Thread Kyle Moffett


On Jan 01, 2008, at 21:42:18, Jon Masters wrote:

On Mon, 2007-12-31 at 12:39 +0700, Theewara Vorakosit wrote:
I get MAC address from ioctl. However, ifconfig can change this   
MAC address. Can I get a real physical MAC address of the NIC?


Forgive me reading into your mail...this smells a bit like some  
kind of licensing/compliance thing. Just bear in mind that using  
the MAC to verify the identity of a machine is utterly useless and  
pointless - anyone can trivially fool your software[0] to see what  
it "wants".


Not necessarily;  I can easily see distros wanting to have a "Restore  
defaults" button in their network config windows which also includes  
restoring the default MAC address to the NIC.  It should also be  
pointed out that anybody with one of a selection of re-flashable NICs
(or NICs with removable EEPROMs) can easily change the MAC address on
their NIC.  Other alternatives include renaming eth0 to mynet0 and
creating a downed dummy interface called "eth0" with the desired MAC  
addr.



[0] We used to have to do far worse kludgery in college, in order  
to prevent the silly powers that be who "banned" network cards  
other than those made by one manufacturer from being used on their  
little network.


Well for basically any userspace-level check, all it takes is
somebody who knows ASM and has about 5 minutes to track down the
problematic branch instructions.  Then they just have to write a
10-line GDB script which starts the program, traps the appropriate
instructions, and then changes a "0" to a "1" (or vice versa) before
the conditional branch.  On Windows it's vaguely practical (albeit
crash-prone) to load a kernel hack which prevents your program from
being debugged, but under Linux it's effectively impossible.


Cheers,
Kyle Moffett


Re: git guidance

2007-11-29 Thread Kyle Moffett


On Nov 29, 2007, at 00:27:04, Al Boldi wrote:

Jakub Narebski wrote:
Besides, you can always use "git show :". For  
example gitweb (and I think other web interfaces) can show any  
version of a file or a directory, accessing only repository.


Sure, browsing is the easy part, but Version Control starts when  
things become writable.


But... git history is inherently immutable once
created... that's the only way you can index everything with a simple  
SHA-1.  If you want to write to the "git filesystem" by adding new  
commits then you need to use the appropriate commands, same as every  
other VCS on the planet.


Cheers,
Kyle Moffett


Re: Kernel Development & Objective-C

2007-11-30 Thread Kyle Moffett


On Nov 30, 2007, at 13:40:07, H. Peter Anvin wrote:

Kyle Moffett wrote:
With that said, there is a significant performance penalty as all  
Objective-C method calls are looked up symbolically at runtime for  
every single call.


GACK!

At least C++ has vtables.


In a tight loop there is a way to do a single symbolic lookup and  
just call directly through a function pointer, but typically it isn't  
necessary for GUI programs and the like.  The flexibility of being  
able to dynamically add new methods to an existing class (at least  
for desktop user interfaces) significantly outweighs the performance  
cost.  Any performance-sensitive code is typically written in  
straight C anyways.
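
The lookup-once pattern can be sketched in plain C.  This is only an analogy: method_table, lookup_method() and sum_with_cached_lookup() are names invented for this example, and a real Objective-C runtime resolves selectors quite differently from the string search shown here.  The point is just that the by-name lookup happens once, outside the loop.

```c
#include <stddef.h>
#include <string.h>

typedef int (*method_fn)(int);

static int add_one(int x) { return x + 1; }
static int twice(int x)   { return x * 2; }

/* Hypothetical per-class method table, searched by name at run time. */
static const struct {
	const char *name;
	method_fn fn;
} method_table[] = {
	{ "addOne", add_one },
	{ "twice",  twice },
};

/* The slow "symbolic" lookup: one string search per call. */
method_fn lookup_method(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(method_table) / sizeof(method_table[0]); i++)
		if (strcmp(method_table[i].name, name) == 0)
			return method_table[i].fn;
	return NULL;
}

/* Tight loop: resolve the name once, then call through the pointer.
 * Returns -1 if the method name is unknown. */
int sum_with_cached_lookup(const char *name, int n)
{
	method_fn fn = lookup_method(name);
	int i, sum = 0;

	if (!fn)
		return -1;
	for (i = 0; i < n; i++)
		sum += fn(i);
	return sum;
}
```

The flexibility mentioned above corresponds to being able to add rows to the table at run time; the cost is the search, which the cached pointer avoids inside the loop.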


Cheers,
Kyle Moffett


Re: Kernel Development & Objective-C

2007-11-30 Thread Kyle Moffett


On Nov 30, 2007, at 09:34:45, Lennart Sorensen wrote:

On Thu, Nov 29, 2007 at 12:14:16PM +, Ben Crowhurst wrote:

Has Objective-C ever been considered for kernel development?


Doesn't objective C essentially require a runtime to provide a lot  
of the features of the language?  If it does (as I suspect) then it  
is totally unsuitable for kernel development.


That and object oriented languages in general are badly designed  
and a bad idea.  Having not used objective C I have no idea if it  
qualifies as badly designed or not.  Certainly C++ and java are  
both very badly designed.


Objective-C is actually a pretty minimal wrapper around C; it was  
originally implemented as a C preprocessor.  It generally does not  
have any kind of memory management, garbage collection, or anything  
else (although typically a "runtime" will provide those features).   
There are no first-class exceptions, so there would be nothing to  
worry about there (the exceptions used in GUI programs are built  
around the setjmp/longjmp primitives).  Objective-C is also almost  
completely backwards-compatible with C, much more so than C++ ever  
was.  As far as the runtime goes the kernel would be expected to  
write its own, the same way that it implements "kmalloc()" as part of  
a "C runtime".  Since the runtime itself never does any implicit  
memory allocation, I think it would conceivably even be relatively  
safe for kernel usage.


With that said, there is a significant performance penalty as all  
Objective-C method calls are looked up symbolically at runtime for  
every single call.  For GUI programs where large chunks of the code  
are event-loops and not performance-sensitive that provides a huge  
amount of extra flexibility.  In the kernel though, there are many  
codepaths where *every* *single* instruction counts; that could be a  
serious performance hit.


Cheers,
Kyle Moffett


Re: Relax permissions for reading hard drive serial number?

2007-12-04 Thread Kyle Moffett


On Dec 02, 2007, at 13:45:44, Matti Aarnio wrote:
This lack of a stable(*) unique system identifier available to
applications is one of the small details that make node-locked
commercial software delivery a challenging thing in UNIX environments.


*) "stable" as both stable data, and stable API to get it.


Well... There's that.  There's also the fact that anybody with a
modicum of ASM programming skills who gets clever with GDB and
compares traces from "Correct HW serial" and "Incorrect HW serial"
runs can write a 10-line GDB script to make it work regardless.  I did
something similar with a popular FPS (which I legitimately own) on one
of my Mac systems after having left the DVD behind when going to a LAN
party.  Addresses removed to protect the innocent^Wguilty, but they
took maybe 15 minutes to acquire:


break *END_OF_CDKEY_CODE_DECRYPTION
run
delete 1
advance *JUST_AFTER_CDKEY_CHECK
set $r3 = 0
detach

At some point every such "locked" computer program has code like this:

if (program_is_not_authorized()) {
	display_nasty_dialog();
	exit(1);
}


All it takes for somebody with a debugger is to identify the last  
instruction of the "program_is_not_authorized()" function and change $r3
(or whatever return register your system uses) from a 1 to a 0.  The  
fact remains that once the software is running on *THEIR* computer  
there is nothing you can practically do to forcibly prevent them from  
using it in whatever fashion they desire.  Typically if you price  
your software reasonably people will be willing to pay for multiple  
copies but there are no foolproof technical measures to enforce that  
they do so.


Cheers,
Kyle Moffett


Re: [PATCH] Reduce stack used by lib/hexdump.c

2007-12-05 Thread Kyle Moffett


On Dec 05, 2007, at 21:42:35, Joe Perches wrote:

On Wed, 2007-12-05 at 18:18 -0800, Randy Dunlap wrote:

Joe Perches wrote:
Maybe just eliminate the 16 or 32 byte width option and force it  
to only 16 byte widths.
Have you checked users (callers)?  I'm pretty sure that one of the  
callers wanted 32 and that's why it's there.


I did.  There is only 1 subsystem.  That's easy to change.

drivers/mtd/ubi/debug.c:  print_hex_dump(KERN_DEBUG, "",  
DUMP_PREFIX_OFFSET, 32, 1,
drivers/mtd/ubi/io.c: print_hex_dump(KERN_DEBUG, "",  
DUMP_PREFIX_OFFSET, 32, 1,


Long lines in the log file are not too easy to read anyway.  Using  
16 byte dumps per line instead of 32 isn't painful.


It gets rid of the allocation, reduces the argument count and makes  
the kernel smaller.  I think it's all good.


Every current caller would have to change though.


Alternatively, since print_hex_dump is not a performance-critical
path (and usually indicates an error/debug condition), you could
probably just make a static "hexdump_lock" spinlock and wrap the
buffer use in spin_lock_irqsave()/spin_unlock_irqrestore().  It would
always nest inside any other lock (except during crash, where we break
locks already for printk()), and I doubt any of the callers would
notice the serialization since they're already serialized on the
printk buffer.
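
A minimal userspace sketch of the idea, assuming the line buffer is moved from the stack to static storage: a pthread mutex stands in for the proposed "hexdump_lock" spinlock, and hex_line() and linebuf are names made up for this example.  In the kernel the lock would instead be a static spinlock taken with spin_lock_irqsave() and released with spin_unlock_irqrestore().

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Shared line buffer kept off the stack; the mutex stands in for the
 * proposed "hexdump_lock" spinlock.  16 bytes need 47 characters. */
static char linebuf[16 * 3 + 1];
static pthread_mutex_t hexdump_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Format up to 16 bytes as space-separated hex in the shared buffer,
 * then copy the result to the caller while still holding the lock.
 * Returns the string length, or -1 if the caller's buffer is too small.
 */
int hex_line(const unsigned char *data, size_t len, char *out, size_t outlen)
{
	size_t i, used = 0;
	int ret = -1;

	if (len > 16)
		len = 16;
	pthread_mutex_lock(&hexdump_lock);
	linebuf[0] = '\0';
	for (i = 0; i < len; i++)
		used += (size_t)snprintf(linebuf + used,
					 sizeof(linebuf) - used,
					 i ? " %02x" : "%02x", data[i]);
	if (used < outlen) {
		memcpy(out, linebuf, used + 1);
		ret = (int)used;
	}
	pthread_mutex_unlock(&hexdump_lock);
	return ret;
}
```

Every caller serializes on the one lock, which is the trade-off being proposed: a little contention on a debug path in exchange for no per-call stack buffer.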


Cheers,
Kyle Moffett


Re: New Address Family: Inter Process Networking (IPN)

2007-12-05 Thread Kyle Moffett


On Dec 06, 2007, at 00:30:16, Renzo Davoli wrote:
AF_IPN is different.  AF_IPN is the broadcast and peer-to-peer  
extension of AF_UNIX. It supports communication among *user*  
processes.


Ok, you say it's different, but then you describe how IP unicast and  
broadcast work.  Both are frequently used for communication among  
"*user* processes".  Please provide significantly more details about  
exactly *how* it's different.




Example:

Qemu, User-Mode Linux, Kvm, our umview machines can use IPN as an  
Ethernet Hub and communicate among themselves with the hosting  
computer and the world by a tap like interface.


You say "tap like" interface, but people do this already with  
existing infrastructure.  You can connect Qemu, UML, and KVM to a  
standard Linux "tap" interface, and then use the standard Linux
bridging code to connect the "tap" interface to your existing network  
interfaces.  Alternatively you could use the standard and well-tested  
IP routing/firewalling/NAT code to move your packets around.  None of  
this requires new network infrastructure in the slightest.  If you  
have problems with the existing code, please improve it instead of  
creating a slightly incompatible replacement which has different bugs  
and workarounds.



You can also grab an interface (say eth1) and use eth0 for your  
hosting computer and eth1 for the IPN network of virtual machines.


You can do that already with the bridging code.


If you load the kvde_switch submodule IPN can be a virtual Ethernet  
switch.


As I described above, this can be done with the existing bridging and  
tun/tap code.




Another Example:

You have a continuous stream of data packets generated by a  
process, and you want to send this data to many processes.  Maybe  
the set of processes is not known in advance, you want to send the  
data to any interested process. Some kind of publish&subscribe  
communication service (among unix processes not on TCP-IP). Without  
IPN you need a server. With IPN the sender creates the socket,
connects to it and feeds it with data packets. All the interested
receivers connect to it and start reading. That's all.


This is already done frequently in userspace.  Just register a port  
number with IANA on which to implement a "registration" server and  
write a little daemon to listen on 127.0.0.1:${YOUR_PORT}.  Your  
interconnecting programs then use either unicast or multicast sockets  
to bind, then report to the registration server what service you are  
offering and what port it's on.  Your "receivers" then connect to the  
registration server, ask what port a given service is on, and then  
multicast-listen or unicast-connect to access that service.  The best  
part is that all of the performance implications are already  
thoroughly understood.  Furthermore, if you want to extend your  
communication protocol to other hosts as well, you just have to  
replace the 127.0.0.1 bind with a global bind.  This is exactly how  
the standard-specified multiple-participant "SIP" protocol works, for  
example.



So if you really think this is something that belongs in the kernel  
you need to provide much more detailed descriptions and use-cases for  
why it cannot be implemented in user-space or with small  
modifications to existing UDP/TCP networking.


Cheers,
Kyle Moffett


Re: Futexes and network filesystems.

2007-11-20 Thread Kyle Moffett


On Nov 20, 2007, at 17:53:52, Eric W. Biederman wrote:
I had a chance to think about this a bit more, and realized that  
the problem is that futexes don't appear to work on network  
filesystems, even if the network filesystems provide coherent  
shared memory.


It seems to me that we need to have a call that gets a unique token
for a process per filesystem for use in futexes
(especially robust futexes).  Say get_fs_task_id(const char *path);


On local filesystems this could just be the pid as we use today,  
but for filesystems that can be accessed from contexts with  
potentially overlapping pid values this could be something else.   
It is an extra syscall in the preparation path, but it should be  
hardly more expensive than the current getpid().


Once we have fixed the futex infrastructure to be able to handle  
futexes on network filesystems, the pid namespace case will be  
trivial to implement.


Actually, I would think that get_vm_task_id(void *addr) would be a  
more useful interface.  The call would still be a relatively simple  
lookup to find the struct file associated with the particular virtual  
mapping, but it would be race-free from the perspective of userspace  
and would not require that we somehow figure out the file descriptor  
associated with a particular mmap() (which may be closed by this  
point in time).  Useful extensions would be get_fd_task_id(int fd)
and get_fs_task_id(const char *path), but those are less important.


The other important thing is to ensure that somehow the numbers are  
considered unique only within the particular domain of a container,  
such that you can migrate a container from one system to another even  
using a simple local ext3 filesystem (on a networked block device)  
and still be able to have things work properly even after the  
migration.  Naturally this would only work with an upgraded libc but  
I think that's a reasonable requirement to enforce for migration of  
futexes and cross-network futexes.


Even for network filesystems which don't implement coherent shared  
memory, you might add a memexcl() system call which (when used by  
multiple cooperating processes) ensures that a given page is only  
ever mapped by at most one computer accessing a given network  
filesystem.  The page-outs and page-ins when shuttling that page  
across the network would be expensive, but I believe the cost would  
be reasonable for many applications and it would allow traditional  
atomic ops on the mapped pages to take and release futexes in the  
uncontended case.


Cheers,
Kyle Moffett


Re: [RFC] Documentation about unaligned memory access

2007-11-22 Thread Kyle Moffett


On Nov 22, 2007, at 20:29:11, Alan Cox wrote:
Most architectures are unable to perform unaligned memory  
accesses. Any unaligned access causes a processor exception.


Not all. Some simply produce the wrong answer - that's oh so much
more exciting.


As one example, the MicroBlaze soft-core processor family designed  
for use on Xilinx FPGAs will (by default) simply forcibly zero the  
lower bits of the unaligned address, such that the following code  
will fail mysteriously:


const char foo[] = { 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07 };
printf("0x%08lx 0x%08lx 0x%08lx 0x%08lx\n",
	(unsigned long)*((const u32 *)(foo+0)),
	(unsigned long)*((const u32 *)(foo+1)),
	(unsigned long)*((const u32 *)(foo+2)),
	(unsigned long)*((const u32 *)(foo+3)));

Instead of outputting:
0x00010203 0x01020304 0x02030405 0x03040506

It will output:
0x00010203 0x00010203 0x00010203 0x00010203

Other embedded architectures have very similar problems.  Some may  
provide an "unaligned data access" exception, but offer insufficient  
information to repair the damage and resume execution.
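
One portable way to avoid the problem is to never issue a wide load on a possibly unaligned pointer.  The sketch below shows two common approaches; load_be32() and load_u32() are names chosen for this example.  Inside the kernel, the get_unaligned() family of helpers serves the same purpose.

```c
#include <stdint.h>
#include <string.h>

/* Read a big-endian 32-bit value one byte at a time.  Only byte loads
 * are issued, so the address may have any alignment. */
uint32_t load_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Read a native-endian 32-bit value through memcpy(), which makes no
 * alignment assumption about the source pointer. */
uint32_t load_u32(const void *p)
{
	uint32_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}
```

With the foo[] array above, load_be32(foo + 1) yields 0x01020304 regardless of how the hardware treats unaligned word accesses.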


Cheers,
Kyle Moffett


Re: + smack-version-11c-simplified-mandatory-access-control-kernel.patch added to -mm tree

2007-11-24 Thread Kyle Moffett


On Nov 24, 2007, at 06:39:34, Crispin Cowan wrote:

Andrew Morgan wrote:
It feels to me as if a MAC "override capability" is, if true to  
its name, extra to the MAC model; any MAC model that needs an  
'override' to function seems under-specified... SELinux clearly  
feels no need for one,


That's not quite right. More specifically, it already has one in  
the form of unconfined_t. AppArmor has a similar escape hatch in  
the "Ux" permission. It's not that they don't need one, it is that
they already have one. They get to have one because they allow you  
to actually write a policy that is more nuanced than "process label  
must dominate object label".


Actually, a fully-secured strict-mode SELinux system will have no  
unconfined_t processes; none of my test systems have any.  Generally  
"unconfined_t" is used for situations similar to what AppArmor was  
designed for, where the only "interesting" security is that of the  
daemon (which is properly labelled) and one or more of the users are  
unconfined.


Even then "unconfined_t" is not an implicit part of the policy, it is  
explicitly given the ability to take any action on any object by  
rules in the policy, and it typically still falls under a few MLS  
labeling restrictions even in the targeted policy.


Cheers,
Kyle Moffett


Re: + smack-version-11c-simplified-mandatory-access-control-kernel.patch added to -mm tree

2007-11-26 Thread Kyle Moffett


On Nov 24, 2007, at 22:36:43, Crispin Cowan wrote:

Kyle Moffett wrote:
Actually, a fully-secured strict-mode SELinux system will have no  
unconfined_t processes; none of my test systems have any.   
Generally "unconfined_t" is used for situations similar to what  
AppArmor was designed for, where the only "interesting" security  
is that of the daemon (which is properly labelled) and one or more  
of the users are unconfined.


Interesting. In a Targeted Policy, you do your policy  
administration from unconfined_t. But how do you administer a  
Strict Policy machine? I can think of 2 ways:


[snip]


* there is some type that is tighter than unconfined_t but none the
  less has sufficient privilege to change policy

To me, this would be semantically equivalent to unconfined_t,  
because any rogue code or user with this type could then fabricate  
unconfined_t and do what they want


Well, in a strict SELinux system, someone who has been permitted the  
"Security Administrator" role (secadm_r) and who has logged in  
through a "login_t" process may modify and reload the policy.  They  
are also permitted to view all files up to their clearance, write  
files below their level, and relabel files.  On the other hand, they  
do not have any system-administration privileges (those are reserved
for sysadm_r).


Under the default policy the security administrator may disable  
SELinux completely, although that too can be adjusted as "load  
policy" is yet another specialized permission.


Cheers,
Kyle Moffett

Re: freeze vs freezer

2007-11-27 Thread Kyle Moffett


On Nov 27, 2007, at 12:40:24, Rafael J. Wysocki wrote:

On Tuesday, 27 of November 2007, Matthew Garrett wrote:

On Mon, Nov 26, 2007 at 10:53:34PM +0100, Rafael J. Wysocki wrote:

On Monday, 26 of November 2007, David Chinner wrote:
So how do you handle threads that are blocked on I/O or a lock  
during the system freeze process, then?


We wait until they can continue.


So if I have a process blocked on an unavailable NFS mount, I can't
suspend?


That's correct, you can't.

[And I know what you're going to say. ;-)]


Why exactly does suspend/hibernation depend on "TASK_INTERRUPTIBLE"  
instead of a zero preempt_count()?  Really what we should do is just  
iterate over all of the actual physical devices and tell each one  
"Block new IO requests preemptably, finish pending DMA, put the  
hardware in low-power mode, and prepare for suspend/hibernate".  As  
long as each driver knows how to do those simple things we can have  
an entirely consistent kernel image for both suspend and for  
hibernation.


When all tasks are preemptible we can very trivially rely on the  
drivers to enforce the "Stop new IO submission" with a dirt-simple  
semaphore or waitqueue.  The sleep itself will be  
TASK_UNINTERRUPTIBLE, but it will be done from a preemptible context.


That way the system suspend time is the sum of the suspend times of  
the devices on the system, and the suspend time of any given device  
is the sum of its maximum non-preemptible critical section and the  
time to flush all of its remaining pending DMA/etc.  This is almost  
completely independent of the load-level of the machine, and it does  
not depend on things like NFS filesystems.  The one gotcha is that it  
does not flush dirty filesystem pages to disk first, although that  
could be fixed with a few VFS and blockdev hooks which hierarchically  
flush and "freeze" block devices and filesystems before actually  
disabling devices much the way that device-mapper can pause a device  
to take a snapshot and end up with a clean journal on the filesystem  
afterwards.
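
The suspend-time estimate above is simple arithmetic, so here is a minimal sketch of it.  The device names and timings are invented for illustration and are not measurements; the only thing taken from the text is the formula (per-device time = longest non-preemptible critical section + time to flush pending DMA, and system time = sum over devices).

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str                     # hypothetical device name
    max_critical_section_us: int  # longest non-preemptible section in its driver
    dma_flush_us: int             # time to drain its remaining pending DMA


def device_suspend_us(dev):
    """Worst-case suspend time for one device, in microseconds."""
    return dev.max_critical_section_us + dev.dma_flush_us


def system_suspend_us(devices):
    """System suspend time is the sum of the per-device suspend times."""
    return sum(device_suspend_us(d) for d in devices)


# Made-up example numbers, purely to show the calculation.
devices = [
    Device("sata0", 500, 40000),
    Device("eth0", 200, 5000),
    Device("usb-hub", 100, 2000),
]
```

With these invented numbers, system_suspend_us(devices) comes to 47800 microseconds.  Nothing in the calculation depends on how many tasks are runnable or on whether an NFS server is reachable, which is the point of the argument.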


Cheers,
Kyle Moffett


Re: freeze vs freezer

2007-11-27 Thread Kyle Moffett


On Nov 27, 2007, at 17:49:18, Jeremy Fitzhardinge wrote:
> Rafael J. Wysocki wrote:
>> Well, this is more-or-less how we all imagine that should be done
>> eventually.
>>
>> The main problem is how to implement it without causing too much
>> breakage.  Also, there are some dirty details that need to be
>> taken into consideration.
>
> For Xen suspend/resume, I'd like to use the freezer to get all
> threads into a known consistent state (where, specifically, they
> don't have any outstanding pagetable updates pending).  In other
> words, the freezer as it currently stands is what I want, modulo
> some of these issues where it gets caught up unexpectedly.  If
> threads end up getting frozen anywhere preempt isn't explicitly
> disabled, it wouldn't work for me.


The problem with "one freezer" is that "known consistent state" means  
something completely different to every single driver and subsystem.   
Xen wants it to mean "No pending page table updates and no more  
updates from this point forward".  A network driver wants it to mean  
"All pending network packets DMAed out or in and the device shut down  
with all remaining packets queued".  A SATA controller wants it to  
mean "All DMA quiesced and no more commands", etc.


The only way to have that work is to put minimal definitions of what  
state you care about in the drivers themselves.  For Xen this means  
that you need to have an appropriately-timed suspend handler which  
hooks into Xen code very precisely to create and preserve the "No  
pending page table updates" state that you care about.  It will be  
more work in the short term but it's the only maintainable solution  
in the long term IMO.


Cheers,
Kyle Moffett


Re: [PATCH 00/26] Permit filesystem local caching

2008-01-15 Thread Kyle Moffett


On Jan 15, 2008, at 18:46, David Howells wrote:

>  (*) 01-keys-inc-payload.diff
>  (*) 02-keys-search-keyring.diff
>  (*) 03-keys-callout-blob.diff


One vaguely related question:  Is there presently any way to adjust  
the per-user max-key-data limit? I've been tinkering with using the  
new-ish MIT kerberos "KEYRING:" credentials-cache code to hold keys  
for persistent daemons.  Unfortunately "root" keeps hitting the limit  
even with only about 16 keys allocated across a few sessions.  After  
perusing the docs I can't find any documentation on adjusting the  
limits.


I'd really like some way to specifically allow root to allocate up to  
several megs worth of non-swappable key data, although I suppose just  
increasing the global limit slightly wouldn't be bad either.  If such  
functionality already exists then I'd appreciate a pointer to it (and  
would possibly respond in kind with documentation patches).


Cheers,
Kyle Moffett


Re: [MC] [CHECKER] Need help on mmap on FUSE (linux user-land file system)

2005-03-13 Thread Kyle Moffett

On Mar 13, 2005, at 02:28, Junfeng Yang wrote:
> Forgot to mention, we are checking linux 2.6.  It appears to us that
> mmap doesn't work for FUSE in linux 2.6.

IIRC, the reason mmap doesn't work on FUSE is that when it dirties
pages they cannot be flushed reliably, because writing them out involves
calling a userspace process which may allocate RAM, etc.

Cheers,
Kyle Moffett
-BEGIN GEEK CODE BLOCK-
Version: 3.12
GCM/CS/IT/U d- s++: a18 C>$ UB/L/X/*(+)>$ P+++()>$
L(+++) E W++(+) N+++(++) o? K? w--- O? M++ V? PS+() PE+(-) Y+
PGP+++ t+(+++) 5 X R? tv-(--) b(++) DI+ D+ G e->$ h!*()>++$ r  
!y?(-)
--END GEEK CODE BLOCK--


[BUG?] Signedness of __kernel_nlink_t on Sparc?

2005-03-13 Thread Kyle Moffett

In include/asm-sparc/types.h, __kernel_nlink_t is signed, whereas on
all the other architectures it is unsigned.  Is this intentional, or a bug?

Cheers,
Kyle Moffett


Re: [PATCH][RFC] Make /proc/ chmod'able

2005-03-15 Thread Kyle Moffett

On Mar 15, 2005, at 16:18, Rene Scharfe wrote:
> It's easily visible in the style of public toilets: in some countries
> you have one big room with no walls in between where all men or women
> merrily shit together, in other countries (like mine) every person can
> lock himself into a private closet.  Both ways work, there's nothing
> too special about using a toilet, but I'm simply used to the privacy
> provided by those thin walls.  I assure you, I don't do anything evil
> in there. :]

Just as long as our lab's "bathrooms" don't mysteriously get a
bazillion walls all over the place on kernel upgrade, we're ok.
I don't mind adding new options for advanced security, as long
as you don't change the defaults.  It's hard enough managing
a boatload of workstations under ideal conditions.  When the
default settings change every month it gets really annoying
really quickly. :-D

Cheers,
Kyle Moffett


Re: Real-Time Preemption and RCU

2005-03-20 Thread Kyle Moffett

On Mar 19, 2005, at 11:31, Ingo Molnar wrote:
>> What about allowing only as many concurrent readers as there are CPUs?
>
> since a reader may be preempted by a higher prio task, there is no
> linear relationship between CPU utilization and the number of readers
> allowed. You could easily end up having all the nr_cpus readers
> preempted on one CPU. It gets pretty messy

One solution I can think of, although it bloats memory usage for
many-way boxen, is to just have a table in the rwlock with one entry
per cpu.  Each CPU would get one concurrent reader, others would need
to sleep.
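
To make the per-CPU table idea concrete, here is a toy userspace model of it.  This is only a sketch under stated assumptions: the class and method names (PerCpuReaderTable, try_read_lock, read_unlock) are invented for illustration, the table is guarded by an ordinary mutex rather than anything a kernel rwlock would use, and a refused reader simply gets False back where a real implementation would sleep.

```python
import threading


class PerCpuReaderTable:
    """Toy model of a reader table with one slot per CPU (not a real rwlock)."""

    def __init__(self, nr_cpus):
        # One slot per CPU: this is the memory cost on many-way machines.
        self._slots = [None] * nr_cpus
        self._mutex = threading.Lock()  # protects the table itself

    def try_read_lock(self, cpu, reader):
        """Claim this CPU's slot; False means the caller would have to sleep."""
        with self._mutex:
            if self._slots[cpu] is None:
                self._slots[cpu] = reader
                return True
            return False

    def read_unlock(self, cpu, reader):
        """Release the slot held by `reader` on `cpu`."""
        with self._mutex:
            if self._slots[cpu] != reader:
                raise ValueError("reader does not hold the slot for this CPU")
            self._slots[cpu] = None

    def active_readers(self):
        """Number of slots currently held; never more than nr_cpus."""
        with self._mutex:
            return sum(slot is not None for slot in self._slots)
```

A second reader arriving on a CPU whose slot is taken is refused even if other CPUs have free slots, so the number of concurrent readers is bounded by the number of CPUs and no two of them can pile up on one CPU.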

Cheers,
Kyle Moffett


Re: current linus bk, error mounting root

2005-03-21 Thread Kyle Moffett

On Mar 21, 2005, at 19:19, Andrew Morton wrote:
> Jon Smirl <[EMAIL PROTECTED]> wrote:
>> Jens is right that this is a user space issue, but how many people are
>> going to find this out the hard way when their root drives stop
>> mounting. Since no one is complaining I have to assume that most
>> kernel developers have their root device drivers built into the
>> kernel. I was loading mine as a module since for a long time Redhat
>> was not shipping kernels with SATA built in.
>
> I don't agree that this is a userspace issue.  It's just not sane for a
> driver to be in an unusable state for an arbitrary length of time after
> modprobe returns.

What about if I'm booting from a USB drive?  In that case, because of
the asynchrony of USB probing, it may take 1 or 2 seconds for my
attached hub to power on, wake up, boot its embedded microprocessor,
etc before it will respond to signals.  In such a case, as far as the
root hub can tell, there are _no_ external devices for a couple seconds,
and that's ignoring that my external USB bootdrive may _also_ need time
to "boot" before it will be accessible, and that's only once its parent
hub has become available.

I think that the kernel needs some kind of wait-for-device API that is
accessible from kernel-space for the simple boot sequence, perhaps just
waiting for a specific kobject to be detected and complete
initialization.

For an initrd/initramfs in userspace, dnotify on sysfs (For the static
/dev case), or dnotify on /dev (For the udev case) should allow it to
detect when the device is available.
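
For the userspace side, a minimal sketch of "wait until the device node shows up" might look like the following.  Note that it polls with a timeout instead of using dnotify, purely to keep the example short and portable; the function name wait_for_path and its parameters are made up for illustration.

```python
import os
import time


def wait_for_path(path, timeout_s=10.0, poll_interval_s=0.1):
    """Return True once `path` exists, or False if `timeout_s` elapses first."""
    deadline = time.monotonic() + timeout_s
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        # Device not there yet (e.g. the hub is still powering up); retry.
        time.sleep(poll_interval_s)
```

An initrd script could call something like wait_for_path("/dev/sda1", timeout_s=30) before trying to mount root, where the device path is just a placeholder.  A notification-based version would avoid the polling latency but needs the same timeout handling.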
Cheers,
Kyle Moffett


Re: forkbombing Linux distributions

2005-03-23 Thread Kyle Moffett

On Mar 23, 2005, at 09:43, Jan Engelhardt wrote:
>> brings down almost all linux distro's while other *nixes survives.
>
> Let's see if this can be confirmed.

Here at my school we have the workstations running Debian testing. We
have edited /etc/security/limits.conf to have a much more restrictive
startup environment for user processes, limiting to 100 processes per
user and clamping maximum CPU time to 4 hours per process.  It's not
failsafe, but we also have all of the kernel threads set at realtime
levels, with the IRQ threads specifically set at SCHED_RR 99, and we
have a sulogin-type process on tty12 at SCHED_RR 99.
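
As a rough illustration of the two limits mentioned above (100 processes per user, 4 hours of CPU time per process), here is a sketch that expresses them as rlimits.  It is not our actual configuration: we set these through /etc/security/limits.conf, whereas this sketch uses the setrlimit interface from Python's standard resource module, and the helper names are invented for the example.

```python
import resource

MAX_PROCESSES = 100            # per-user process cap from the text
MAX_CPU_SECONDS = 4 * 60 * 60  # four hours of CPU time per process


def restricted_limits():
    """Map rlimit constants to (soft, hard) pairs mirroring the setup above."""
    return {
        resource.RLIMIT_NPROC: (MAX_PROCESSES, MAX_PROCESSES),
        resource.RLIMIT_CPU: (MAX_CPU_SECONDS, MAX_CPU_SECONDS),
    }


def apply_limits(limits):
    """Apply the limits to the calling process; its children inherit them."""
    for which, pair in limits.items():
        resource.setrlimit(which, pair)
```

Calling apply_limits(restricted_limits()) early in a login wrapper would cap everything started from that session.  Once the hard limit is lowered an unprivileged process cannot raise it again, which is what makes this useful against a forkbomb.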
Even in the event of the worst kind of forkbomb, the terminal is as
responsive as if nothing else were running and allows us to kill the
offending processes easily, because when the scheduler refuses to
interrupt the killall process to run anything else, no other forkbomb
processes get started.
I suppose a similar situation could be set up with a user-accessible
server and a rate-limited SSH daemon if necessary, although a ttyS0
console via a console server might work better.  In any case, I think
that while there could perhaps be a better interface for user-limits
in the kernel, the existing one works fine for most purposes, when
combined with appropriate administrative tools.
Cheers,
Kyle Moffett


Re: [OT] speeding boot process (was Re: [ANNOUNCE] hotplug-ng 001 release)

2005-02-14 Thread Kyle Moffett

On Feb 14, 2005, at 20:17, Lee Revell wrote:
> On Mon, 2005-02-14 at 16:16 -0800, Tim Bird wrote:
>> Lee Revell wrote:
>>> But, I was referring more to things like GDM not being started until
>>> all the other init scripts are done.  Why not start it first, and let
>>> the network initialize while the user is logging in?
>>
>> There are a number of techniques used by CE vendors to get fast bootup
>> time.  Some CE products boot Linux in under 1 second.  Sony's
>> best Linux boot time in the lab (from power on to user space)
>> was 148 milliseconds, on an ARM chip (running at 200 MHZ I believe).
>
> The reason I marked my response OT is that the time from power on to
> userspace does not seem to be a big problem.  It's the amount of time
> from user space to presenting a login prompt that's way too long.  My
> distro (Debian) runs all the init scripts one at a time, and GDM is the
> last thing that gets run.  There is just no reason for this.  We should
> start X and initialize the display and get the login prompt up there
> ASAP, and let the system acquire the DHCP lease and start sendmail and
> apache and get the date from the NTP server *in the background while I
> am logging in*.  It's not rocket science.

Such a system needs a drastically different bootup process than
currently exists, including the ability to specify init-script
dependencies.  (For example, user login via GDM (and with our setup,
GDM working at all) requires that AFS is mounted and NIS is working,
which both require the network to be available, which requires...  You
can see where this is going.)  I think eventually we need a better
/sbin/init, one that can use a traditional legacy /etc/inittab file in
addition to a newfangled simultaneous boot process with lots of ways to
start various kinds of services.  Unfortunately such a system will need
a _LOT_ of work and testing to make sure it doesn't break existing
setups.  Oh well, I can dream, can't I? :-D
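
To show what "init-script dependencies" plus a simultaneous boot could mean in practice, here is a small sketch that groups services into batches which may start in parallel.  It is an illustration only, not a proposal for a real /sbin/init: the function name start_batches and the dependency table are made up, with the GDM/AFS/NIS/network chain taken from the example above.

```python
def start_batches(deps):
    """Group services into batches that can each be started in parallel.

    `deps` maps a service name to the set of services it requires.
    Every service in a batch depends only on services in earlier batches.
    """
    remaining = {svc: set(reqs) for svc, reqs in deps.items()}
    # Services that only appear as requirements have no dependencies listed.
    for reqs in deps.values():
        for req in reqs:
            remaining.setdefault(req, set())

    batches = []
    started = set()
    while remaining:
        ready = sorted(s for s, reqs in remaining.items() if reqs <= started)
        if not ready:
            raise ValueError(
                "dependency cycle among: " + ", ".join(sorted(remaining)))
        batches.append(ready)
        started.update(ready)
        for svc in ready:
            del remaining[svc]
    return batches


# Hypothetical dependency table based on the setup described above.
deps = {
    "gdm": {"afs", "nis"},
    "afs": {"network"},
    "nis": {"network"},
    "sendmail": {"network"},
    "network": set(),
}
```

For this table the batches are network first, then afs, nis and sendmail together, then gdm.  In this particular setup GDM still ends up in the last batch because it needs AFS and NIS, which is exactly why the dependencies have to be declared rather than guessed.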

Cheers,
Kyle Moffett


Re: Drive missing only with LVM kernel

2005-01-26 Thread Kyle Moffett

On Jan 26, 2005, at 03:34, Jasper Koolhaas wrote:
> Oh, and I'm using a devfs so "cd /dev && ./MAKEDEV hdg" is not the
> solution I think.
> The odd thing is that without LVM compiled in the kernel or as
> module /dev/hdg is accessible through devfs and with LVM not.

Well, devfs has been deprecated and mostly unmaintained since before
2.6.0 was released, so it really doesn't surprise me.  Go download
and install udev, hotplug, etc from your distro.

Cheers,
Kyle Moffett

