Duplicated functions (was: fix memory corruption from misinterpreted bad_inode_ops return values)

2007-01-04 Thread Mitchell Blank Jr
Al Viro wrote:
> On Thu, Jan 04, 2007 at 03:21:06PM -0800, Mitchell Blank Jr wrote:
> > The real (but harder) fix for the wasted space issue
> > would be to get the toolchain to automatically combine functions that
> > end up compiling into identical assembly.
> 
> Can't do.
[...]
> Comparing pointers to
> functions for equality is a well-defined operation and it's not
> to be messed with.

I agree -- it definitely could never be an "always on" optimization.  What I
was postulating would be a special post-processing step that could be run by
people interested in squeezing the kernel as small as possible.  I'd bet
that there's not many functions in the kernel that rely on having a unique
address.  If some exist they could be put in their own ELF section via
a "__unique_function_address_needed" macro or something.

> You _can_ compile g into jump to f, but that's it.

Actually if you don't mind sacrificing alignment (which we do anyway with -Os,
I think) you could give a function two unique addresses just by prepending it
with a single-byte noop instruction.

But anyway... how about some actual data?

Just out of curiosity I threw together a quick script (attached) that
parses the output of "objdump --disassemble vmlinux" and tries to find
functions that share the same opcodes.  The short summary of the results:
a bit of .text savings but not an amazing amount -- on my tiny nfsroot
test kernel it found 4034 bytes worth of possible savings (out of a total
of 1460830 bytes of opcodes in .text)  About what I expected, actually.

One line in its output really stands out, however:

| 218 bytes: __copy_from_user_ll_nozero, __copy_from_user_ll, __copy_to_user_ll

Those three big functions are exactly the same, which actually makes sense
if you think about how they're implemented.  That's a bunch of *very* hot
Icache lines we're wasting there!

Anyway, hopefully some folks will find the script interesting.  In particular
maybe the janitors can play with it.  I'd wager some of the things it
turns up are good candidates for code re-use.

-Mitch

Script's output:

Duplicated functions:
  98 bytes: nlmsvc_proc_granted, nlm4svc_proc_granted
  10 bytes: return_EIO, hung_up_tty_write
  10 bytes: free_pid_ns, mempool_kfree, free_bd_holder, nfs4_free_open_state, 
ipc_immediate_free, dir_release, pci_release_bus_bridge_dev, put_tty_driver, 
device_create_release, class_create_release, class_device_create_release, 
isa_dev_release, neigh_parms_destroy, unx_destroy_cred, pmap_map_release, 
rpc_free_iostats, free
  15 bytes: xor_64, xor_64
  7 bytes: native_set_pte, native_set_pmd, debugfs_u32_set
  24 bytes: part_uevent_store, store_uevent
  54 bytes: atm_init_aal34, atm_init_aal5
  22 bytes: nfs4_encode_void, nlmsvc_encode_void, nlm4svc_encode_void
  42 bytes: seq_release_private, content_release
  10 bytes: profile_event_register, profile_event_unregister, nocrypt, 
nocrypt_iv
  7 bytes: nfs_proc_commit_setup, udp_lib_hash, udp_lib_hash
  179 bytes: nlmsvc_proc_sm_notify, nlm4svc_proc_sm_notify
  29 bytes: nfs4_decode_void, nlmsvc_decode_void, nlm4svc_decode_void
  62 bytes: nfs3_commit_done, nfs3_write_done
  7 bytes: print_trace_stack, i8237A_suspend, store, module_frob_arch_sections, 
get_gate_vma, in_gate_area, in_gate_area_no_task, no_blink, noop_ret, 
no_action, next_online_pgdat, __pud_alloc, __pmd_alloc, generic_pipe_buf_pin, 
simple_sync_file, nfs4_callback_null, fuse_prepare_write, default_read_file, 
dummycon_dummy, vgacon_dummy, read_null, hung_up_tty_read, con_chars_in_buffer, 
class_device_create_uevent, anon_transport_dummy_function, ramdisk_writepages, 
sock_no_poll, noop_dequeue, ipv4_dst_check, rpcproc_encode_null, 
rpcproc_decode_null, pvc_shutdown, svc_shutdown
  20 bytes: fn_show_ptregs, sysrq_handle_showregs
  29 bytes: part_next, diskstats_next
  10 bytes: atomic_notifier_call_chain, raw_notifier_call_chain
  107 bytes: nlm_decode_cookie, nlm4_decode_cookie
  5 bytes: do_spurious_interrupt_bug, c_stop, machine_shutdown, machine_halt, 
syscall_vma_close, native_nop, sighand_ctor, r_stop, s_stop, 
audit_log_task_context, noop, default_unplug_io_fn, frag_stop, single_stop, 
t_stop, devinfo_stop, int_seq_stop, parse_solaris_x86, parse_freebsd, 
parse_netbsd, parse_openbsd, parse_unixware, parse_minix, nfs_server_list_stop, 
nfs_volume_list_stop, nfs3_forget_cached_acls, fuse_read_inode, 
ipc_lock_by_ptr, ipc_unlock, crypto_exit_cipher_ops, crypto_exit_digest_ops, 
ioport_unmap, pci_remove_legacy_files, k_ignore, con_throttle, 
nv_vlan_rx_kill_vid, input_devices_seq_stop, input_handlers_seq_stop, 
proto_seq_stop, dev_seq_stop, softnet_seq_stop, dev_mc_seq_stop, 
neigh_stat_seq_stop, netlink_seq_stop, rt_cpu_seq_stop, tcp_twsk_destructor, 
raw_seq_stop, udp_seq_stop, icmp_address, icmp_discard, fib_seq_stop, 
unix_seq_stop, packet_seq_stop, rpc_default_callback, nul_destroy, 
nul_destroy_cred, c_stop, vcc_seq_stop, smp_setup_pr

Re: [UPDATED PATCH] fix memory corruption from misinterpreted bad_inode_ops return values

2007-01-04 Thread Mitchell Blank Jr
Linus Torvalds wrote:
> Well, that probably would work, but it's also true that returning a 64-bit 
> value on a 32-bit platform really _does_ depend on more than the size.

Yeah, obviously this is restricted to the signed-integer case.  My point
was just that you could have the compiler figure out which variant to pick
for loff_t automatically.

> "let's not play tricks with function types at all".

I think I agree.  The real (but harder) fix for the wasted space issue
would be to get the toolchain to automatically combine functions that
end up compiling into identical assembly.

-Mitch
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [UPDATED PATCH] fix memory corruption from misinterpreted bad_inode_ops return values

2007-01-04 Thread Mitchell Blank Jr
Al Viro wrote:
> At least 3 versions, unless you want to mess with ifdefs to reduce them to
> two.

I don't think you need to do fancy #ifdef's:

static s32 return_eio_32(void) { return -EIO; }
static s64 return_eio_64(void) { return -EIO; }
extern void return_eio_unknown(void);   /* Doesn't exist */
#define return_eio(type)((sizeof(type) == 4)\
? ((void *) return_eio_32)  \
: ((sizeof(type) == 8)  \
? ((void *) return_eio_64)  \
: ((void *) return_eio_unknown)))

-Mitch
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [UPDATED PATCH] fix memory corruption from misinterpreted bad_inode_ops return values

2007-01-04 Thread Mitchell Blank Jr
Al Viro wrote:
 At least 3 versions, unless you want to mess with ifdefs to reduce them to
 two.

I don't think you need to do fancy #ifdef's:

static s32 return_eio_32(void) { return -EIO; }
static s64 return_eio_64(void) { return -EIO; }
extern void return_eio_unknown(void);   /* Doesn't exist */
#define return_eio(type)((sizeof(type) == 4)\
? ((void *) return_eio_32)  \
: ((sizeof(type) == 8)  \
? ((void *) return_eio_64)  \
: ((void *) return_eio_unknown)))

-Mitch
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [UPDATED PATCH] fix memory corruption from misinterpreted bad_inode_ops return values

2007-01-04 Thread Mitchell Blank Jr
Linus Torvalds wrote:
 Well, that probably would work, but it's also true that returning a 64-bit 
 value on a 32-bit platform really _does_ depend on more than the size.

Yeah, obviously this is restricted to the signed-integer case.  My point
was just that you could have the compiler figure out which variant to pick
for loff_t automatically.

 let's not play tricks with function types at all.

I think I agree.  The real (but harder) fix for the wasted space issue
would be to get the toolchain to automatically combine functions that
end up compiling into identical assembly.

-Mitch
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Duplicated functions (was: fix memory corruption from misinterpreted bad_inode_ops return values)

2007-01-04 Thread Mitchell Blank Jr
Al Viro wrote:
 On Thu, Jan 04, 2007 at 03:21:06PM -0800, Mitchell Blank Jr wrote:
  The real (but harder) fix for the wasted space issue
  would be to get the toolchain to automatically combine functions that
  end up compiling into identical assembly.
 
 Can't do.
[...]
 Comparing pointers to
 functions for equality is a well-defined operation and it's not
 to be messed with.

I agree -- it definitely could never be an always on optimization.  What I
was postulating would be a special post-processing step that could be run by
people interested in squeezing the kernel as small as possible.  I'd bet
that there's not many functions in the kernel that rely on having a unique
address.  If some exist they could be put in their own ELF section via
a __unique_function_address_needed macro or something.

 You _can_ compile g into jump to f, but that's it.

Actually if you don't mind sacrificing alignment (which we do anyway with -Os,
I think) you could give a function two unique addresses just by prepending it
with a single-byte noop instruction.

But anyway... how about some actual data?

Just out of curiosity I threw together a quick script (attached) that
parses the output of objdump --disassemble vmlinux and tries to find
functions that share the same opcodes.  The short summary of the results:
a bit of .text savings but not an amazing amount -- on my tiny nfsroot
test kernel it found 4034 bytes worth of possible savings (out of a total
of 1460830 bytes of opcodes in .text)  About what I expected, actually.

One line in its output really stands out, however:

| 218 bytes: __copy_from_user_ll_nozero, __copy_from_user_ll, __copy_to_user_ll

Those three big functions are exactly the same, which actually makes sense
if you think about how they're implemented.  That's a bunch of *very* hot
Icache lines we're wasting there!

Anyway, hopefully some folks will find the script interesting.  In particular
maybe the janitors can play with it.  I'd wager some of the things it
turns up are good candidates for code re-use.

-Mitch

Script's output:

Duplicated functions:
  98 bytes: nlmsvc_proc_granted, nlm4svc_proc_granted
  10 bytes: return_EIO, hung_up_tty_write
  10 bytes: free_pid_ns, mempool_kfree, free_bd_holder, nfs4_free_open_state, 
ipc_immediate_free, dir_release, pci_release_bus_bridge_dev, put_tty_driver, 
device_create_release, class_create_release, class_device_create_release, 
isa_dev_release, neigh_parms_destroy, unx_destroy_cred, pmap_map_release, 
rpc_free_iostats, free
  15 bytes: xor_64, xor_64
  7 bytes: native_set_pte, native_set_pmd, debugfs_u32_set
  24 bytes: part_uevent_store, store_uevent
  54 bytes: atm_init_aal34, atm_init_aal5
  22 bytes: nfs4_encode_void, nlmsvc_encode_void, nlm4svc_encode_void
  42 bytes: seq_release_private, content_release
  10 bytes: profile_event_register, profile_event_unregister, nocrypt, 
nocrypt_iv
  7 bytes: nfs_proc_commit_setup, udp_lib_hash, udp_lib_hash
  179 bytes: nlmsvc_proc_sm_notify, nlm4svc_proc_sm_notify
  29 bytes: nfs4_decode_void, nlmsvc_decode_void, nlm4svc_decode_void
  62 bytes: nfs3_commit_done, nfs3_write_done
  7 bytes: print_trace_stack, i8237A_suspend, store, module_frob_arch_sections, 
get_gate_vma, in_gate_area, in_gate_area_no_task, no_blink, noop_ret, 
no_action, next_online_pgdat, __pud_alloc, __pmd_alloc, generic_pipe_buf_pin, 
simple_sync_file, nfs4_callback_null, fuse_prepare_write, default_read_file, 
dummycon_dummy, vgacon_dummy, read_null, hung_up_tty_read, con_chars_in_buffer, 
class_device_create_uevent, anon_transport_dummy_function, ramdisk_writepages, 
sock_no_poll, noop_dequeue, ipv4_dst_check, rpcproc_encode_null, 
rpcproc_decode_null, pvc_shutdown, svc_shutdown
  20 bytes: fn_show_ptregs, sysrq_handle_showregs
  29 bytes: part_next, diskstats_next
  10 bytes: atomic_notifier_call_chain, raw_notifier_call_chain
  107 bytes: nlm_decode_cookie, nlm4_decode_cookie
  5 bytes: do_spurious_interrupt_bug, c_stop, machine_shutdown, machine_halt, 
syscall_vma_close, native_nop, sighand_ctor, r_stop, s_stop, 
audit_log_task_context, noop, default_unplug_io_fn, frag_stop, single_stop, 
t_stop, devinfo_stop, int_seq_stop, parse_solaris_x86, parse_freebsd, 
parse_netbsd, parse_openbsd, parse_unixware, parse_minix, nfs_server_list_stop, 
nfs_volume_list_stop, nfs3_forget_cached_acls, fuse_read_inode, 
ipc_lock_by_ptr, ipc_unlock, crypto_exit_cipher_ops, crypto_exit_digest_ops, 
ioport_unmap, pci_remove_legacy_files, k_ignore, con_throttle, 
nv_vlan_rx_kill_vid, input_devices_seq_stop, input_handlers_seq_stop, 
proto_seq_stop, dev_seq_stop, softnet_seq_stop, dev_mc_seq_stop, 
neigh_stat_seq_stop, netlink_seq_stop, rt_cpu_seq_stop, tcp_twsk_destructor, 
raw_seq_stop, udp_seq_stop, icmp_address, icmp_discard, fib_seq_stop, 
unix_seq_stop, packet_seq_stop, rpc_default_callback, nul_destroy, 
nul_destroy_cred, c_stop, vcc_seq_stop, smp_setup_processor_id, 
remapped_pgdat_init, pre_setup_arch_hook, trap_init_hook, sched_init_smp

Re: [PATCH 2.6.20-rc2] [BUGFIX] drivers/atm/firestream.c: Fix infinite recursion when alignment passed is 0.

2006-12-30 Thread Mitchell Blank Jr
First, if you want to get patches merged you should send them to the subsystem
maintained (in this case Chas, who I've cc:'ed)  Also if you feel it needs to
be sent to mailing list you should usually use a more specific list first
(like the ATM list or maybe netdev)  Please see Documentation/SubmittingPatches

Amit Choudhary wrote:
> Description: Fix infinite recursion when alignment passed is 0 in function 
> aligned_kmalloc(),

I'm curious how you hit this -- the only caller to aligned_kmalloc() passes
a constant "0x10"

Looking at aligned_kmalloc() it seems to be pretty badly broken -- its fallback
if it gets a non-aligned buffer is to just try a larger size which doesn't
necessarily fix the problem.  It looks like explicitly aligning the buffer
is a better solution.

Could you test this patch?  If it works feel free to forward it on to Chas.

-Mitch

[PATCH] [ATM] remove firestream.c's aligned_kmalloc()

Signed-off-by: Mitchell Blank Jr <[EMAIL PROTECTED]>

diff --git a/drivers/atm/firestream.c b/drivers/atm/firestream.c
index 9c67df5..df8b0c0 100644
--- a/drivers/atm/firestream.c
+++ b/drivers/atm/firestream.c
@@ -1379,38 +1379,22 @@ static void reset_chip (struct fs_dev *d
}
 }
 
-static void __devinit *aligned_kmalloc (int size, gfp_t flags, int alignment)
-{
-   void  *t;
-
-   if (alignment <= 0x10) {
-   t = kmalloc (size, flags);
-   if ((unsigned long)t & (alignment-1)) {
-   printk ("Kmalloc doesn't align things correctly! %p\n", 
t);
-   kfree (t);
-   return aligned_kmalloc (size, flags, alignment * 4);
-   }
-   return t;
-   }
-   printk (KERN_ERR "Request for > 0x10 alignment not yet implemented 
(hard!)\n");
-   return NULL;
-}
-
 static int __devinit init_q (struct fs_dev *dev, 
  struct queue *txq, int queue, int nentries, int is_rq)
 {
-   int sz = nentries * sizeof (struct FS_QENTRY);
+   unsigned sz = nentries * sizeof (struct FS_QENTRY);
struct FS_QENTRY *p;
+   void *vp;
 
func_enter ();
 
fs_dprintk (FS_DEBUG_INIT, "Inititing queue at %x: %d entries:\n", 
queue, nentries);
-
-   p = aligned_kmalloc (sz, GFP_KERNEL, 0x10);
+   vp = kmalloc(sz + 0xF, GFP_KERNEL);
+   p = (struct FS_QENTRY *) ALIGN((unsigned long) vp, 0x10);
fs_dprintk (FS_DEBUG_ALLOC, "Alloc queue: %p(%d)\n", p, sz);
 
-   if (!p) return 0;
+   if (!vp) return 0;
 
write_fs (dev, Q_SA(queue), virt_to_bus(p));
write_fs (dev, Q_EA(queue), virt_to_bus(p+nentries-1));
@@ -1423,8 +1407,7 @@ static int __devinit init_q (struct fs_d
write_fs (dev, Q_CNF(queue), 0 ); 
}
 
-   txq->sa = p;
-   txq->ea = p;
+   txq->buf = vp;
txq->offset = queue; 
 
func_exit ();
@@ -1529,8 +1512,8 @@ static void __devexit free_queue (struct
write_fs (dev, Q_WP(txq->offset), 0);
/* Configuration ? */
 
-   fs_dprintk (FS_DEBUG_ALLOC, "Free queue: %p\n", txq->sa);
-   kfree (txq->sa);
+   fs_dprintk (FS_DEBUG_ALLOC, "Free queue: %p\n", txq->buf);
+   kfree (txq->buf);
 
func_exit ();
 }
diff --git a/drivers/atm/firestream.h b/drivers/atm/firestream.h
index 49e783e..5408766 100644
--- a/drivers/atm/firestream.h
+++ b/drivers/atm/firestream.h
@@ -455,7 +455,7 @@ struct fs_vcc {
 
 
 struct queue {
-   struct FS_QENTRY *sa, *ea;  
+   void *buf;  
int offset;
 };
 

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2.6.20-rc2] [BUGFIX] drivers/atm/firestream.c: Fix infinite recursion when alignment passed is 0.

2006-12-30 Thread Mitchell Blank Jr
First, if you want to get patches merged you should send them to the subsystem
maintained (in this case Chas, who I've cc:'ed)  Also if you feel it needs to
be sent to mailing list you should usually use a more specific list first
(like the ATM list or maybe netdev)  Please see Documentation/SubmittingPatches

Amit Choudhary wrote:
 Description: Fix infinite recursion when alignment passed is 0 in function 
 aligned_kmalloc(),

I'm curious how you hit this -- the only caller to aligned_kmalloc() passes
a constant 0x10

Looking at aligned_kmalloc() it seems to be pretty badly broken -- its fallback
if it gets a non-aligned buffer is to just try a larger size which doesn't
necessarily fix the problem.  It looks like explicitly aligning the buffer
is a better solution.

Could you test this patch?  If it works feel free to forward it on to Chas.

-Mitch

[PATCH] [ATM] remove firestream.c's aligned_kmalloc()

Signed-off-by: Mitchell Blank Jr [EMAIL PROTECTED]

diff --git a/drivers/atm/firestream.c b/drivers/atm/firestream.c
index 9c67df5..df8b0c0 100644
--- a/drivers/atm/firestream.c
+++ b/drivers/atm/firestream.c
@@ -1379,38 +1379,22 @@ static void reset_chip (struct fs_dev *d
}
 }
 
-static void __devinit *aligned_kmalloc (int size, gfp_t flags, int alignment)
-{
-   void  *t;
-
-   if (alignment = 0x10) {
-   t = kmalloc (size, flags);
-   if ((unsigned long)t  (alignment-1)) {
-   printk (Kmalloc doesn't align things correctly! %p\n, 
t);
-   kfree (t);
-   return aligned_kmalloc (size, flags, alignment * 4);
-   }
-   return t;
-   }
-   printk (KERN_ERR Request for  0x10 alignment not yet implemented 
(hard!)\n);
-   return NULL;
-}
-
 static int __devinit init_q (struct fs_dev *dev, 
  struct queue *txq, int queue, int nentries, int is_rq)
 {
-   int sz = nentries * sizeof (struct FS_QENTRY);
+   unsigned sz = nentries * sizeof (struct FS_QENTRY);
struct FS_QENTRY *p;
+   void *vp;
 
func_enter ();
 
fs_dprintk (FS_DEBUG_INIT, Inititing queue at %x: %d entries:\n, 
queue, nentries);
-
-   p = aligned_kmalloc (sz, GFP_KERNEL, 0x10);
+   vp = kmalloc(sz + 0xF, GFP_KERNEL);
+   p = (struct FS_QENTRY *) ALIGN((unsigned long) vp, 0x10);
fs_dprintk (FS_DEBUG_ALLOC, Alloc queue: %p(%d)\n, p, sz);
 
-   if (!p) return 0;
+   if (!vp) return 0;
 
write_fs (dev, Q_SA(queue), virt_to_bus(p));
write_fs (dev, Q_EA(queue), virt_to_bus(p+nentries-1));
@@ -1423,8 +1407,7 @@ static int __devinit init_q (struct fs_d
write_fs (dev, Q_CNF(queue), 0 ); 
}
 
-   txq-sa = p;
-   txq-ea = p;
+   txq-buf = vp;
txq-offset = queue; 
 
func_exit ();
@@ -1529,8 +1512,8 @@ static void __devexit free_queue (struct
write_fs (dev, Q_WP(txq-offset), 0);
/* Configuration ? */
 
-   fs_dprintk (FS_DEBUG_ALLOC, Free queue: %p\n, txq-sa);
-   kfree (txq-sa);
+   fs_dprintk (FS_DEBUG_ALLOC, Free queue: %p\n, txq-buf);
+   kfree (txq-buf);
 
func_exit ();
 }
diff --git a/drivers/atm/firestream.h b/drivers/atm/firestream.h
index 49e783e..5408766 100644
--- a/drivers/atm/firestream.h
+++ b/drivers/atm/firestream.h
@@ -455,7 +455,7 @@ struct fs_vcc {
 
 
 struct queue {
-   struct FS_QENTRY *sa, *ea;  
+   void *buf;  
int offset;
 };
 

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] IBM HDAPS accelerometer driver, with probing.

2005-08-26 Thread Mitchell Blank Jr
Andi Kleen wrote:
> - it doesn't seem to help that much on modern CPUs with good
> branch prediction and big icaches anyways.

Really?  I would think that as pipelines get deeper (although that trend
seems to have stopped, thankfully) and Icache-miss penalties get relatively
larger we'd see unlikely() becoming MORE of a benefit, not less.  Storing
the used part of a "hot" function in 1 Icacheline instead of 4 seems like
an obvious win.

Personally I've never found unlikely() to be ugly; if anything I think
it serves as a nice little human-readable comment about whats going on
in the control-flow.  I guess I'm in the minority on that one, though.

-Mitch
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Linux-2.6.13-rc7

2005-08-26 Thread Mitchell Blank Jr
Richard Henderson wrote:
> Because I use "extern inline" in the proper way.  That is, I have both
> inline and out-of-line versions of some routines.

Is there any reason not to just make the out-of-line version explicit?
i.e.:

/* in some .h file: */
static /*(always!)*/inline int my_func(void)
{
return FOO;
}
extern int OOL_my_func(void);

/* in some .c file: */
int OOL_my_func(void)
{
return my_func();
}

It's a little ugly but there really aren't that many cases of this, right?
Or is this just the principal of the thing? :-)

-Mitch
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] IBM HDAPS accelerometer driver, with probing.

2005-08-26 Thread Mitchell Blank Jr
Andi Kleen wrote:
 - it doesn't seem to help that much on modern CPUs with good
 branch prediction and big icaches anyways.

Really?  I would think that as pipelines get deeper (although that trend
seems to have stopped, thankfully) and Icache-miss penalties get relatively
larger we'd see unlikely() becoming MORE of a benefit, not less.  Storing
the used part of a hot function in 1 Icacheline instead of 4 seems like
an obvious win.

Personally I've never found unlikely() to be ugly; if anything I think
it serves as a nice little human-readable comment about whats going on
in the control-flow.  I guess I'm in the minority on that one, though.

-Mitch
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Linux-2.6.13-rc7

2005-08-26 Thread Mitchell Blank Jr
Richard Henderson wrote:
 Because I use extern inline in the proper way.  That is, I have both
 inline and out-of-line versions of some routines.

Is there any reason not to just make the out-of-line version explicit?
i.e.:

/* in some .h file: */
static /*(always!)*/inline int my_func(void)
{
return FOO;
}
extern int OOL_my_func(void);

/* in some .c file: */
int OOL_my_func(void)
{
return my_func();
}

It's a little ugly but there really aren't that many cases of this, right?
Or is this just the principal of the thing? :-)

-Mitch
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] driver core: Add the ability to unbind drivers to devices from userspace

2005-07-28 Thread Mitchell Blank Jr
Still one nitpick:

Jon Smirl wrote:
> + while (isspace(*x) && (x - buffer->page < count))
> + x++;

I think you can just do:

if (count > 0)
while (isspace(*x))
x++;

If the passed-in string was fully whitespace then the trailing-whitespace
stripping would already reduce it to count==0.  So as long as you check
for that case you can be sure that the while() will stop before x falls off
the end of the string.

Other than that it looks fine to me now.

-Mitch
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] driver core: Add the ability to unbind drivers to devices from userspace

2005-07-28 Thread Mitchell Blank Jr
Jon Smirl wrote:
> Do we need to deal with UTF8 here? I did the forward loop because you
> can't parse UTF8 backwards. If UTF8 is possible I need to change the
> pointer inc function.

As others have mentioned there shouldn't be a UTF8 problem with isspace().
However, even if you wanted to scan going forwards you can do it without
the complexity of a nested loop:

char *real_end;

for (real_end = x; x - buffer->page < count; x++)
if (!isspace(*x))
real_end = x + 1;
*real_end = '\0';
// then fix "count"

BTW, my code I posted yesterday is incomplete since I neglected to notice
that the "count = z - x" at the end is used to repair "count" damage from
both the leading and trailing whitespace stripping.  Its actually easier
to strip the trailing whitespace first, like:

while (count > 0 && isspace(buffer->page[count - 1]))
count--;/* strip trailing whitespace */
if (count > 0 && isspace(buffer->page[0])) {
/*
 * We have leading whitespace but also at least one
 * non-whitespace character
 */
const char *x = buffer->page;
do {
x++;
count--;
} while (isspace(*x));
memmove(buffer->page, x, count);
}
buffer->page[count] = '\0';

-Mitch
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] speed up on find_first_bit for i386 (let compiler do the work)

2005-07-28 Thread Mitchell Blank Jr
Steven Rostedt wrote:
> In the thread "[RFC][PATCH] Make MAX_RT_PRIO and MAX_USER_RT_PRIO
> configurable" I discovered that a C version of find_first_bit is faster
> than the asm version

There are probably other cases of this in asm-i386/bitopts.h.  For instance
I think the "btl" instruction is pretty slow on modern CPUs so
constant_test_bit() will probably outperform variable_test_bit() even if
you feed it a non-constant "nr".  I'd be happy to be proven wrong, though :-)

When testing these optimizations you should also probably check the resulting
vmlinux size.

-Mitch
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] driver core: Add the ability to unbind drivers to devices from userspace

2005-07-28 Thread Mitchell Blank Jr
Greg KH wrote:
> > +   /* locate trailng white space */
> > +   z = y = x;
> > +   while (y - buffer->page < count) {
> > +   y++;
> > +   z = y;
> > +   while (isspace(*y) && (y - buffer->page < count)) {
> > +   y++;
> > +   }
> > +   }
> > +   count = z - x;
> 
> Hm, I _think_ this works, but I need someone else to verify this...
> Anyone else?

It looks sane-ish to me, but also more complicated than need be.  Why can't
you just do something like:

while (count > 0 && isspace(x[count - 1]))
count--;

-Mitch
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] driver core: Add the ability to unbind drivers to devices from userspace

2005-07-28 Thread Mitchell Blank Jr
Greg KH wrote:
  +   /* locate trailng white space */
  +   z = y = x;
  +   while (y - buffer-page  count) {
  +   y++;
  +   z = y;
  +   while (isspace(*y)  (y - buffer-page  count)) {
  +   y++;
  +   }
  +   }
  +   count = z - x;
 
 Hm, I _think_ this works, but I need someone else to verify this...
 Anyone else?

It looks sane-ish to me, but also more complicated than need be.  Why can't
you just do something like:

while (count  0  isspace(x[count - 1]))
count--;

-Mitch
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] speed up on find_first_bit for i386 (let compiler do the work)

2005-07-28 Thread Mitchell Blank Jr
Steven Rostedt wrote:
 In the thread [RFC][PATCH] Make MAX_RT_PRIO and MAX_USER_RT_PRIO
 configurable I discovered that a C version of find_first_bit is faster
 than the asm version

There are probably other cases of this in asm-i386/bitopts.h.  For instance
I think the btl instruction is pretty slow on modern CPUs so
constant_test_bit() will probably outperform variable_test_bit() even if
you feed it a non-constant nr.  I'd be happy to be proven wrong, though :-)

When testing these optimizations you should also probably check the resulting
vmlinux size.

-Mitch
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] driver core: Add the ability to unbind drivers to devices from userspace

2005-07-28 Thread Mitchell Blank Jr
Jon Smirl wrote:
 Do we need to deal with UTF8 here? I did the forward loop because you
 can't parse UTF8 backwards. If UTF8 is possible I need to change the
 pointer inc function.

As others have mentioned there shouldn't be a UTF8 problem with isspace().
However, even if you wanted to scan going forwards you can do it without
the complexity of a nested loop:

char *real_end;

for (real_end = x; x - buffer-page  count; x++)
if (!isspace(*x))
real_end = x + 1;
*real_end = '\0';
// then fix count

BTW, my code I posted yesterday is incomplete since I neglected to notice
that the count = z - x at the end is used to repair count damage from
both the leading and trailing whitespace stripping.  Its actually easier
to strip the trailing whitespace first, like:

while (count  0  isspace(buffer-page[count - 1]))
count--;/* strip trailing whitespace */
if (count  0  isspace(buffer-page[0])) {
/*
 * We have leading whitespace but also at least one
 * non-whitespace character
 */
const char *x = buffer-page;
do {
x++;
count--;
} while (isspace(*x));
memmove(buffer-page, x, count);
}
buffer-page[count] = '\0';

-Mitch
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] driver core: Add the ability to unbind drivers to devices from userspace

2005-07-28 Thread Mitchell Blank Jr
Still one nitpick:

Jon Smirl wrote:
 + while (isspace(*x)  (x - buffer-page  count))
 + x++;

I think you can just do:

if (count  0)
while (isspace(*x))
x++;

If the passed-in string was fully whitespace then the trailing-whitespace
stripping would already reduce it to count==0.  So as long as you check
for that case you can be sure that the while() will stop before x falls off
the end of the string.

Other than that it looks fine to me now.

-Mitch
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [CHECKER] possible missing capability check in ioctl function, drivers/net/cris/eth_v10.c, kernel 2.6.11

2005-04-15 Thread Mitchell Blank Jr
Ted Kremenek wrote:
> Currently we are looking primarily into the 
> ioctls in drivers/net,

Just as a small aside, a little over five years ago (wow does time fly!) I
did a manual audit for mistakes like this:
  http://lkml.org/lkml/2000/3/7/156
Not sure if that's relevant to your work your not...I'm not sure how you
would can catch most of these errors statistically since a lot of the
dangerous ioctl's were only for a single device.

-Mitch
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [CHECKER] possible missing capability check in ioctl function, drivers/net/cris/eth_v10.c, kernel 2.6.11

2005-04-15 Thread Mitchell Blank Jr
Ted Kremenek wrote:
 Currently we are looking primarily into the 
 ioctls in drivers/net,

Just as a small aside, a little over five years ago (wow does time fly!) I
did a manual audit for mistakes like this:
  http://lkml.org/lkml/2000/3/7/156
Not sure if that's relevant to your work your not...I'm not sure how you
would can catch most of these errors statistically since a lot of the
dangerous ioctl's were only for a single device.

-Mitch
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] ext3: Fix sparse -Wbitwise warnings.

2005-02-15 Thread Mitchell Blank Jr
Stephen C. Tweedie wrote:
> If we want to fix this, let's fix the macros: for example, convert
> EXT3_HAS_COMPAT_FEATURE to be
> 
>   ( le32_to_cpu(EXT3_SB(sb)->s_es->s_feature_compat) & (mask) )

Of course that's less efficient though since "mask" is probably constant..
so now the endian conversion changed from compile-time to run-time.

Would something like

( ( EXT3_SB(sb)->s_es->s_feature_compat & cpu_to_le32(mask) ) != 0)

be enough to satisfy sparse?

-Mitch
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] ext3: Fix sparse -Wbitwise warnings.

2005-02-15 Thread Mitchell Blank Jr
Stephen C. Tweedie wrote:
 If we want to fix this, let's fix the macros: for example, convert
 EXT3_HAS_COMPAT_FEATURE to be
 
   ( le32_to_cpu(EXT3_SB(sb)-s_es-s_feature_compat)  (mask) )

Of course that's less efficient though since mask is probably constant..
so now the endian conversion changed from compile-time to run-time.

Would something like

( ( EXT3_SB(sb)-s_es-s_feature_compat  cpu_to_le32(mask) ) != 0)

be enough to satisfy sparse?

-Mitch
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Lockup in 2.4.2 kernel ADSL PCI card ATM driver module

2001-06-14 Thread Mitchell Blank Jr

[EMAIL PROTECTED] wrote:
> SUBJECT:
> Lockup in 2.4.2 kernel ADSL PCI card ATM driver module
> 
> 
> DRIVER RESULTS:
> Works fine in 2.4.0 kernel.
> Locks up system (no messages/oops/etc.) in 2.4.2-2 kernel (rh 7.1).
> Locks up system (no messages/oops/etc.) in 2.4.2 kernel (w/ or w/o kgdb).
> Locks up system with "int 3" ??? message in 2.4.2-ac28 kernel.

First, please try with a recent 2.4.5ac kernel - a number of ATM bugfixes
have been made recently and they've been merged first with alan's
branch.  Anyway, there's a chance that you might just be hitting one of
those bugs that have already been fixed, so it'd be best to try that
first so we can exclude that possibility.

> MESSAGE GENERATED IN 2.4.2-ac28:
> int 3: 
> CPU: 0
> EIP: 0010:[]

Please read the REPORTING-BUGS and Documentation/oops-tracing.txt files
in the linux source tree - especially the parts about using the
"ksymoops" tool.  These numbers are useless to us unless you run it
through that tool first.

-Mitch
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Lockup in 2.4.2 kernel ADSL PCI card ATM driver module

2001-06-14 Thread Mitchell Blank Jr

[EMAIL PROTECTED] wrote:
 SUBJECT:
 Lockup in 2.4.2 kernel ADSL PCI card ATM driver module
 
 
 DRIVER RESULTS:
 Works fine in 2.4.0 kernel.
 Locks up system (no messages/oops/etc.) in 2.4.2-2 kernel (rh 7.1).
 Locks up system (no messages/oops/etc.) in 2.4.2 kernel (w/ or w/o kgdb).
 Locks up system with int 3 ??? message in 2.4.2-ac28 kernel.

First, please try with a recent 2.4.5ac kernel - a number of ATM bugfixes
have been made recently and they've been merged first with alan's
branch.  Anyway, there's a chance that you might just be hitting one of
those bugs that have already been fixed, so it'd be best to try that
first so we can exclude that possibility.

 MESSAGE GENERATED IN 2.4.2-ac28:
 int 3: 
 CPU: 0
 EIP: 0010:[c885478b]

Please read the REPORTING-BUGS and Documentation/oops-tracing.txt files
in the linux source tree - especially the parts about using the
ksymoops tool.  These numbers are useless to us unless you run it
through that tool first.

-Mitch
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: symlink_prefix

2001-06-02 Thread Mitchell Blank Jr

Alexander Viro wrote:
> I can see how to implement per-mountpoint variant. However, I'm
> less than enthusiastic about the API side of that and about the
> ugliness it will lead to. It smells like a wrong approach. And
> no, I don't see a good one right now.

Aren't we one day going to have stackable filesystems?  This looks like an
idea for a trivial translation filesystem to me.

-Mitch
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: symlink_prefix

2001-06-02 Thread Mitchell Blank Jr

Alexander Viro wrote:
 I can see how to implement per-mountpoint variant. However, I'm
 less than enthusiastic about the API side of that and about the
 ugliness it will lead to. It smells like a wrong approach. And
 no, I don't see a good one right now.

Aren't we one day going to have stackable filesystems?  This looks like an
idea for a trivial translation filesystem to me.

-Mitch
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Larger dev_t

2001-03-25 Thread Mitchell Blank Jr

Wichert Akkerman wrote:
> You are just delaying the problem then, at some point your uptime will
> be large enough that you have run through all 64bit pids for example.

64 bits is enough to fork 1 million processes per second for over
500,000 years.  I think that's putting the problem off far enough.

-Mitch
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Larger dev_t

2001-03-25 Thread Mitchell Blank Jr

Wichert Akkerman wrote:
 You are just delaying the problem then, at some point your uptime will
 be large enough that you have run through all 64bit pids for example.

64 bits is enough to fork 1 million processes per second for over
500,000 years.  I think that's putting the problem off far enough.

-Mitch
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: spinlock usage - ext2_get_block, lru_list_lock

2001-03-21 Thread Mitchell Blank Jr

Andreas Dilger wrote:
> With per-group (or maybe per-bitmap) locking, files could be created in
> parallel with only a small amount of global locking if they are in different
> groups.

...and then we can let the disc go nuts seeking to actually commit all
these new blocks.  I suspect that this change would only be a win for
memory-based discs where seek time is zero.

I think that before anyone starts modifying the kernel for this they
should benchmark how often the free block checker blocks on lock
contention _AND_ how often the thread its contending with is looking
for a free block in a _different_ allocation group.  I bet it's not
often at all.

> It may also be
> possible to have lazy updating of the superblock counts, and depend on
> e2fsck to update the superblock counts on a crash.

That sounds more promising.

> , and only moving the deltas from the groups
> to the superblock on sync or similar.

If we're going to assume that e2fsck will correct the numbers anyway then
there's really no reason to update them any time except when marking
the filesystem clean (umount, remount-ro)  As a bonus, we have to update
the superblock then anyway.

-Mitch
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: spinlock usage - ext2_get_block, lru_list_lock

2001-03-21 Thread Mitchell Blank Jr

Andreas Dilger wrote:
 With per-group (or maybe per-bitmap) locking, files could be created in
 parallel with only a small amount of global locking if they are in different
 groups.

...and then we can let the disc go nuts seeking to actually commit all
these new blocks.  I suspect that this change would only be a win for
memory-based discs where seek time is zero.

I think that before anyone starts modifying the kernel for this they
should benchmark how often the free block checker blocks on lock
contention _AND_ how often the thread its contending with is looking
for a free block in a _different_ allocation group.  I bet it's not
often at all.

 It may also be
 possible to have lazy updating of the superblock counts, and depend on
 e2fsck to update the superblock counts on a crash.

That sounds more promising.

 , and only moving the deltas from the groups
 to the superblock on sync or similar.

If we're going to assume that e2fsck will correct the numbers anyway then
there's really no reason to update them any time except when marking
the filesystem clean (umount, remount-ro)  As a bonus, we have to update
the superblock then anyway.

-Mitch
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [CHECKER] 120 potential dereference to invalid pointers errors for linux 2.4.1

2001-03-17 Thread Mitchell Blank Jr

Junfeng Yang wrote:
> [BUG] fore200e_kmalloc can return NULL
> /u2/acc/oses/linux/2.4.1/drivers/atm/fore200e.c:2032:fore200e_get_esi: 
>ERROR:NULL:2020:2032: Using unknown ptr "prom" illegally! set by 
>'fore200e_kmalloc':2020

I don't see the bug - there is an explicit "if(!prom) return -ENOMEM;" after
the allocation.  It looks fine to me.

> [BUG] break the while loop, but not the for loop
> /u2/acc/oses/linux/2.4.1/drivers/atm/zatm.c:1817:zatm_detect: ERROR:NULL:1804:1817: 
>Using NULL ptr "zatm_dev" illegally! set by 'kmalloc':1804

Ah, good catch.  It'd be almost impossible to actually trigger this since
you'd need multiple cards of different types (all of which are rare) and
end up with really bad allocation luck, but it is technically a bug.
Really line 1829 should be "if(!zatm_dev) return devs;"

> [BUG] at line 1796
> /u2/acc/oses/linux/2.4.1/net/atm/lec.c:1799:lec_arp_update: ERROR:NULL:1798:1799: 
>Using unknown ptr "entry" illegally! set by 'make_entry':1798

Yep, all three of the catches in lec.c are real bugs - great work as always.

-Mitch
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [CHECKER] 120 potential dereference to invalid pointers errors for linux 2.4.1

2001-03-17 Thread Mitchell Blank Jr

Junfeng Yang wrote:
 [BUG] fore200e_kmalloc can return NULL
 /u2/acc/oses/linux/2.4.1/drivers/atm/fore200e.c:2032:fore200e_get_esi: 
ERROR:NULL:2020:2032: Using unknown ptr "prom" illegally! set by 
'fore200e_kmalloc':2020

I don't see the bug - there is an explicit "if(!prom) return -ENOMEM;" after
the allocation.  It looks fine to me.

 [BUG] break the while loop, but not the for loop
 /u2/acc/oses/linux/2.4.1/drivers/atm/zatm.c:1817:zatm_detect: ERROR:NULL:1804:1817: 
Using NULL ptr "zatm_dev" illegally! set by 'kmalloc':1804

Ah, good catch.  It'd be almost impossible to actually trigger this since
you'd need multiple cards of different types (all of which are rare) and
end up with really bad allocation luck, but it is technically a bug.
Really line 1829 should be "if(!zatm_dev) return devs;"

 [BUG] at line 1796
 /u2/acc/oses/linux/2.4.1/net/atm/lec.c:1799:lec_arp_update: ERROR:NULL:1798:1799: 
Using unknown ptr "entry" illegally! set by 'make_entry':1798

Yep, all three of the catches in lec.c are real bugs - great work as always.

-Mitch
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: kernel 2.4.2 SMP + ATM hangs

2001-03-02 Thread Mitchell Blank Jr

Miguel Armas wrote:
> A couple days ago we installed a Fore 200E ATM card and after getting the
> ATM address using ilmid the machine hangs. The kernel still respond to
> pings, but the userspace is dead.
> 
> If we remove SMP support in the kernel everything works fine (but with
> only one CPU)...

You probably need the patch that Chas Williams came up with in January.
I've been meaning to forward it, but I haven't yet.  Please try it and
see if it fixes your problem.

-Mitch

--- net/atm/addr.c.000  Thu Jan 25 16:26:24 2001
+++ net/atm/addr.c  Thu Jan 25 16:26:10 2001
@@ -57,7 +57,6 @@
 {
struct atm_dev_addr *this;
 
-   spin_lock (_dev_lock);
down(_lock);
while (dev->local) {
this = dev->local;
@@ -65,7 +64,6 @@
kfree(this);
}
up(_lock);
-   spin_unlock (_dev_lock);
notify_sigd(dev);
 }
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: dentry cache order 7 is broken

2001-02-08 Thread Mitchell Blank Jr

David S. Miller wrote:
> - hash = hash ^ (hash >> D_HASHBITS) ^ (hash >> D_HASHBITS*2);
> + hash = hash ^ (hash >> D_HASHBITS) ^
> + (hash >> (D_HASHBITS+(D_HASHBITS/2)));

Two comments:
  1. The inode-cache has the exact same problem, but it'll require a lot
 of RAM to run into it.  The buffer and page caches don't have the
 same problem.
  2. Given that D_HASHBITS is not a constant I wonder if there isn't
 a more efficient hash to be found.  But I guess I'll leave that
 to the hashing experts.

-Mitch
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: dentry cache order 7 is broken

2001-02-08 Thread Mitchell Blank Jr

David S. Miller wrote:
 - hash = hash ^ (hash  D_HASHBITS) ^ (hash  D_HASHBITS*2);
 + hash = hash ^ (hash  D_HASHBITS) ^
 + (hash  (D_HASHBITS+(D_HASHBITS/2)));

Two comments:
  1. The inode-cache has the exact same problem, but it'll require a lot
 of RAM to run into it.  The buffer and page caches don't have the
 same problem.
  2. Given that D_HASHBITS is not a constant I wonder if there isn't
 a more efficient hash to be found.  But I guess I'll leave that
 to the hashing experts.

-Mitch
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: O_ANY [was: Re: 'native files', 'object fingerprints' [was: sendpath()]]

2001-01-16 Thread Mitchell Blank Jr

Ingo Molnar wrote:
> - probably the most radical solution is what i suggested, to completely
> avoid the unique-mapping of file structures to an integer range, and use
> the address of the file structure (and some cookies) as an identification.

IMO... gross.  We do pretty much this exact thing in the ATM code (for
the signalling daemon and the kernel exchainging status on VCCs) and it's
pretty disgusting.  I want to make it go away.

> - a less radical solution would be to still map file structures to an
> integer range (file descriptors) and usage-maintain files per processes,
> but relax the 'allocate first non-allocated integer in the range' rule.
[...]
>   fd = open(...,O_ANY);

Yeah, this gets talked about, but I don't think a new flag for open is a
good way to do this, because open() isn't the only thing that returns
a new fd.  What about socket()?  pipe()?

Maybe we could have a new prctl() control that turns this behavior
on and off.  Then you'd just have to be careful to turn it back off
before calling any library functions that require ordering (like popen).

Other than that, I think it'd be a good idea, especially if it could
be implemented clean enough to make it CONFIG_'urable.  That can't
really be fairly judged until someone produces the code.

-Mitch
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: O_ANY [was: Re: 'native files', 'object fingerprints' [was: sendpath()]]

2001-01-16 Thread Mitchell Blank Jr

Ingo Molnar wrote:
 - probably the most radical solution is what i suggested, to completely
 avoid the unique-mapping of file structures to an integer range, and use
 the address of the file structure (and some cookies) as an identification.

IMO... gross.  We do pretty much this exact thing in the ATM code (for
the signalling daemon and the kernel exchainging status on VCCs) and it's
pretty disgusting.  I want to make it go away.

 - a less radical solution would be to still map file structures to an
 integer range (file descriptors) and usage-maintain files per processes,
 but relax the 'allocate first non-allocated integer in the range' rule.
[...]
   fd = open(...,O_ANY);

Yeah, this gets talked about, but I don't think a new flag for open is a
good way to do this, because open() isn't the only thing that returns
a new fd.  What about socket()?  pipe()?

Maybe we could have a new prctl() control that turns this behavior
on and off.  Then you'd just have to be careful to turn it back off
before calling any library functions that require ordering (like popen).

Other than that, I think it'd be a good idea, especially if it could
be implemented clean enough to make it CONFIG_'urable.  That can't
really be fairly judged until someone produces the code.

-Mitch
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: include conventions /usr/include/linux/sys ?

2000-11-22 Thread Mitchell Blank Jr

H. Peter Anvin wrote:
> I suggested include/kernel and include/arch with include/linux and
> include/asm being reserved for the kernel interfaces (ioctl and their
> structures mostly.)

That sounds good.  One other refinement I would like to see is that
architecture specific but always present header files don't get used
directly in architecture-independant .c flies.  Some common examples of
this are  and   The problem is that often there
is some code that is identical between some or all of the archs.  Then
one of two things happen:
  * The code gets duplicated in many *.h files with all the bad things that
comde duplication causes - especially since they're all under seperate
maintainership
  * We have a big ugly conversion like what happened with 'asm/spinlock.h' ->
'linux/spinlock.h'

If we only have arch-independant *.c files only include things out of
 (which may, of course, include things in ) we can
avoid these conversions in the future and promote code reuse between the
architectures.

Also we should probably consider implementing a reasonalbe hierarchy to
the proposed  directory - for instance .

-Mitch
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: [NEW DRIVER] firestream

2000-11-22 Thread Mitchell Blank Jr

First, I'd like to make a couple points about driver style that I'm trying
to move towards with the ATM drivers.  You're free to take them or leave
them, but I want to eventually move the tree in this direction.
  * I don't like header files that define the registers of the chip - since
the header file is only included in the driver's .c file you might as
well just put the definitions there (unless, of course, there is good
reason to think that the registers will be used in multiple drivers -
unlikely in this case)  Having a seperate header file just serves to
hamper searching around the driver and cluttering the directory.
  * Please use the new PCI interface for new drivers (i.e. MODULE_DEVICE_TABLE
and all that)

> +int loopback = 0;
> +int num=0x5a;

These should be defined static.

> +char *irq_bitname[] = {

Should be static.

> +#ifdef DEBUG
> +#define fs_dprintk(f, str...) if (fs_debug & f) printk (str)

Hint - you could use "(fs_debug & FS_DEBUG_##f)" in order to clean up
the callers of this function.  Also you might want to wrap this in a
"do { ... } while(0)" in order to avoid expansion problems in the future.

> +#ifdef DEBUG
> +/* I didn't forget to set this to zero before shipping. Hit me with a stick 
> +   if you get this with the debug default not set to zero again. -- REW */
> +int fs_debug = 0;

Again, should be static.

> +#ifdef MODULE
> +#ifdef DEBUG 
> +MODULE_PARM(fs_debug, "i");

There's no reason to wrap these "MODULE_PARM"s inside an "#ifdef MODULE".

> +#define MIN(a,b) (((a)<(b))?(a):(b))

You don't seem to ever use this definition.

> +#else /* DEBUG */
> +static void my_hd (void *addr, int len){}
> +#endif /* DEBUG */

You might as well make this a null #define in this case.

> +/* Hmm. If this is ATM specific, why isn't there an ATM routine for this?
> + * I copied it over from the ambassador driver. -- REW */
> +
> +static inline void fs_kfree_skb (struct sk_buff * skb) 

Yes, in the 2.5 timeframe this wart will be cleaned up (by always requiring
vcc->pop to be !=NULL)

> +/* It seems the ATM forum recomends this horribly complicated 16bit
> + * floating point format. Turns out the Ambassador uses the exact same
> + * encoding. I just copied it over. If Mitch agrees, I'll move it over
> + * to the atm_misc file or something like that. (and remove it from 
> + * here and the ambassador driver) -- REW
> + */

Yep, in 2.5 we can do something like that (although I want to look at it
a bit further)

> +void write_fs (struct fs_dev *dev, int offset, int val)

Should be static.  And probably inline.  "val" should be a u32.

> +unsigned int read_fs (struct fs_dev *dev, int offset)

Ditto.  Return value should be u32.

> +struct FS_QENTRY *get_qentry (struct fs_dev *dev, struct queue *q)

Should be static

> +void submit_qentry (struct fs_dev *dev, struct queue *q, struct FS_QENTRY *qe)

Should be static

> +void submit_queue (struct fs_dev *dev, struct queue *q, 
> +u32 cmd, u32 p1, u32 p2, u32 p3)

Should be static

> +void submit_command (struct fs_dev *dev, struct queue *q, 

Should be static

> +void process_return_queue (struct fs_dev *dev, struct queue *q)

Should be static

> +void process_txdone_queue (struct fs_dev *dev, struct queue *q)

Should be static

> +void process_incoming (struct fs_dev *dev, struct queue *q)

Should be static

> +static int fs_send (struct atm_vcc *atm_vcc, struct sk_buff *skb)
[...]
> + vcc->last_skb = skb;

I'm really leary of this... it looks to me that if we already had been in
the process of sending an skb then sends after that will lose that skb
(and leak the associated memory).  Please reference the recent thread on
the linux-atm mailing list about how to deal with TX backlog.  also:

> + td = kmalloc (sizeof (struct FS_BPENTRY), GFP_ATOMIC);
> + fs_dprintk (FS_DEBUG_ALLOC, "Alloc transd: %p(%d)\n", td, sizeof (struct 
>FS_BPENTRY));
> + if (!td) {
> + /* Oops out of mem */
> + return -ENOMEM;
> + }

What frees the skb in this case?

> +void undocumented_pci_fix (struct pci_dev *pdev)

Should be static

> +void write_phy (struct fs_dev *dev, int regnum, int val)

Should be static

> +int init_phy (struct fs_dev *dev, struct reginit_item *reginit)

Should be static

> +void reset_chip (struct fs_dev *dev)

Should be static

> +void *aligned_kmalloc (int size, int flags, int alignment)

Should be static

> +{
> + void  *t;
> +
> + if (alignment <= 0x10) {
> + t = kmalloc (size, flags);
> + if ((unsigned int)t & (alignment-1)) {
> + printk ("Kmalloc doesn't align things correctly! %p\n", t);
> + kfree (t);
> + return aligned_kmalloc (size, flags, alignment * 4);

Uh, ok  I'd prefer if you just died here - there shouldn't be any way
that kmalloc is going to return something that isn't 16-byte aligned.  It's
wise to double check this when it's important but I 

Re: [PATCH] atmrefcount

2000-11-22 Thread Mitchell Blank Jr

Patrick van de Lageweg wrote:
> This patch contains the fix for the atmrefcount problem (noted as a
> critical problem in Ted's todo list). It also has the makefile
> modifications for the firestream driver in a separate email. 

Could you please split the two patches so that they're actually independant
(i.e. put all the stuff for the new driver in its own patch).  I'll
comment on that driver seperately.

-Mitch
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: [PATCH] atmrefcount

2000-11-22 Thread Mitchell Blank Jr

Patrick van de Lageweg wrote:
 This patch contains the fix for the atmrefcount problem (noted as a
 critical problem in Ted's todo list). It also has the makefile
 modifications for the firestream driver in a separate email. 

Could you please split the two patches so that they're actually independant
(i.e. put all the stuff for the new driver in its own patch).  I'll
comment on that driver seperately.

-Mitch
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: [NEW DRIVER] firestream

2000-11-22 Thread Mitchell Blank Jr

First, I'd like to make a couple points about driver style that I'm trying
to move towards with the ATM drivers.  You're free to take them or leave
them, but I want to eventually move the tree in this direction.
  * I don't like header files that define the registers of the chip - since
the header file is only included in the driver's .c file you might as
well just put the definitions there (unless, of course, there is good
reason to think that the registers will be used in multiple drivers -
unlikely in this case)  Having a seperate header file just serves to
hamper searching around the driver and cluttering the directory.
  * Please use the new PCI interface for new drivers (i.e. MODULE_DEVICE_TABLE
and all that)

 +int loopback = 0;
 +int num=0x5a;

These should be defined static.

 +char *irq_bitname[] = {

Should be static.

 +#ifdef DEBUG
 +#define fs_dprintk(f, str...) if (fs_debug  f) printk (str)

Hint - you could use "(fs_debug  FS_DEBUG_##f)" in order to clean up
the callers of this function.  Also you might want to wrap this in a
"do { ... } while(0)" in order to avoid expansion problems in the future.

 +#ifdef DEBUG
 +/* I didn't forget to set this to zero before shipping. Hit me with a stick 
 +   if you get this with the debug default not set to zero again. -- REW */
 +int fs_debug = 0;

Again, should be static.

 +#ifdef MODULE
 +#ifdef DEBUG 
 +MODULE_PARM(fs_debug, "i");

There's no reason to wrap these "MODULE_PARM"s inside an "#ifdef MODULE".

 +#define MIN(a,b) (((a)(b))?(a):(b))

You don't seem to ever use this definition.

 +#else /* DEBUG */
 +static void my_hd (void *addr, int len){}
 +#endif /* DEBUG */

You might as well make this a null #define in this case.

 +/* Hmm. If this is ATM specific, why isn't there an ATM routine for this?
 + * I copied it over from the ambassador driver. -- REW */
 +
 +static inline void fs_kfree_skb (struct sk_buff * skb) 

Yes, in the 2.5 timeframe this wart will be cleaned up (by always requiring
vcc-pop to be !=NULL)

 +/* It seems the ATM forum recomends this horribly complicated 16bit
 + * floating point format. Turns out the Ambassador uses the exact same
 + * encoding. I just copied it over. If Mitch agrees, I'll move it over
 + * to the atm_misc file or something like that. (and remove it from 
 + * here and the ambassador driver) -- REW
 + */

Yep, in 2.5 we can do something like that (although I want to look at it
a bit further)

 +void write_fs (struct fs_dev *dev, int offset, int val)

Should be static.  And probably inline.  "val" should be a u32.

 +unsigned int read_fs (struct fs_dev *dev, int offset)

Ditto.  Return value should be u32.

 +struct FS_QENTRY *get_qentry (struct fs_dev *dev, struct queue *q)

Should be static

 +void submit_qentry (struct fs_dev *dev, struct queue *q, struct FS_QENTRY *qe)

Should be static

 +void submit_queue (struct fs_dev *dev, struct queue *q, 
 +u32 cmd, u32 p1, u32 p2, u32 p3)

Should be static

 +void submit_command (struct fs_dev *dev, struct queue *q, 

Should be static

 +void process_return_queue (struct fs_dev *dev, struct queue *q)

Should be static

 +void process_txdone_queue (struct fs_dev *dev, struct queue *q)

Should be static

 +void process_incoming (struct fs_dev *dev, struct queue *q)

Should be static

 +static int fs_send (struct atm_vcc *atm_vcc, struct sk_buff *skb)
[...]
 + vcc-last_skb = skb;

I'm really leary of this... it looks to me that if we already had been in
the process of sending an skb then sends after that will lose that skb
(and leak the associated memory).  Please reference the recent thread on
the linux-atm mailing list about how to deal with TX backlog.  also:

 + td = kmalloc (sizeof (struct FS_BPENTRY), GFP_ATOMIC);
 + fs_dprintk (FS_DEBUG_ALLOC, "Alloc transd: %p(%d)\n", td, sizeof (struct 
FS_BPENTRY));
 + if (!td) {
 + /* Oops out of mem */
 + return -ENOMEM;
 + }

What frees the skb in this case?

 +void undocumented_pci_fix (struct pci_dev *pdev)

Should be static

 +void write_phy (struct fs_dev *dev, int regnum, int val)

Should be static

 +int init_phy (struct fs_dev *dev, struct reginit_item *reginit)

Should be static

 +void reset_chip (struct fs_dev *dev)

Should be static

 +void *aligned_kmalloc (int size, int flags, int alignment)

Should be static

 +{
 + void  *t;
 +
 + if (alignment = 0x10) {
 + t = kmalloc (size, flags);
 + if ((unsigned int)t  (alignment-1)) {
 + printk ("Kmalloc doesn't align things correctly! %p\n", t);
 + kfree (t);
 + return aligned_kmalloc (size, flags, alignment * 4);

Uh, ok  I'd prefer if you just died here - there shouldn't be any way
that kmalloc is going to return something that isn't 16-byte aligned.  It's
wise to double check this when it's important but I wouldn't put too much
work into fixing this.  And calling ourselves 

Re: include conventions /usr/include/linux/sys ?

2000-11-22 Thread Mitchell Blank Jr

H. Peter Anvin wrote:
 I suggested include/kernel and include/arch with include/linux and
 include/asm being reserved for the kernel interfaces (ioctl and their
 structures mostly.)

That sounds good.  One other refinement I would like to see is that
architecture specific but always present header files don't get used
directly in architecture-independant .c flies.  Some common examples of
this are asm/io.h and asm/uaccess.h  The problem is that often there
is some code that is identical between some or all of the archs.  Then
one of two things happen:
  * The code gets duplicated in many *.h files with all the bad things that
comde duplication causes - especially since they're all under seperate
maintainership
  * We have a big ugly conversion like what happened with 'asm/spinlock.h' -
'linux/spinlock.h'

If we only have arch-independant *.c files only include things out of
kernel/* (which may, of course, include things in arch/*) we can
avoid these conversions in the future and promote code reuse between the
architectures.

Also we should probably consider implementing a reasonalbe hierarchy to
the proposed kernel/*.h directory - for instance kernel/bus/pci.h.

-Mitch
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: Kernel OOPS on boot

2000-10-26 Thread Mitchell Blank Jr

Richard B. Johnson wrote:
> > And here is the broken routine:
> > 
> > 03f4 :
[...]
> This is not good code. It does the following:
> 
> o Gets a parameter off the stack and puts into eax (a pointer).
> o Put the value 1 into ecx.
> o Take a byte from the pointed-to location and put it into dl.
> o Put the byte in dl back into memory where it came from!
> o Clear ecx
> o Copy ecx to eax for a return value of 0

If you look at the original routine (arch/i386/mm/init.c:test_wp_bit),
you'll see that its purpose is to test the capabilities of the CPU's
MMU (to differentiate the old 386's that didn't honor WP in supervisor
mode), which is why it does this little dance - to provoke a page
fault.

Anyway, the probable fix has already been posted (i.e. don't use gcc
2.7.2.*)

-Mitch
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: Kernel OOPS on boot

2000-10-26 Thread Mitchell Blank Jr

Richard B. Johnson wrote:
  And here is the broken routine:
  
  03f4 do_test_wp_bit:
[...]
 This is not good code. It does the following:
 
 o Gets a parameter off the stack and puts into eax (a pointer).
 o Put the value 1 into ecx.
 o Take a byte from the pointed-to location and put it into dl.
 o Put the byte in dl back into memory where it came from!
 o Clear ecx
 o Copy ecx to eax for a return value of 0

If you look at the original routine (arch/i386/mm/init.c:test_wp_bit),
you'll see that its purpose is to test the capabilities of the CPU's
MMU (to differentiate the old 386's that didn't honor WP in supervisor
mode), which is why it does this little dance - to provoke a page
fault.

Anyway, the probable fix has already been posted (i.e. don't use gcc
2.7.2.*)

-Mitch
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: Linux's implementation of poll() not scalable?

2000-10-24 Thread Mitchell Blank Jr

Linus Torvalds wrote:
> >   * it doesn't add extra syscalls
> 
> Sure it does. 
> 
> What do you think ioctl's are?

As I explained a few lines down from where you stopped quoting (and probably
stopped reading) the ioctl() use is just an artifact of Solaris's icky
implementation.  It could and should use read().  Not an extra syscall.

> Sure, you can have multiple "poll lists" by opening /dev/poll multiple
> times, and you seem to see this as a great feature. I'm telling you that
> multiple event queues are a horrible disaster, and it's a perfect example
> of what an engineer with the "more is better" mindset comes up with.

It has nothing to do with how exciting of a feature it is.  It's a matter
"hey, if you implement it with /dev/poll, you get multiple queues FOR
FREE".  Those sorts of things are an indication that you've got the
implementation right.  The fact that in your scheme multiple queues would
be extra work is just residue from the fact that you've got the idea wrong.

And how do you propose dealing with multi-threaded programs.  Obviously for
SMP you'd want multiple threads to be able to grab events, so I guess the
"one great queue" will be shared among all the threads.  But, suppose I
want to dedicate a thread to coping with a specific set of events.  H...
guess you'll need a CLONE_EVENTQUEUE flag to support that.  With /dev/poll
none of this foolishness is needed - you just open tow of them and let
the threads do what they want.

> Multiple event queues are bad, because it completely breaks the notion of
> even-driven programming. How do you want to listen to them all? You can't.

Explained in detail in my mail.

> You can only listen to one event queue at a time - unless you create some
> kind of "hierarchy" of event queues, where you have one event queue that
> you wait on that contains the _other_ event queues, so that you can tell
> which one has anything pending..

No, event queues of event queues are messy in implementation.  There's
nothing fundamentally wrong with them, but you need to make sure the
graph stays acyclic which is just an implementation hassle.  Not enough
gain for too much code (unless someone has a beautiful algorithm)

However, it's perfectly reasonable to use select() or poll() on two event
queues.  As I explained in my email, this is needed if you're already using
a library that does its own event queue based on poll or select.

> Basically, tastes differ. Everything you seem to see as an advantage of
> /dev/poll, I see as a horrible kludge.

Well, everything I see indicates that /dev/poll would be the easier of
the two interfaces to actually implement.  All the added flexibility is
just gravy.

-Mitch
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: Linux's implementation of poll() not scalable?

2000-10-24 Thread Mitchell Blank Jr

Linus Torvalds wrote:
> Here's a suggested "good" interface that would certainly be easy to
> implement, and very easy to use, with none of the scalability issues that
> many interfaces have.

I think everyone should take a timeout and look at Solaris 8's /dev/poll
interface.  This discussion is reinventing the wheel, the lever, and the
inclined plane.

http://docs.sun.com/ab2/coll.40.6/REFMAN7/@Ab2PageView/55123

I think this is a lot cleaner than your approach:
  * it doesn't add extra syscalls

  * it handles multiple event queues, and does so without ugliness.
all the per-queue state is held in the /dev/poll's "struct file"
instance

  * in your method you have a per-process queue - but under what clone()
conditions is it shared among siblings?  Here the user has a choice -
they can share the open("/dev/poll") file descriptor or not using
any of the existing means.  Granted, they also would probably want
to arrange to share the fd's being polled in order to make this
useful.

  * No new fields in task_struct

A few simple improvements can be made to the Sun model, though:

  * The fact that the fd of /dev/poll can't be used for poll(2) or select(2)
is ugly.  Sure, you probably don't want an open instance of /dev/poll
being added to another /dev/poll, but being able to call "select" on
them would be really useful:
  1. Imagine a server that has to process connections from both
 high-priority and low-priority clients - and that requests from
 the high-priority ones always take precedence.  With this
 interface you could easily have two open instances of /dev/poll
 and then call select(2) on them.  This ability just falls
 out naturally from the interface.
  2. Some libraries are internally driven by select(2) loops (I think
 Xlib is like this, IIRC)  If you have a lot of fd's you want to
 watch, this means you must take the hit of calling select(2) on
 all of them.  If you could just pass in a fd for /dev/poll,
 problem solved.

  * I think the fact that you add events via write(2) but retrieve them
via ioctl(2) is an ugly asymmetry.  From what I can see, the entire
motivation for using ioctl as opposed to read(2) is to allow the user
to specify a timeout.  If you could use poll(2)/select(2) on /dev/poll
this issue would be moot (see above)

  * It would be nice if the interface were flexible enough to report
items _other_ than "activity on fd" in the future.  Maybe SYSV IPC?
itimers?  directory update notification?  It seems that every time
UNIX adds a mechanism of waiting for events, we spoil it by not
making it flexible enough to wait on everything you might want.
Lets not repeat the mistake with a new interface.

  * The "struct pollfd" and "struct dvpoll" should also include a 64-bit
opaque quantity supplied by userland which is returned with each event
on that fd.  This would save the program from having to look up
which connection context goes with each fd - the kernel could just
give you the pointer to the structure.  Not having this capability isn't
a burden for programs dealing with a small number of fd's (since they
can just have a lookup table) but if you potentially have 1's of
connections it may be undesirable to allocate an array for them all.

The linux patch of /dev/poll implements mmap'ing of the in-kenrel poll
table... I don't think this is a good idea.  First, the user just wants to
be able to add events and dequeue them - both linear operations.  Second,
why should the kernel be forced to maintain a fixed-size linear list of
events we're looking for... this seems mindlessly inefficient.  Why not
just pull a "struct pollfd" out of slab each time a new event is listened
for?

My unresolved concerns:
  * Is this substantially better than the already existing rtsig+poll
solution?  Enough to make implementing it worth while?

  * How do we quickly do the "struct file" -> "struct pollfd" conversion
each time an event happens?  Especially if there are multiple /dev/poll
instances open in the current process.  Probably each "struct file"
would need a pointer to the instance of /dev/poll which would have
some b-tree variant (or maybe a hash table).  The limitation would
be that a single fd couldn't be polled for events by two different
/dev/poll instances, even for different events.  This is probably
good for sanity's sake anyway.

-Mitch
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: Linux's implementation of poll() not scalable?

2000-10-24 Thread Mitchell Blank Jr

Linus Torvalds wrote:
 Here's a suggested "good" interface that would certainly be easy to
 implement, and very easy to use, with none of the scalability issues that
 many interfaces have.

I think everyone should take a timeout and look at Solaris 8's /dev/poll
interface.  This discussion is reinventing the wheel, the lever, and the
inclined plane.

http://docs.sun.com/ab2/coll.40.6/REFMAN7/@Ab2PageView/55123

I think this is a lot cleaner than your approach:
  * it doesn't add extra syscalls

  * it handles multiple event queues, and does so without ugliness.
all the per-queue state is held in the /dev/poll's "struct file"
instance

  * in your method you have a per-process queue - but under what clone()
conditions is it shared among siblings?  Here the user has a choice -
they can share the open("/dev/poll") file descriptor or not using
any of the existing means.  Granted, they also would probably want
to arrange to share the fd's being polled in order to make this
useful.

  * No new fields in task_struct

A few simple improvements can be made to the Sun model, though:

  * The fact that the fd of /dev/poll can't be used for poll(2) or select(2)
is ugly.  Sure, you probably don't want an open instance of /dev/poll
being added to another /dev/poll, but being able to call "select" on
them would be really useful:
  1. Imagine a server that has to process connections from both
 high-priority and low-priority clients - and that requests from
 the high-priority ones always take precedence.  With this
 interface you could easily have two open instances of /dev/poll
 and then call select(2) on them.  This ability just falls
 out naturally from the interface.
  2. Some libraries are internally driven by select(2) loops (I think
 Xlib is like this, IIRC)  If you have a lot of fd's you want to
 watch, this means you must take the hit of calling select(2) on
 all of them.  If you could just pass in a fd for /dev/poll,
 problem solved.

  * I think the fact that you add events via write(2) but retrieve them
via ioctl(2) is an ugly asymmetry.  From what I can see, the entire
motivation for using ioctl as opposed to read(2) is to allow the user
to specify a timeout.  If you could use poll(2)/select(2) on /dev/poll
this issue would be moot (see above)

  * It would be nice if the interface were flexible enough to report
items _other_ than "activity on fd" in the future.  Maybe SYSV IPC?
itimers?  directory update notification?  It seems that every time
UNIX adds a mechanism of waiting for events, we spoil it by not
making it flexible enough to wait on everything you might want.
Lets not repeat the mistake with a new interface.

  * The "struct pollfd" and "struct dvpoll" should also include a 64-bit
opaque quantity supplied by userland which is returned with each event
on that fd.  This would save the program from having to look up
which connection context goes with each fd - the kernel could just
give you the pointer to the structure.  Not having this capability isn't
a burden for programs dealing with a small number of fd's (since they
can just have a lookup table) but if you potentially have 1's of
connections it may be undesirable to allocate an array for them all.

The linux patch of /dev/poll implements mmap'ing of the in-kenrel poll
table... I don't think this is a good idea.  First, the user just wants to
be able to add events and dequeue them - both linear operations.  Second,
why should the kernel be forced to maintain a fixed-size linear list of
events we're looking for... this seems mindlessly inefficient.  Why not
just pull a "struct pollfd" out of slab each time a new event is listened
for?

My unresolved concerns:
  * Is this substantially better than the already existing rtsig+poll
solution?  Enough to make implementing it worth while?

  * How do we quickly do the "struct file" - "struct pollfd" conversion
each time an event happens?  Especially if there are multiple /dev/poll
instances open in the current process.  Probably each "struct file"
would need a pointer to the instance of /dev/poll which would have
some b-tree variant (or maybe a hash table).  The limitation would
be that a single fd couldn't be polled for events by two different
/dev/poll instances, even for different events.  This is probably
good for sanity's sake anyway.

-Mitch
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: Linux's implementation of poll() not scalable?

2000-10-24 Thread Mitchell Blank Jr

Linus Torvalds wrote:
* it doesn't add extra syscalls
 
 Sure it does. 
 
 What do you think ioctl's are?

As I explained a few lines down from where you stopped quoting (and probably
stopped reading) the ioctl() use is just an artifact of Solaris's icky
implementation.  It could and should use read().  Not an extra syscall.

 Sure, you can have multiple "poll lists" by opening /dev/poll multiple
 times, and you seem to see this as a great feature. I'm telling you that
 multiple event queues are a horrible disaster, and it's a perfect example
 of what an engineer with the "more is better" mindset comes up with.

It has nothing to do with how exciting of a feature it is.  It's a matter
"hey, if you implement it with /dev/poll, you get multiple queues FOR
FREE".  Those sorts of things are an indication that you've got the
implementation right.  The fact that in your scheme multiple queues would
be extra work is just residue from the fact that you've got the idea wrong.

And how do you propose dealing with multi-threaded programs.  Obviously for
SMP you'd want multiple threads to be able to grab events, so I guess the
"one great queue" will be shared among all the threads.  But, suppose I
want to dedicate a thread to coping with a specific set of events.  H...
guess you'll need a CLONE_EVENTQUEUE flag to support that.  With /dev/poll
none of this foolishness is needed - you just open tow of them and let
the threads do what they want.

 Multiple event queues are bad, because it completely breaks the notion of
 even-driven programming. How do you want to listen to them all? You can't.

Explained in detail in my mail.

 You can only listen to one event queue at a time - unless you create some
 kind of "hierarchy" of event queues, where you have one event queue that
 you wait on that contains the _other_ event queues, so that you can tell
 which one has anything pending..

No, event queues of event queues are messy in implementation.  There's
nothing fundamentally wrong with them, but you need to make sure the
graph stays acyclic which is just an implementation hassle.  Not enough
gain for too much code (unless someone has a beautiful algorithm)

However, it's perfectly reasonable to use select() or poll() on two event
queues.  As I explained in my email, this is needed if you're already using
a library that does its own event queue based on poll or select.

 Basically, tastes differ. Everything you seem to see as an advantage of
 /dev/poll, I see as a horrible kludge.

Well, everything I see indicates that /dev/poll would be the easier of
the two interfaces to actually implement.  All the added flexibility is
just gravy.

-Mitch
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 2.4.0-testx fr0kedness?

2000-10-21 Thread Mitchell Blank Jr

Jason Slagle wrote:
> It worked fine under 2.3.x and as a matter of fact worked fine under
> 2.4.0-test1-4 I believe.
> 
> I don't buy a hardware explination when I've been running this setup for
> well over a year and only just now have problems :)

It's because we've seen it a hundred zillion times before...  weird problems
with overclocked systems that the owners claim "work fine under any other
version, so it must be a kernel bug".

You could well be on the edge of where overclocking breaks down.  There's
some sequence of instructions and/or memory access in the kernel that is
causing your particular CPU to croak every billionth time through.  Maybe
previous kernel compiles didn't emit those exact instructions and you
lucked out.

Anyway, turn off overclokcing and try to reproduce.

-Mitch
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 2.4.0-testx fr0kedness?

2000-10-21 Thread Mitchell Blank Jr

Jason Slagle wrote:
 It worked fine under 2.3.x and as a matter of fact worked fine under
 2.4.0-test1-4 I believe.
 
 I don't buy a hardware explination when I've been running this setup for
 well over a year and only just now have problems :)

It's because we've seen it a hundred zillion times before...  weird problems
with overclocked systems that the owners claim "work fine under any other
version, so it must be a kernel bug".

You could well be on the edge of where overclocking breaks down.  There's
some sequence of instructions and/or memory access in the kernel that is
causing your particular CPU to croak every billionth time through.  Maybe
previous kernel compiles didn't emit those exact instructions and you
lucked out.

Anyway, turn off overclokcing and try to reproduce.

-Mitch
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: atm_devs protection.

2000-10-18 Thread Mitchell Blank Jr

Werner Almesberger wrote:
> Mitchell Blank Jr wrote:
> > Yeah, a lot of the add/remove device ATM code (and, IMO, even the vcc
> > open/close) code is pretty suspect.
> 
> That's actually a bit of an understatement:

Well, I was trying to be polite to the original author ;-)

-Mitch
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: atm_devs protection.

2000-10-18 Thread Mitchell Blank Jr

Rogier Wolff wrote:
> We're trying to make the module refcounting 'secure' against
> concurrent SMP unloads.
> 
> For example in net/atm/resources.c: 

Yeah, a lot of the add/remove device ATM code (and, IMO, even the vcc
open/close) code is pretty suspect.  If you want to look through and
liberally sprinkle lock_kernel() everywhere, that wouldn't really be a
bad idea.  A lot of this code hasn't changed much since the 1.3 days,
so I don't know if anyone ever gave a good look at its SMP friendlyness.
At one point I cleaned up a few problems, but I'n not sure I got them
all.  I think most of the races that are easy to trigger are gone (at
least in the common code, can't be sure about all the drivers).

As far as module_unload, isn't that protected by lock_kernel?  What am
I missing?

-Mitch
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: atm_devs protection.

2000-10-18 Thread Mitchell Blank Jr

Werner Almesberger wrote:
 Mitchell Blank Jr wrote:
  Yeah, a lot of the add/remove device ATM code (and, IMO, even the vcc
  open/close) code is pretty suspect.
 
 That's actually a bit of an understatement:

Well, I was trying to be polite to the original author ;-)

-Mitch
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: Updated 2.4 TODO List

2000-10-09 Thread Mitchell Blank Jr

[EMAIL PROTECTED] wrote:
> 4. Boot Time Failures
[...]
>  * IBM Thinkpad 390 won't boot since 2.3.11 (See Decklin Foster for
>more info)

I _highly_ suspect that this is not a 2.4 bug but is instead user error.
I've seen it several times.

On all kernel versions prior to 2.3.11 if you compiled with the CPU set
to a newer model then you had (for instance, of you selected "Pentium Pro"
and you actually had a 486) you would get a kernel that worked fine albiet
not optimally.  On kernels 2.3.11 you will have an unbootable kernel.

Decklin - please try to reproduce this with CONFIG_M386.  Also you might
try looking in /proc/cpuinfo and make sure that laptop has the CPU you
think it has.  The reason I know which kernel version this problem appeared
in is because I spent an entire evening chasing this same "kernel bug"
a couple months ago.  Turned out the machine I could have _sworn_ was a
Celeron was a K6 :-)

-Mitch
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: Updated 2.4 TODO List

2000-10-09 Thread Mitchell Blank Jr

[EMAIL PROTECTED] wrote:
 4. Boot Time Failures
[...]
  * IBM Thinkpad 390 won't boot since 2.3.11 (See Decklin Foster for
more info)

I _highly_ suspect that this is not a 2.4 bug but is instead user error.
I've seen it several times.

On all kernel versions prior to 2.3.11 if you compiled with the CPU set
to a newer model then you had (for instance, of you selected "Pentium Pro"
and you actually had a 486) you would get a kernel that worked fine albiet
not optimally.  On kernels 2.3.11 you will have an unbootable kernel.

Decklin - please try to reproduce this with CONFIG_M386.  Also you might
try looking in /proc/cpuinfo and make sure that laptop has the CPU you
think it has.  The reason I know which kernel version this problem appeared
in is because I spent an entire evening chasing this same "kernel bug"
a couple months ago.  Turned out the machine I could have _sworn_ was a
Celeron was a K6 :-)

-Mitch
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 2.4.0test - uid_hash_find()

2000-10-08 Thread Mitchell Blank Jr

Dan Aloni wrote:
> 
> I've been touring around the kernel sources when I stumbled
> across the uid_hash_find() function (kernel/user.c):
> 
> static inline struct user_struct *uid_hash_find(unsigned short uid, unsigned int 
>hashent)

Is it just me, or should that be "uid_t" not "unsigned short"?  Now THAT
looks like a bug.

-Mitch
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: Calling current() from interrupt context

2000-10-08 Thread Mitchell Blank Jr

[EMAIL PROTECTED] wrote:
> > Yes, on architectures that use current->processor that is an exception
> > to the rule.  After all, you know for sure that you're still on the
> > same CPU as the task currently running.
> 
> This makes sense. And I wish cpu architects would put a cpu-id
> register somewhere
> so that we could have fast computation of cpu-id on smp machines.

Some do (look at asm-sparc/smp.h).  However, linux does everything possible
to make dereferencing "current" as fast as possible, so actually just
looping it up there is pretty damn efficient.  We need to keep track of the
last cpu a processor ran on anyway to do cpu-affinity heuristics in the
scheduler.  So this implementation is quite optimal, actually.

-Mitch
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: Calling current() from interrupt context

2000-10-08 Thread Mitchell Blank Jr

[EMAIL PROTECTED] wrote:
> Looking at the code, I don't see any places where "current" is not valid.
> Got some examples? 

It's not that its invalid, it just doesn't make much sense.  It points to
whatever task happened to be running when the interrupt happened.  So
any attempt to access it is 99% likely to be a bug.

> BTW: there is an implicit reference to "current"  in smp_processor_id. 

Yes, on architectures that use current->processor that is an exception
to the rule.  After all, you know for sure that you're still on the
same CPU as the task currently running.

-Mitch
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: Calling current() from interrupt context

2000-10-08 Thread Mitchell Blank Jr

[EMAIL PROTECTED] wrote:
 Looking at the code, I don't see any places where "current" is not valid.
 Got some examples? 

It's not that its invalid, it just doesn't make much sense.  It points to
whatever task happened to be running when the interrupt happened.  So
any attempt to access it is 99% likely to be a bug.

 BTW: there is an implicit reference to "current"  in smp_processor_id. 

Yes, on architectures that use current-processor that is an exception
to the rule.  After all, you know for sure that you're still on the
same CPU as the task currently running.

-Mitch
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: Calling current() from interrupt context

2000-10-08 Thread Mitchell Blank Jr

[EMAIL PROTECTED] wrote:
  Yes, on architectures that use current-processor that is an exception
  to the rule.  After all, you know for sure that you're still on the
  same CPU as the task currently running.
 
 This makes sense. And I wish cpu architects would put a cpu-id
 register somewhere
 so that we could have fast computation of cpu-id on smp machines.

Some do (look at asm-sparc/smp.h).  However, linux does everything possible
to make dereferencing "current" as fast as possible, so actually just
looping it up there is pretty damn efficient.  We need to keep track of the
last cpu a processor ran on anyway to do cpu-affinity heuristics in the
scheduler.  So this implementation is quite optimal, actually.

-Mitch
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 2.4.0test - uid_hash_find()

2000-10-08 Thread Mitchell Blank Jr

Dan Aloni wrote:
 
 I've been touring around the kernel sources when I stumbled
 across the uid_hash_find() function (kernel/user.c):
 
 static inline struct user_struct *uid_hash_find(unsigned short uid, unsigned int 
hashent)

Is it just me, or should that be "uid_t" not "unsigned short"?  Now THAT
looks like a bug.

-Mitch
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: (reiserfs) Re: More on 2.2.18pre2aa2

2000-09-13 Thread Mitchell Blank Jr

James Sutherland wrote:
> In terms of latency, I'd suggest we aim to keep the device in use all the
> time we have outstanding requests: every time the device is ready to
> accept a request, we feed it the "next" one in the queue; until it is free
> again, requests pile up in the queue, being sorted, merged etc.
> 
> This should perform very well under light loads - requests just get passed
> to the device as fast as they can be handled - and under very heavy loads
> - we have a large queue in play, with plenty of scope for reordering,
> merges etc.

The "large queue" goes against the whole point of this exercise - that
is that if there are many items in the "queue" being sorted then
unlucky requests can end up waiting a long time to get serviced.
You want to have some reasonable break-off point where you just
decide to make an elevator swipe in order to avoid starving those
requests.

Keep in mind that I'm using the word "queue" in a confusing manner
here - we're talking about how much we'll try to elevator-sort in
a go, not the actual queue of all requests.  I've been calling this
a queue because from a performance standpoint that's what it is,
so I think it helps to think of the problem that way.

Oh yeah one other thing about huge elevator queues to think about
(although its not too pertinant these days).  A friend of mine had
a job 15 years or so ago to working on a commercial UNIX flavor.
They were having some severe I/O performance problems under high
loads.  The problem was that they were trying to sort and merge
all the pending requests.  When I/O was heavy this ended up
chewing all the CPU time and the kernel wouldn't actually be
able to keep up sending requests to the disk :-)  However, keep
in mind that this was int the bad-old-days when "hard drive"
meant a rack-mount Eagle and the elevator was expected to consider
things like head positioning and rotational speed when doing
a sort...

-Mitch
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: Proposal: Linux Kernel Patch Management System

2000-09-13 Thread Mitchell Blank Jr

Alexander Viro wrote:
> BTW, any bug reports starting with "kernel is x.y.z + FOO42069 + K314 +
> " will be cheerfully flushed down the toilet here,
> no matter what system of dependencies is going to be in place.

Yes, for the stuff discussed on lkml patch dependencies should be
pretty minimal.  However, if I were discussing something on linux-m68k
it would be common to say "kernel is 2.5.18 + m68k-native-patch-2.5.18 +
mac68k-patch-2.5.18"

-Mitch
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: Proposal: Linux Kernel Patch Management System

2000-09-13 Thread Mitchell Blank Jr

Alan Cox wrote:
> Humans will generally get a weird report from sending Linus a message wonder
> what all this field stuff is and mail someone else instead. 

If they're able to create a patch, hopefully they'd be able to fill in
a simple email template (and I've seen some pretty dim folks manage to
register domains with InterNIC, so email templates aren't that hard :-)

We could even have a simple web page where you check a few boxes and
fill in "URL to patch" and hit submit.

> The thing that needs the most thinking about is how to handle patches that
> don't have all this id crap and stuff on them.

Like any tracking system, the suceess or failure of its rollout depends
completely on whether Linus et al will be steadfast enough to refuse
to look at any patch that hasn't gone through the system.

It's just like when you're converting a helpdesk to use a ticket tracking
system.  You need to teach the tech people that when they run into
someone in the hallway that says "hey, my printer is broken, come fix it"
they they need to reply "you must be mistaken - if your printer were
broken you would have filed a trouble ticket... since I have no trouble
ticket your printer must be fine".  People catch on quick.

-Mitch
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: (reiserfs) Re: More on 2.2.18pre2aa2

2000-09-13 Thread Mitchell Blank Jr

Alan Cox wrote:
> > Yes, but "how hard is it reasonable for the kernel to try" is based on
> > both items.  A good first order approximation is number of requests.
> 
> I must strongly disagree with that claim. A request could be 512 bytes or
> 128K. 

Yeah, as sct pointed out this gets thorny.  For a modern harddrive this
probably doesn't matter (since sequential I/O is SO fast compared to
seek times) but for other devices its an issue.

> > ...where the user sets a number exlpicitly for what performance they
> > want.  Again, if we're going to make the user set this latency
> 
> No they do not. The parameters are defined by the bandwidth and measured 
> behaviour.

Hmmm... well if someone wants to propose an algorithm to self-tune the
"queue depth in milliseconds" number then maybe we'll get somewhere.
You'd need to do some sort of moving averages of both requests/sec and
sectors/sec that come out of the elevator and use those as feedback to
adjust the queue-depth-value.  I'm not sure if this is advisable, but
at least it sounds feasable.

-Mitch
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: Proposal: Linux Kernel Patch Management System

2000-09-13 Thread Mitchell Blank Jr

Daniel Quinlan wrote:
>   "Version" - the base kernel version.  For example, "2.4.0-test8-pre1".
>   The web page will list valid version strings.

Ideally this should be overridable for patches marked experimental, since
they might be to some modified kernel.  Of course you wouldn't do an
test for applyability :-)

>Linus wants the body of patches to be in text format and not
>MIME-encoded or uuencoded.

Well, since it'll be machine-parsed anyway, can't we just fix that up?

>Good patches are sent to the linux-kernel mailing list which is
>also where additional discussion about these patches can take
>place.

...and the patch-id should go in the Subject line i.e.
Subject: [PATCHID: KP3004] fix for video card catching fire
Then the web interface to the patch database could also have
the archived discussion available right there

Also there should be some flag to turn off the "automatic posting"
feature for the truly TRULY trivial patches ("i found a spelling
error in a comment") that just don't merit discussion

>  - Linus should reject most out-of-band patches if this is to work.

Yes, but you should be able to select people other than Linus that
they want the patch sent to (and probably different mailing lists you
want the patch discussed on).  That way this can scale to the various
maintainers.  For instance if I haved a great new networking feature
I could have the system send it to davem (assuming that he signs
up for it - I'm just using him as an example) and discussion of it
on netdev.  David decides that he likes the patch, and he sends
a (GPG/PGP signed) message to the server indicating this.  Now the
patch goes into Linus's queue (and lkml) with a note indicating
that davem has blessed it.

Of course, other developers with public keys on file could also tag it
so that if a patch touches several different subsystems it can get
"signed off on" by the involved maintainers before Linus has to make
a call.

-Mitch
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: Proposal: Linux Kernel Patch Management System

2000-09-13 Thread Mitchell Blank Jr

Daniel Quinlan wrote:
   "Version" - the base kernel version.  For example, "2.4.0-test8-pre1".
   The web page will list valid version strings.

Ideally this should be overridable for patches marked experimental, since
they might be to some modified kernel.  Of course you wouldn't do an
test for applyability :-)

Linus wants the body of patches to be in text format and not
MIME-encoded or uuencoded.

Well, since it'll be machine-parsed anyway, can't we just fix that up?

Good patches are sent to the linux-kernel mailing list which is
also where additional discussion about these patches can take
place.

...and the patch-id should go in the Subject line i.e.
Subject: [PATCHID: KP3004] fix for video card catching fire
Then the web interface to the patch database could also have
the archived discussion available right there

Also there should be some flag to turn off the "automatic posting"
feature for the truly TRULY trivial patches ("i found a spelling
error in a comment") that just don't merit discussion

  - Linus should reject most out-of-band patches if this is to work.

Yes, but you should be able to select people other than Linus that
they want the patch sent to (and probably different mailing lists you
want the patch discussed on).  That way this can scale to the various
maintainers.  For instance if I haved a great new networking feature
I could have the system send it to davem (assuming that he signs
up for it - I'm just using him as an example) and discussion of it
on netdev.  David decides that he likes the patch, and he sends
a (GPG/PGP signed) message to the server indicating this.  Now the
patch goes into Linus's queue (and lkml) with a note indicating
that davem has blessed it.

Of course, other developers with public keys on file could also tag it
so that if a patch touches several different subsystems it can get
"signed off on" by the involved maintainers before Linus has to make
a call.

-Mitch
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: Proposal: Linux Kernel Patch Management System

2000-09-13 Thread Mitchell Blank Jr

Alan Cox wrote:
 Humans will generally get a weird report from sending Linus a message wonder
 what all this field stuff is and mail someone else instead. 

If they're able to create a patch, hopefully they'd be able to fill in
a simple email template (and I've seen some pretty dim folks manage to
register domains with InterNIC, so email templates aren't that hard :-)

We could even have a simple web page where you check a few boxes and
fill in "URL to patch" and hit submit.

 The thing that needs the most thinking about is how to handle patches that
 don't have all this id crap and stuff on them.

Like any tracking system, the suceess or failure of its rollout depends
completely on whether Linus et al will be steadfast enough to refuse
to look at any patch that hasn't gone through the system.

It's just like when you're converting a helpdesk to use a ticket tracking
system.  You need to teach the tech people that when they run into
someone in the hallway that says "hey, my printer is broken, come fix it"
they they need to reply "you must be mistaken - if your printer were
broken you would have filed a trouble ticket... since I have no trouble
ticket your printer must be fine".  People catch on quick.

-Mitch
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: Proposal: Linux Kernel Patch Management System

2000-09-13 Thread Mitchell Blank Jr

Alexander Viro wrote:
 BTW, any bug reports starting with "kernel is x.y.z + FOO42069 + K314 +
 long list of patches" will be cheerfully flushed down the toilet here,
 no matter what system of dependencies is going to be in place.

Yes, for the stuff discussed on lkml patch dependencies should be
pretty minimal.  However, if I were discussing something on linux-m68k
it would be common to say "kernel is 2.5.18 + m68k-native-patch-2.5.18 +
mac68k-patch-2.5.18"

-Mitch
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: (reiserfs) Re: More on 2.2.18pre2aa2

2000-09-13 Thread Mitchell Blank Jr

James Sutherland wrote:
 In terms of latency, I'd suggest we aim to keep the device in use all the
 time we have outstanding requests: every time the device is ready to
 accept a request, we feed it the "next" one in the queue; until it is free
 again, requests pile up in the queue, being sorted, merged etc.
 
 This should perform very well under light loads - requests just get passed
 to the device as fast as they can be handled - and under very heavy loads
 - we have a large queue in play, with plenty of scope for reordering,
 merges etc.

The "large queue" goes against the whole point of this exercise - that
is that if there are many items in the "queue" being sorted then
unlucky requests can end up waiting a long time to get serviced.
You want to have some reasonable break-off point where you just
decide to make an elevator swipe in order to avoid starving those
requests.

Keep in mind that I'm using the word "queue" in a confusing manner
here - we're talking about how much we'll try to elevator-sort in
a go, not the actual queue of all requests.  I've been calling this
a queue because from a performance standpoint that's what it is,
so I think it helps to think of the problem that way.

Oh yeah one other thing about huge elevator queues to think about
(although its not too pertinant these days).  A friend of mine had
a job 15 years or so ago to working on a commercial UNIX flavor.
They were having some severe I/O performance problems under high
loads.  The problem was that they were trying to sort and merge
all the pending requests.  When I/O was heavy this ended up
chewing all the CPU time and the kernel wouldn't actually be
able to keep up sending requests to the disk :-)  However, keep
in mind that this was int the bad-old-days when "hard drive"
meant a rack-mount Eagle and the elevator was expected to consider
things like head positioning and rotational speed when doing
a sort...

-Mitch
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: (reiserfs) Re: More on 2.2.18pre2aa2

2000-09-12 Thread Mitchell Blank Jr

Alan Cox wrote:
> > time, but remember that there are two things measured in time here:
> >   A. The time for the whole queue of requests to run (this is what Rik is
> >  proposing using to throttle)
> >   B. The time an average request takes to process.
> 
> Your perceived latency is based entirely on A.

Yes, but "how hard is it reasonable for the kernel to try" is based on
both items.  A good first order approximation is number of requests.

> > If we limit on the depth of queue we're (to some level of approximation)
> > making our decision based on A/B.  It's still a magic constant, but at
> 
> I dont suggest you do queue limiting on that basis. I suggest you do order
> limiting based on time slots

It's still a queue - the queue of things we're going to take on this
elevator swipe, right?  And the problem is one of keeping a sane
watermark on this queue - not too many requests to destroy latency
but enough to let the elevator do some good.

> > Well, actually just about any communications protocol worth its salt
> > uses some sort of windowing throttle based on the amount of data
> 
> Im talking about flow control/traffic shaping

...where the user sets a number exlpicitly for what performance they
want.  Again, if we're going to make the user set this latency
variable for each of their devices, then doing it based on time will
work great.

> > There's too many orders of magnatude difference between even just SCSI
> > disks (10 yr old drive?  16-way RAID?  Solid state?) to make
> > supplying any sort of default with the kernel impractical.  The end
> 
> The same argument is equally valid for the current scheme, and I think you'll
> find equally bogus

There will always need to be tunables - and it's fine to say "if you've
got oddball hardware and/or workload and/or requirements then you should
twiddle this knob".  But it seems to me that the current scheme works
well for a pretty big range of devices.  If you do the setting based
on time, I think it'll be a lot more sensitive since there's nothing
that will scale based on the speed of the device.

-Mitch
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: (reiserfs) Re: More on 2.2.18pre2aa2

2000-09-12 Thread Mitchell Blank Jr

Alan Cox wrote:
> > Now, I see people trying to introduce the concept of elapsed time into
> > that fix, which smells strongly of hack. How will this hack be cobbled
> 
> Actually my brain says that elapsed time based scheduling is the right thing
> to do.

No, Andrea is right here.  The argument that everyone is using ("Our target -
latency - is measured in time") is utterly bogus.  Yes, it's measured in
time, but remember that there are two things measured in time here:
  A. The time for the whole queue of requests to run (this is what Rik is
 proposing using to throttle)
  B. The time an average request takes to process.

If we limit on the depth of queue we're (to some level of approximation)
making our decision based on A/B.  It's still a magic constant, but at
least it's scaled to take into account the speed of the drive.  And
underneath, it's still based on time.

> It certainly works for networks

Well, actually just about any communications protocol worth its salt
uses some sort of windowing throttle based on the amount of data
outstanding, not the length of time it's been in the queue.  Which
is why TCP works well over both GigE and 28.8. [*]  Now substitute
"big fiberchannel RAID" for GigE and "360K floppy" for 28.8 and
you've got the same problem.

*  -- Yes, for optimal TCP over big WAN pipes you may want to use a
  larger buffer size, but that's a matter of the bandwidth
  delay product, which isn't relavent for talking about storage

If we move to a "length of queue in time" as Rik suggests then we're
going to have to MAKE the user set it manually for each device.
There's too many orders of magnatude difference between even just SCSI
disks (10 yr old drive?  16-way RAID?  Solid state?) to make
supplying any sort of default with the kernel impractical.  The end
result might be a bit better behaved, but only just slightly.
If people absolutely need this behavior for some reason, the current
algorithm should stay as the default.

-Mitch
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: (reiserfs) Re: More on 2.2.18pre2aa2

2000-09-12 Thread Mitchell Blank Jr

Alan Cox wrote:
  Now, I see people trying to introduce the concept of elapsed time into
  that fix, which smells strongly of hack. How will this hack be cobbled
 
 Actually my brain says that elapsed time based scheduling is the right thing
 to do.

No, Andrea is right here.  The argument that everyone is using ("Our target -
latency - is measured in time") is utterly bogus.  Yes, it's measured in
time, but remember that there are two things measured in time here:
  A. The time for the whole queue of requests to run (this is what Rik is
 proposing using to throttle)
  B. The time an average request takes to process.

If we limit on the depth of queue we're (to some level of approximation)
making our decision based on A/B.  It's still a magic constant, but at
least it's scaled to take into account the speed of the drive.  And
underneath, it's still based on time.

 It certainly works for networks

Well, actually just about any communications protocol worth its salt
uses some sort of windowing throttle based on the amount of data
outstanding, not the length of time it's been in the queue.  Which
is why TCP works well over both GigE and 28.8. [*]  Now substitute
"big fiberchannel RAID" for GigE and "360K floppy" for 28.8 and
you've got the same problem.

*  -- Yes, for optimal TCP over big WAN pipes you may want to use a
  larger buffer size, but that's a matter of the bandwidth
  delay product, which isn't relavent for talking about storage

If we move to a "length of queue in time" as Rik suggests then we're
going to have to MAKE the user set it manually for each device.
There's too many orders of magnatude difference between even just SCSI
disks (10 yr old drive?  16-way RAID?  Solid state?) to make
supplying any sort of default with the kernel impractical.  The end
result might be a bit better behaved, but only just slightly.
If people absolutely need this behavior for some reason, the current
algorithm should stay as the default.

-Mitch
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: (reiserfs) Re: More on 2.2.18pre2aa2

2000-09-12 Thread Mitchell Blank Jr

Alan Cox wrote:
  time, but remember that there are two things measured in time here:
A. The time for the whole queue of requests to run (this is what Rik is
   proposing using to throttle)
B. The time an average request takes to process.
 
 Your perceived latency is based entirely on A.

Yes, but "how hard is it reasonable for the kernel to try" is based on
both items.  A good first order approximation is number of requests.

  If we limit on the depth of queue we're (to some level of approximation)
  making our decision based on A/B.  It's still a magic constant, but at
 
 I dont suggest you do queue limiting on that basis. I suggest you do order
 limiting based on time slots

It's still a queue - the queue of things we're going to take on this
elevator swipe, right?  And the problem is one of keeping a sane
watermark on this queue - not too many requests to destroy latency
but enough to let the elevator do some good.

  Well, actually just about any communications protocol worth its salt
  uses some sort of windowing throttle based on the amount of data
 
 Im talking about flow control/traffic shaping

...where the user sets a number exlpicitly for what performance they
want.  Again, if we're going to make the user set this latency
variable for each of their devices, then doing it based on time will
work great.

  There's too many orders of magnatude difference between even just SCSI
  disks (10 yr old drive?  16-way RAID?  Solid state?) to make
  supplying any sort of default with the kernel impractical.  The end
 
 The same argument is equally valid for the current scheme, and I think you'll
 find equally bogus

There will always need to be tunables - and it's fine to say "if you've
got oddball hardware and/or workload and/or requirements then you should
twiddle this knob".  But it seems to me that the current scheme works
well for a pretty big range of devices.  If you do the setting based
on time, I think it'll be a lot more sensitive since there's nothing
that will scale based on the speed of the device.

-Mitch
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/