Calling userspace apps from within kernel

2000-09-25 Thread Daniel Walls

Hi,

I was wondering if it is possible to execute a userspace application from 
within the kernel (particularly binfmt_elf.c)...
something along the lines of execl()...

If so, what is the name of the function used to do this?

*an aside: It would be very useful for newbies like myself to have a list 
of functions that can be used in the kernel. Does this exist at all?

Thanks very much for your time. Could you please reply to my address, as I 
am not subscribed to the list.

dan





-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: Interrupt sharing

2000-09-25 Thread Mahadev K Cholachagudda


- Original Message -
From: "Jeff Garzik" <[EMAIL PROTECTED]>
To: "Mahadev K Cholachagudda" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Monday, September 25, 2000 7:00 PM
Subject: Re: Interrupt sharing


>
>
> On Mon, 25 Sep 2000, Mahadev K Cholachagudda wrote:
>
> > Hello to all,
> >
> > I have one question, as below.
> >
> >
> > Suppose the two drivers driver1 and driver2 both install an ISR for a
> > particular interrupt, say UART0.
> > After some time the interrupt is generated. At this moment, which driver's
> > ISR is going to execute?
> >
> > If driver1's ISR gets executed, is driver2's ISR also going to execute?
> > If driver2's ISR does execute, is the data that the interrupt generated
> > going to be emulated to driver2's ISR?
>
> When an interrupt is delivered, the kernel calls ALL interrupt handlers
> registered for that interrupt.  That means all drivers capable of
> sharing interrupts should, ideally, have code in their interrupt handler
> to exit ASAP if no work is necessary.
>
> status = RTL_R16(IntrStatus);
> /* exit ASAP if no interrupt conditions (0), or
> * if the hardware was unplugged (0xFFFF)
> */
> if ((status == 0) || (status == 0xFFFF))
> return;
>
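For illustration only, here is a userspace model of a shared interrupt line. All names (fake_dev, irq_line, request_shared, deliver_irq) are hypothetical, not kernel API. It assumes, as Jeff describes, that every handler registered on the line is called, and that each handler inspects only its own device's status register; in 2.4 a handler is passed the irq number, its own dev_id cookie and the register frame, so no interrupt data is copied from one handler to another. The 0xFFFF check assumes that a 16-bit status read from unplugged hardware returns all ones.

```c
#include <stdint.h>

#define MAX_HANDLERS 4

/* Hypothetical device: intr_status stands in for its IntrStatus register. */
struct fake_dev {
	const char *name;
	uint16_t intr_status;
	int handled;            /* interrupts this device actually serviced */
};

typedef void (*isr_t)(struct fake_dev *dev);

/* One interrupt line with several registered (shared) handlers. */
struct irq_line {
	isr_t handler[MAX_HANDLERS];
	struct fake_dev *dev[MAX_HANDLERS];
	int count;
};

/* Each ISR checks its own hardware and exits ASAP if it has no work:
 * 0 means this device did not interrupt, 0xFFFF means it is gone. */
static void sample_isr(struct fake_dev *dev)
{
	uint16_t status = dev->intr_status;

	if (status == 0 || status == 0xFFFF)
		return;
	dev->handled++;
	dev->intr_status = 0;   /* acknowledge the pending condition */
}

static int request_shared(struct irq_line *line, isr_t isr, struct fake_dev *dev)
{
	if (line->count == MAX_HANDLERS)
		return -1;
	line->handler[line->count] = isr;
	line->dev[line->count] = dev;
	line->count++;
	return 0;
}

/* Delivering the interrupt: ALL handlers on the line are called. */
static void deliver_irq(struct irq_line *line)
{
	for (int i = 0; i < line->count; i++)
		line->handler[i](line->dev[i]);
}
```

With driver1 idle and driver2 raising the interrupt, both handlers run, but only driver2's does any work.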

If this is the case, and if I am not wrong, the data which is to be delivered to
the ISR will be emulated to each driver's ISR. Is that true?







Re: test9-pre7

2000-09-25 Thread Steven Cole

> Udo A Steinberg wrote:
> There's a little annoying bug with printing partitions upon bootup.

before the patch:

kernel: <6>Partition check:
kernel: <6> hda: hda1 hda1 hda2 hda2 < hda5 hda5 hda6 hda6 hda7 hda7 hda8 hda8 hda9 hda9 >
kernel: <6> hdb: hdb1 hdb1 hdb2 hdb2 hdb3 hdb3 hdb4 hdb4

after the patch:

kernel: <6>Partition check:
kernel: <6> hda: hda1 hda2 < hda5 hda6 hda7 hda8 hda9 >
kernel: <6> hdb: hdb1 hdb2 hdb3 hdb4

Looks like the patch worked.  Running this version of 2.4.0-test9-pre7 now.

Regards,

Steven Cole



Re: how interesting are data->bss patches?

2000-09-25 Thread Keith Owens

On Mon, 25 Sep 2000 21:20:00 -0500 (CDT), 
Peter Samuelson <[EMAIL PROTECTED]> wrote:
>Cool stuff!  I thought about using basically the same approach, but I
>wasn't sure if binutils was up to the job.  I didn't know about readelf
>(well, I'd read about it in the 2.10 announcement, but I didn't know
>what it could do).  Basically readelf is objdump on steroids, right?

Correct.

>Now, is there a way to extract filenames and line numbers off those
>symbols, if you build with -g?

readelf -w dumps debug info, but only if you compile with dwarf; -g
alone is not enough.  Objects compiled with dwarf are a lot bigger and
the readelf -w output is messy to parse, so it is probably not justified.
For a process that is not going to run very often, using etags, cscope,
sourcenavigator, lxr or other such tool to find the variables is
acceptable (says he who will not be doing the work ;).




2.4.x vgacon causes SMP deadlock

2000-09-25 Thread Keith Owens

Resend, no response to first mail.

2.4.0-test9-pre5, although this has existed since at least 2.4.0-test1.
VT console on vga.  printk -> vt_console_print -> hide_cursor ->
vgacon_cursor -> write_vga -> cli -> __global_cli -> get_irqlock ->
wait_on_irq -> show -> printk -> SMP deadlock!

I hit this on an SMP box, one cpu was dead and inside an interrupt.
The other cpu tried to print to the console, ran down that chain, timed
out in wait_on_irq and tried to print.  It was big red button time.

Not only is this path *slow* but it introduces a nasty deadlock
condition into printk.  We cannot rely on printk to get diagnostics for
SMP hangs.
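A toy userspace model of the chain above, assuming the nested printk would wait on something the outer printk already holds. The function names are hypothetical stand-ins for printk, the vgacon write path and wait_on_irq; instead of hanging, the model counts the nested entry so the cycle is visible.

```c
/* Hypothetical model of: printk -> console write -> cli ->
 * wait_on_irq -> (timeout) -> show -> printk. */
static int printk_depth;          /* printk calls currently active */
static int reentries;             /* times printk was entered recursively */
static int other_cpu_in_irq = 1;  /* the dead CPU stuck inside an interrupt */

static void model_printk(const char *msg);

static void wait_on_irq(void)
{
	/* The other CPU never leaves its handler, so we time out and
	 * the diagnostic path prints, i.e. calls printk again. */
	if (other_cpu_in_irq)
		model_printk("wait_on_irq: timed out");
}

static void console_write(const char *msg)
{
	(void)msg;
	wait_on_irq();  /* stands in for hide_cursor -> ... -> cli -> get_irqlock */
}

static void model_printk(const char *msg)
{
	if (printk_depth > 0) {
		/* In the kernel, this nested call is where the CPU hangs. */
		reentries++;
		return;
	}
	printk_depth++;
	console_write(msg);
	printk_depth--;
}
```

With the other CPU stuck, a single model_printk() call re-enters itself once; once the other CPU is not in an interrupt, the same call completes without re-entry.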




Re: [PATCH: Final ] WAS (Re: [PATCH/RFC] (long) network interface changes

2000-09-25 Thread jamal


shit, i forgot this small addendum to the patch.
Attached.

cheers,
jamal



--- linux/net/core/sysctl_net_core.c.orig   Wed Feb  9 23:08:09 2000
+++ linux/net/core/sysctl_net_core.c   Tue Sep 12 20:06:15 2000
@@ -12,6 +12,10 @@
 #ifdef CONFIG_SYSCTL
 
 extern int netdev_max_backlog;
+extern int no_cong_thresh;
+extern int no_cong;
+extern int lo_cong;
+extern int mod_cong;
 extern int netdev_fastroute;
 extern int net_msg_cost;
 extern int net_msg_burst;
@@ -41,6 +45,18 @@
 &proc_dointvec},
{NET_CORE_MAX_BACKLOG, "netdev_max_backlog",
 &netdev_max_backlog, sizeof(int), 0644, NULL,
+&proc_dointvec},
+   {NET_CORE_MAX_BACKLOG, "no_cong_thresh",
+_cong, sizeof(int), 0644, NULL,
+&proc_dointvec},
+   {NET_CORE_MAX_BACKLOG, "no_cong",
+&no_cong, sizeof(int), 0644, NULL,
+&proc_dointvec},
+   {NET_CORE_MAX_BACKLOG, "lo_cong",
+&lo_cong, sizeof(int), 0644, NULL,
+&proc_dointvec},
+   {NET_CORE_MAX_BACKLOG, "mod_cong",
+&mod_cong, sizeof(int), 0644, NULL,
 &proc_dointvec},
 #ifdef CONFIG_NET_FASTROUTE
{NET_CORE_FASTROUTE, "netdev_fastroute",



Re: how interesting are data->bss patches?

2000-09-25 Thread Peter Samuelson


[kaos]
> Got bored, wrote some Perl.

Cool stuff!  I thought about using basically the same approach, but I
wasn't sure if binutils was up to the job.  I didn't know about readelf
(well, I'd read about it in the 2.10 announcement, but I didn't know
what it could do).  Basically readelf is objdump on steroids, right?

Now, is there a way to extract filenames and line numbers off those
symbols, if you build with -g?

Peter



[PATCH: Final ] WAS (Re: [PATCH/RFC] (long) network interface changes

2000-09-25 Thread jamal


Dave,

Final patch with feedback from Henner to change the naming convention of
the return codes. Clean it up, polish it, junk it etc.

I'd also like to send you a large patch, or a series of patches, to make the
protocols use the NET_RX_* codes. E.g. this patch:


--- ip_input.c  2000/09/23 12:48:56 1.1
+++ ip_input.c  2000/09/23 12:52:52
@@ -341,7 +341,7 @@
 
skb = skb_cow(skb, skb_headroom(skb));
if (skb == NULL)
-   return 0;
+   return NET_RX_DROP;
iph = skb->nh.iph;
 
skb->ip_summed = 0;
@@ -372,7 +372,7 @@
IP_INC_STATS_BH(IpInHdrErrors);
 drop:
 kfree_skb(skb);
-return(0);
+return NET_RX_DROP;
 }
 
 /*
@@ -429,6 +429,6 @@
 drop:
 kfree_skb(skb);
 out:
-return(0);
+return NET_RX_DROP;
 }
 
--

I realize nobody is using those codes but they would be useful and will
enforce consistency.

cheers,
jamal


--- linux/net/core/dev.c.orig   Thu Sep  7 11:32:01 2000
+++ linux/net/core/dev.c   Tue Sep 12 20:02:20 2000
@@ -59,6 +59,8 @@
  * Paul Rusty Russell  :   SIOCSIFNAME
  *  Pekka Riikonen  :  Netdev boot-time settings code
 *  Andrew Morton   :   Make unregister_netdevice wait indefinitely on dev->refcnt
+ * J Hadi Salim:   - Backlog queue sampling
+ * - netif_rx() feedback   
  */
 
 #include 
@@ -97,6 +99,9 @@
 extern int plip_init(void);
 #endif
 
+/*
+#define RAND_LIE 1
+*/
 NET_PROFILE_DEFINE(dev_queue_xmit)
 NET_PROFILE_DEFINE(softnet_process)
 
@@ -133,6 +138,11 @@
 static struct packet_type *ptype_base[16]; /* 16 way hashed list */
 static struct packet_type *ptype_all = NULL;   /* Taps */
 
+#ifdef offline_sample
+static void sample_queue(unsigned long dummy);
+static struct timer_list samp_timer = { { NULL, NULL }, 0, 0, &sample_queue };
+#endif
+
 /*
  * Our notifier list
  */
@@ -951,12 +961,25 @@
   ===*/
 
 int netdev_max_backlog = 300;
+/* these numbers are selected based on intuition and some
experimentation; if you have a more scientific way of doing this
please go ahead and fix things
*/
+int no_cong_thresh=10;
+int no_cong=20;
+int lo_cong=100;
+int mod_cong=290;
+
+
 
 struct netif_rx_stats netdev_rx_stat[NR_CPUS];
 
 
 #ifdef CONFIG_NET_HW_FLOWCONTROL
+/*
 static atomic_t netdev_dropping = ATOMIC_INIT(0);
+*/
+atomic_t netdev_dropping = ATOMIC_INIT(0);
 static unsigned long netdev_fc_mask = 1;
 unsigned long netdev_fc_xoff = 0;
 spinlock_t netdev_fc_lock = SPIN_LOCK_UNLOCKED;
@@ -1014,6 +1037,56 @@
 }
 #endif
 
+static void get_sample_stats(int cpu)
+{
+#ifdef RAND_LIE
+   unsigned long rd;
+   int rq;
+#endif
+   int blog=softnet_data[cpu].input_pkt_queue.qlen;
+   int avg_blog=softnet_data[cpu].avg_blog;
+
+   avg_blog=(avg_blog >> 1)+ (blog>>1);
+
+   if (avg_blog > mod_cong) {
+/*  above moderate congestion levels */
+   softnet_data[cpu].cng_level= NET_RX_CN_HIGH;
+#ifdef RAND_LIE
+   rd=net_random();
+   rq=rd% netdev_max_backlog;
+   if (rq < avg_blog) /* unlucky bastard */
+   softnet_data[cpu].cng_level=NET_RX_DROP;
+#endif
+   } else if (avg_blog > lo_cong) {
+   softnet_data[cpu].cng_level= NET_RX_CN_MOD;
+#ifdef RAND_LIE
+   rd=net_random();
+   rq=rd% netdev_max_backlog;
+   if (rq < avg_blog) /* unlucky bastard */
+   softnet_data[cpu].cng_level=NET_RX_CN_HIGH;
+#endif
+   } else if (avg_blog > no_cong) 
+   softnet_data[cpu].cng_level= NET_RX_CN_LOW;
+   else  /* no congestion */
+   softnet_data[cpu].cng_level=NET_RX_SUCCESS;
+
+   softnet_data[cpu].avg_blog=avg_blog;
+
+}
+
+#ifdef offline_samp
+static void sample_queue(unsigned long dummy)
+{
/* 10 ms or 1ms -- i dont care -- JHS */
+   int next_tick=1;
+   int cpu=smp_processor_id();
+   get_sample_stats(cpu);
+   next_tick+=jiffies;
+   mod_timer(&samp_timer, next_tick);
+}
+#endif
+
+
 /**
  * netif_rx-   post buffer to the network code
  * @skb: buffer to post
@@ -1022,9 +1095,18 @@
  * the upper (protocol) levels to process.  It always succeeds. The buffer
  * may be dropped during processing for congestion control or by the 
  * protocol layers.
+ *  
+ * return values:
+ * NET_RX_SUCCESS  (no congestion)   
+ * NET_RX_CN_LOW (low congestion) 
+ * NET_RX_CN_MOD (moderate congestion)
+ * NET_RX_CN_HIGH(high congestion) 
+ * 
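The averaging and threshold logic in get_sample_stats() can be modelled in isolation. This is a sketch, not the patch: the enum below is hypothetical (CN_NONE corresponds to NET_RX_SUCCESS, the others to NET_RX_CN_LOW/MOD/HIGH), the RAND_LIE randomisation is left out, and the usage example plugs in the patch's default thresholds (no_cong=20, lo_cong=100, mod_cong=290).

```c
/* Hypothetical congestion levels standing in for the NET_RX_* codes. */
enum cng { CN_NONE, CN_LOW, CN_MOD, CN_HIGH };

struct thresholds {
	int no_cong;   /* above this: low congestion */
	int lo_cong;   /* above this: moderate congestion */
	int mod_cong;  /* above this: high congestion */
};

/* Same smoothing as get_sample_stats(): the new average is half the
 * old average plus half the current backlog length. */
static int update_avg(int avg_blog, int blog)
{
	return (avg_blog >> 1) + (blog >> 1);
}

/* Same ordering of comparisons as the patch, highest band first. */
static enum cng classify(int avg_blog, const struct thresholds *t)
{
	if (avg_blog > t->mod_cong)
		return CN_HIGH;
	if (avg_blog > t->lo_cong)
		return CN_MOD;
	if (avg_blog > t->no_cong)
		return CN_LOW;
	return CN_NONE;
}
```

With the backlog held at 300 packets and starting from an empty queue, the average climbs 150, 225, 262, 281, 290, 295, so the level only reaches the high band on the sixth sample. The halving acts as a cheap low-pass filter, so a short burst does not immediately flip the reported level.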

Re: test9-pre7

2000-09-25 Thread Juan J. Quintela

> "udo" == Udo A Steinberg <[EMAIL PROTECTED]> writes:
udo> There's a little annoying bug with printing partitions upon bootup.
udo> Specifically my dmesg now looks like:

udo> Partition check:
udo>  hda: hda1 hda1
udo>  hdb: hdb1 hdb1 hdb2 hdb2 hdb3 hdb3 < hdb5 hdb5 hdb6 hdb6 hdb7 hdb7 hdb8 hdb8 hdb9 hdb9 hdb10 hdb10 >

udo> The following attached patch should fix this.
udo> I'd be happy if someone could verify that it's correct
udo> (seeing that it's past 3am here).

It works nicely here.

Thanks, Juan. 

-- 
In theory, practice and theory are the same, but in practice they 
are different -- Larry McVoy



Re: [PATCH/RFC] (long) network interface changes

2000-09-25 Thread jamal



On Mon, 25 Sep 2000, Henner Eisen wrote:

> > "jamal" == jamal  <[EMAIL PROTECTED]> writes:
> jamal> So I would prefer to leave this turned off. In fact I was
> jamal> hoping to take it off for the final code submission. If you
> jamal> insist, it could be left there and enabled during
> 
> No, I don't insist. This was just some brain-storming ;).
> 

I'll just leave it in but disable it. Davem may choose to remove it.
It is the cheapest solution so far that has been experimented with.
Rusty had some interesting ideas on this as well ...

> I think setting CN bits appropriately is the task of the upper
> layer protocol anyway. The only thing is that the upper layers need
> to know whether the input queue is congested. Maybe the netif_rx() code
> could set an skb->rx_congested bit when it delivers packets to the
> upper layer while the backlog queue is in congested state. (Maybe not
> really necessary, the upper layer could also query the congestion state
> by atomic_read(&netdev_dropping)).

Interesting, and quite applicable to Frame Relay, for example; what do you
see other apps/protos using it for? Would TCP, for example, delay ACKs or
delay delivery of data? 
Actually, the more I think about it, the more I realize that the rx_congested
mark might not be an accurate reflection of the situation when the skb gets
to the protocol/app. It will depend on when the rx softirq gets scheduled, by
which time the information might not be accurate anymore.
It will probably also not make much sense on SMP (where one congested CPU
does not necessarily mean the rest are congested).

BTW, I checked the return values of the protocols which you had
concerns with, and I believe they are not reflective of the
situation; there are also a few semantic issues,
e.g. ip_rcv() returns 0 on error, while 0 is generally referred to as
'success' in other parts of the code.
The good news is that no code cares about or checks these return values.
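A minimal sketch of why a distinct drop code helps. The NET_RX_* names come from the patch, but the numeric values below are assumptions (only NET_RX_SUCCESS == 0 matters here), and fake_skb, old_rcv and new_rcv are hypothetical simplifications of a protocol receive routine such as ip_rcv().

```c
/* Assumed values: the sketch only relies on NET_RX_SUCCESS == 0
 * and on the codes being distinct. */
enum {
	NET_RX_SUCCESS = 0,
	NET_RX_DROP,
	NET_RX_CN_LOW,
	NET_RX_CN_MOD,
	NET_RX_CN_HIGH
};

/* Hypothetical stand-in for an sk_buff: just enough to fail a check. */
struct fake_skb {
	int hdr_ok;
};

/* Old convention: a dropped packet returns 0, the same value that
 * other parts of the code treat as success. */
static int old_rcv(struct fake_skb *skb)
{
	if (!skb->hdr_ok)
		return 0;       /* dropped */
	return 0;               /* delivered */
}

/* Proposed convention: callers can tell the two outcomes apart. */
static int new_rcv(struct fake_skb *skb)
{
	if (!skb->hdr_ok)
		return NET_RX_DROP;
	return NET_RX_SUCCESS;
}
```

Under the old convention a caller cannot distinguish a bad header from a delivered packet; with the NET_RX_* codes it can, which is the consistency argument for converting the protocols.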

I'll submit the current patch and maybe later submit another where the
protocols use things like NET_RX_SUCCESS etc

cheers,
jamal




Re: test9-pre7

2000-09-25 Thread Udo A. Steinberg

Linus Torvalds wrote:

> 
> VM balancing fixes, sound should work again, and a lot of small details.
> 
> Linus
>  - pre7:
> - official Compaq CISS driver.

There's a little annoying bug with printing partitions upon bootup.
Specifically my dmesg now looks like:

Partition check:
 hda: hda1 hda1
 hdb: hdb1 hdb1 hdb2 hdb2 hdb3 hdb3 < hdb5 hdb5 hdb6 hdb6 hdb7 hdb7 hdb8 hdb8 hdb9 hdb9 hdb10 hdb10 >

The following attached patch should fix this.
I'd be happy if someone could verify that it's correct
(seeing that it's past 3am here).

Cheers,
Udo

--- linux/fs/partitions/check.c Tue Sep 26 03:32:45 2000
+++ patched/fs/partitions/check.c   Tue Sep 26 03:31:47 2000
@@ -187,14 +187,11 @@
 #ifdef CONFIG_DEVFS_FS
printk(" p%d", (minor & ((1 << hd->minor_shift) - 1)));
 #else
-   if (hd->major >= COMPAQ_SMART2_MAJOR+0 && hd->major <= COMPAQ_SMART2_MAJOR+7)
+   if ((hd->major >= COMPAQ_SMART2_MAJOR+0 && hd->major <= COMPAQ_SMART2_MAJOR+7) ||
+       (hd->major >= COMPAQ_CISS_MAJOR+0 && hd->major <= COMPAQ_CISS_MAJOR+7))
printk(" p%d", (minor & ((1 << hd->minor_shift) - 1)));
else
printk(" %s", disk_name(hd, minor, buf));
-   if (hd->major >= COMPAQ_CISS_MAJOR+0 && hd->major <= COMPAQ_CISS_MAJOR+7)
-printk(" p%d", (minor & ((1 << hd->minor_shift) - 1)));
-else
-printk(" %s", disk_name(hd, minor, buf));
 #endif
 }
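The duplicated names came from two if/else statements running back to back, each of which printed something, so every partition was printed twice. The patch folds both range checks into one condition. A sketch of the merged logic; the major numbers below are assumed values for illustration, and is_compaq_array and format_partition are hypothetical stand-ins for the printk calls in check.c.

```c
#include <stdio.h>

/* Assumed major numbers, for illustration only. */
#define COMPAQ_SMART2_MAJOR 72
#define COMPAQ_CISS_MAJOR   104

/* One combined test, as in the patched check.c. */
static int is_compaq_array(int major)
{
	return (major >= COMPAQ_SMART2_MAJOR && major <= COMPAQ_SMART2_MAJOR + 7) ||
	       (major >= COMPAQ_CISS_MAJOR && major <= COMPAQ_CISS_MAJOR + 7);
}

/* Formats exactly one name per partition. The partition number is
 * taken from the low minor_shift bits of the minor, as in the patch;
 * disk_name is whatever the kernel's disk_name() would have produced. */
static int format_partition(char *out, size_t n, int major, int minor,
			    int minor_shift, const char *disk_name)
{
	if (is_compaq_array(major))
		return snprintf(out, n, " p%d", minor & ((1 << minor_shift) - 1));
	return snprintf(out, n, " %s", disk_name);
}
```

Because there is now a single if/else, each partition produces exactly one name, which matches the "after the patch" output reported elsewhere in the thread.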
 



Re: 1023rd thread crashes 2.4.0-test8 from non-root user

2000-09-25 Thread Ted Deppner

On Mon, Sep 25, 2000 at 10:33:06AM +0200, Ingo Molnar wrote:
> On Mon, 25 Sep 2000, Ted Deppner wrote:
> 
> > I ask because on my perl-threads test case, I can't create more than 1023
> > threads, but I get a kernel crash when I've _attempted_ to create more
> > than 1023 and hit ctrl-c.
> 
> could you test this with the kernel/signal.c:max_queued_signals
> initialization change i suggested? Does it still crash?

With max_queued_signals=4096, I can still only create 1022 threads under
perl-5.005-threads.  

With more than 1023 threads the process no longer responds to ctrl-c, or a
kill -INT on it.  A kill -9 will kill it however with no kernel lockup.

Under 1023 threads the process responds to ctrl-c.

It seems like the bug is definitely involved in signal handling, and that
max_queued_signals affects it in some way...


My ulimit -a from bash... you can see open files at 1024, but I'm not
doing open files stuff in my test program (threadcrash.pl).

core file size (blocks) 0
data seg size (kbytes)  unlimited
file size (blocks)  unlimited
max locked memory (kbytes)  unlimited
max memory size (kbytes)unlimited
open files  1024
pipe size (512 bytes)   8
stack size (kbytes) 8192
cpu time (seconds)  unlimited
max user processes  4093
virtual memory (kbytes) unlimited

I upped my open-files to 2048 and still was unable to get more than 1022
threads running.  I wonder if perl-5.005-threads might have a static limit
set somewhere inside it.  Maybe I'll try to recompile it tonight and see
what happens.

-- 
Ted Deppner
http://www.psyber.com/~ted/




Re: some sound-related oops'es

2000-09-25 Thread Andrew McNamara

>I get some oops whenever I try to insmod sb
[...]
>Sep 25 14:08:35 penny kernel: sb: No ISAPnP cards found, trying standard ones...
>Sep 25 14:08:35 penny kernel: SB 4.13 detected OK (220)
>Sep 25 14:08:35 penny kernel:  at 0x220 irq 5 dma 1,5
>Sep 25 14:08:35 penny kernel:  at 0x330 irq 5 dma 0,0
>Sep 25 14:08:35 penny kernel: sb: I/O region in use.
>Sep 25 14:08:35 penny kernel: Sound: Hmm, DMA1 was left allocated - fixed
>Sep 25 14:08:35 penny kernel: Sound: Hmm, DMA5 was left allocated - fixed
>Sep 25 14:08:35 penny insmod: /lib/modules/2.4.0-test8/kernel/drivers/sound/sb.o: init_module: No such device
>Sep 25 14:08:35 penny insmod: Hint: insmod errors can be caused by incorrect module parameters, including invalid IO or IRQ parameters
>Sep 25 14:08:35 penny insmod: /lib/modules/2.4.0-test8/kernel/drivers/sound/sb.o: insmod char-major-14 failed
>Sep 25 14:08:35 penny kernel: Unable to handle kernel paging request at virtual address ca8fc1a0

I was also getting this with 2.4.0-test9-pre6. It looks to me like the
soundblaster driver was tripping over its own feet (finding 0x220,
registering it, trying to register it again, then bailing because
"someone" already had it registered). It doesn't bail cleanly, leaving
lots of stuff still allocated, which appears to be the cause of the
oops.

I was eventually able to load the module cleanly by specifying:

insmod sb isapnp=0 multiple=0 io=0x220 irq=5 dma=1 dma16=5 mpu_io=0x330

In particular, the "multiple=0" got it to load without the "sb: I/O
region in use".

 ---
Andrew McNamara (System Architect)

connect.com.au Pty Ltd
Lvl 3, 213 Miller St, North Sydney, NSW 2060, Australia
Phone: +61 2 9409 2117, Fax: +61 2 9409 2111



[patch] net/ipv4/arp.c fixes IP/HW type collision

2000-09-25 Thread David Ford

Ok, no complaints about the patch, it's simple, has been looked at and
tested.  This patch against current kernel trees fixes two things:

- n.n.n.n0xNN whitespace collision in /proc/net/arp and
- removes the sprintf formatting for %s, "*" on the arp mask which is no
longer used nor will be used here

-d

--
  "There is a natural aristocracy among men. The grounds of this are
  virtue and talents", Thomas Jefferson [1742-1826], 3rd US President




--- net/ipv4/arp.c.orig Fri Aug  4 18:18:49 2000
+++ net/ipv4/arp.c  Sat Sep 23 12:47:03 2000
@@ -65,6 +65,7 @@
  * clean up the APFDDI & gen. FDDI bits.
  * Alexey Kuznetsov:   new arp state machine;
  * now it is in net/core/neighbour.c.
+ * David Ford  :   More fixes cleaning up the proc output
  */
 
 /* RFC1122 Status:
@@ -1025,6 +1026,7 @@
char hbuffer[HBUFFERLEN];
int i,j,k;
const char hexbuf[] =  "0123456789ABCDEF";
+   char abuf[16];
 
size = sprintf(buffer,"IP address   HW type Flags   HW address    Mask Device\n");
 
@@ -1063,20 +1065,15 @@
}
 #endif
 
-   {
-   char tbuf[16];
-   sprintf(tbuf, "%u.%u.%u.%u", NIPQUAD(*(u32*)n->primary_key));
-
-   size = sprintf(buffer+len, "%-16s 0x%-10x0x%-10x%s",
-   tbuf,
-   hatype,
-   arp_state_to_flags(n), 
-   hbuffer);
-
-   size += sprintf(buffer+len+size,
-" %-8s %s\n",
-"*", dev->name);
-   }
+   size = sprintf(buffer+len, "%-16s 0x%-10x0x%-10x%s",
+   in_ntoa2(*(u32*)n->primary_key, abuf),
+   hatype,
+   arp_state_to_flags(n),
+   hbuffer);
+
+   size += sprintf(buffer+len+size,
+   " *%-16s\n",
+   dev->name);
 
read_unlock(&n->lock);
 
@@ -1099,15 +1096,15 @@
struct net_device *dev = n->dev;
int hatype = dev ? dev->type : 0;
 
-   size = sprintf(buffer+len,
-   "%u.%u.%u.%u0x%-10x0x%-10x%s",
-   NIPQUAD(*(u32*)n->key),
+   size = sprintf(buffer+len, "%-16s 0x%-10x0x%-10x%s",
+   in_ntoa2(*(u32*)n->key, abuf),
hatype,
ATF_PUBL|ATF_PERM,
"00:00:00:00:00:00");
+
size += sprintf(buffer+len+size,
-" %-17s %s\n",
-"*", dev ? dev->name : "*");
+   " *%-16s\n",
+   dev ? dev->name : "*");
 
len += size;
pos += size;
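The whitespace collision is easy to reproduce in userspace. A sketch assuming a hypothetical fmt_ipv4() in place of in_ntoa2() and NIPQUAD: a 15-character address such as 192.168.100.200 printed with no separator before "0x" runs straight into the HW type, while "%-16s 0x" always leaves at least one space.

```c
#include <stdio.h>
#include <stdint.h>

/* Hypothetical helper in the spirit of in_ntoa2(): writes a dotted
 * quad into a caller-supplied buffer of at least 16 bytes. */
static char *fmt_ipv4(const uint8_t a[4], char buf[16])
{
	snprintf(buf, 16, "%u.%u.%u.%u",
		 (unsigned)a[0], (unsigned)a[1], (unsigned)a[2], (unsigned)a[3]);
	return buf;
}

/* Old style: address and HW type with no separator between them. */
static int old_line(char *out, size_t n, const uint8_t a[4], int hatype)
{
	return snprintf(out, n, "%u.%u.%u.%u0x%-10x",
			(unsigned)a[0], (unsigned)a[1], (unsigned)a[2],
			(unsigned)a[3], (unsigned)hatype);
}

/* New style: address padded to a 16-column field, then a space. */
static int new_line(char *out, size_t n, const uint8_t a[4], int hatype)
{
	char abuf[16];

	return snprintf(out, n, "%-16s 0x%-10x", fmt_ipv4(a, abuf),
			(unsigned)hatype);
}
```

For 192.168.100.200 with HW type 1, the old format yields "192.168.100.2000x1", while the new one yields "192.168.100.200  0x1". Since the longest dotted quad is 15 characters, the fixed 16-column field plus a literal space can never collide.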





test9-pre7

2000-09-25 Thread Linus Torvalds


VM balancing fixes, sound should work again, and a lot of small details.

Linus

-
 - pre1:
- USB: OHCI controller unlink and bandwidth reclamation fixes
- USB: storage update
- sparc64: register window race. Non-deadlock rwlocks.
- name clash in hamradio/pi2.c and hamradio/pt.c  
- epic100 credits, 8139too driver update, sr.c initcalls
- acenic update
- NFS sillyrename fixups
- mktime(). Do it just once - not 16 times.
- misc small fixes to random drivers by Tigran
- IDE driver picks up master/slave relationships on its own.
- truncate unmapped/uptodate case handled correctly
- don't do notifier locking at low level: higher levels do (or
  should do) this already. 
- ACPI interpreter updates (and file renames - making this part big)
- SysKonnect gigabit driver update
- 3c59x driver update
- pcmcia debounce logic. Ugh.
- MM balancing (Rik Riel)
 - pre2:
- "extern inline" -> "static inline".  It doesn't matter right now,
  but it's proactive for future gcc versions.
- various net drvr updates and fixes
- more initcall updates
- PPC updates (including PPC-related drivers etc)
- disallow re-mounting same filesystem in same place multiple times.
  Too confusing. And /etc/mtab gets strange.
- Riel VM update
- sparc updates
- PCI bridge scanning fix: assign numbers properly
- network updates
- scsi fixes
 - pre3:
- uninitialized == zero. Remove extra initializers.
- block_prepare_write and block_truncate_page: if the page is
  up-to-date, then so are the buffer heads inside it once they
  are mapped..
- SCSI initialization - move over to the modular case. No more
  double initialization.
- Sync up with Alans 2.2.x driver changes
- networking updates (ipv6 works non-modular etc)
- netfilter update
- adfs correct dentry operations
- ARM update (including ARM drivers)
- acenic driver update
- floppy: we'd better hold the io_request_lock when playing with "CURRENT".
- NFS cache coherency across file locking fix
- NFS over TCP - handle TCP socket writability right..
- USB updates
 - pre4:
- more USB updates
- continued SCSI cleanup
 - pre5:
- more drivers synced to Alan's 2.2.x changes
- sis900 driver update
- Andries: net device name allocation as in 2.2.x
- pmac SCSI driver init update
- ixj telephony driver fixes
- _fput/__fput are no longer used. 
- more USB updates
- codafs update
- byteorder: use statement expressions instead of macros, to avoid argument re-use.
- don't disallow Onstream ide-scsi devices
- fix cardbus bridge resources..
- Make SCSI initialization order be same as before.
 - pre6:
- Add Camino chipset ID to eepro100 driver. 
- VM UP deadlock fix
- codafs fixups
- networking updates
- USB uhci updates
- makefile documentation update
- emu10k stereo sound fix.
- new PCI ids
- more linux-2.2 driver sync-ups
- VIA IDE driver bugfixes
- update atp ISA net driver
- file locking fixes
- make the scsi-generic module work properly again.
- careful memory ordering by Andrea..
- alpha RTC year magic again..
- broadcast I/O APIC interrupt MP-tables are legal..
- samba 2.2 needs leases for efficient file sharing.  They are kind
  of like file locks with async IO notification.
- teach st.c about some SCSI tapes that really aren't SCSI tapes (OnStream)
- TUN/TAP driver: use proper device number (misc device, minor=200).
 - pre7:
- official Compaq CISS driver.
- cs4281 sound driver
- export the new lock copy/init functions
- add SGI PCI ID's.
- Ingo: clean up VM handling, improve balancing.
- fix up ac97 codec initialization
- nfsd: mark us as a O_LARGEFILE case, so that the VFS allows
  the full 64-bit access..
- shm statistics bugfix.
- sound init cleanups
- alpha cross-compile fixes..
- don't raise privileges when re-trying a failed NFS RPC request
- NFSv3 is not really really experimental any more.
- file locking deadlock detection bugfix..
- recognize the k6 model 13: it's a K6-2+ mobile processor.
- USB: remember to release the kernel lock and other updates..




Re: [patch] vmfixes-2.4.0-test9-B2 - fixing deadlocks

2000-09-25 Thread Linus Torvalds



On Tue, 26 Sep 2000, Andrea Arcangeli wrote:
> 
> The machine will run low on memory as soon as I read 200mbyte from disk.

So? 

Yes, at that point we'll do the LRU dance. Then we won't be low on memory
any more, and we won't do the LRU dance any more. What's the magic in
zoneinfo that makes it not have to do the same thing?

Linus




Re: the new VMt

2000-09-25 Thread Erik Andersen

On Mon Sep 25, 2000 at 02:04:19PM -0600, [EMAIL PROTECTED] wrote:
> 
> > all of the pending requests just as long as they are serialised, is
> > this a problem?
> 
> I think you are solving the wrong problem. On a small memory machine, the kernel,
> utilities, and applications should be configured to use little memory.  
> BusyBox is better than BeanCount. 
> 

Granted that smaller apps can help -- for a particular workload.  But while I
am very partial to BusyBox (in fact I am about to cut a new release) I can
assure you that OOM is easily possible even when your user space is tiny.  I do
it all the time.  There are mallocs in busybox and when under memory pressure,
the kernel still tends to fall over...

 -Erik

--
Erik B. Andersen   email:  [EMAIL PROTECTED]
--This message was written using 73% post-consumer electrons--



ide-disk: set_multmode?

2000-09-25 Thread Jussi Hamalainen

 hdc:hdc: set_multmode: status=0x51 { DriveReady SeekComplete Error }
hdc: set_multmode: error=0x04 { DriveStatusError }
 [PTBL] [523/255/63] hdc1 hdc2

This has been happening at least since 2.2.10. It's probably just
something cosmetic, but shouldn't it still be fixed? Running
vanilla-2.2.16 SMP on i686,

hdc: ST34321A, 4103MB w/128kB Cache, CHS=8894/15/63, (U)DMA

Here's a fix someone on this list suggested some time ago. It seems
to make the symptoms disappear, but I'm not sure whether it fixes the
actual problem. From vanilla-2.2.16:

--- ide-disk.c.orig Tue Sep 26 01:58:58 2000
+++ ide-disk.c  Tue Sep 26 02:00:36 2000
@@ -856,7 +856,7 @@
drive->mult_req = INITIAL_MULT_COUNT;
if (drive->mult_req > id->max_multsect)
drive->mult_req = id->max_multsect;
-   if (drive->mult_req || ((id->multsect_valid & 1) && id->multsect))
+   if (drive->mult_req && ((id->multsect_valid & 1) && id->multsect))
drive->special.b.set_multmode = 1;
 #else
id->multsect = ((id->max_multsect/2) > 1) ? id->max_multsect : 0;
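The suggested change turns an OR into an AND. A sketch of the two conditions side by side (old_cond and new_cond are hypothetical names; the arguments mirror drive->mult_req, id->multsect_valid and id->multsect): with the change, set_multmode is requested only when a multiple count was asked for and the drive also reports a valid multsect value. As noted above, whether this addresses the root cause is unclear; the sketch only shows which cases change.

```c
#include <stdbool.h>

/* Original test: request set_multmode if a count was asked for OR
 * the drive reports a valid, non-zero multsect. */
static bool old_cond(int mult_req, int multsect_valid, int multsect)
{
	return mult_req || ((multsect_valid & 1) && multsect);
}

/* Suggested test: only when BOTH hold. */
static bool new_cond(int mult_req, int multsect_valid, int multsect)
{
	return mult_req && ((multsect_valid & 1) && multsect);
}
```

The two conditions differ only when exactly one side is true: a requested count on a drive that does not report a valid multsect, or a valid multsect with no requested count. In both cases the patched code skips the command that produced the DriveStatusError.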


-- 
-=[ Count Zero / TBH - Jussi Hämäläinen - email [EMAIL PROTECTED] ]=-




Re: boot fails with test9-pre3 and above

2000-09-25 Thread Prasanna Narayana


It looks like the scsi changes introduced in test9-pre3
broke the "scsihosts" option that can be specified from lilo.conf.

In my kernel, I had ncr53c896 and aic7xxx compiled in
and was using
append = "scsihosts=ncr53c8xx"
in /etc/lilo.conf as the boot disk was on ncr controller.

Removing this line and making aic7xxx as a module allowed me to boot.

-- Prasanna
VERITAS Software

PS: Thanks to Keith Owens for pointing out the VIDEO_CHAR debugging trick.

> [EMAIL PROTECTED] (Prasanna Narayana) wrote:
> >In our Dell 8-way 1gb machine, test9-pre3 and above kernel
> >doesn't boot (test9-pre2 boots ok). I just get the message
> > Loading 2.4test8
> > Uncompressing Linux... ok, booting the kernel.
> 
> Ahh, a candidate for [DOC] Debugging early kernel hangs.  Forwarded
> separately.



Re: [patch] vmfixes-2.4.0-test9-B2 - fixing deadlocks

2000-09-25 Thread Andrea Arcangeli

On Mon, Sep 25, 2000 at 03:30:10PM -0700, Linus Torvalds wrote:
> On Tue, 26 Sep 2000, Andrea Arcangeli wrote:
> > 
> > I'm talking about the fact that if you have a file mmapped in 1.5G of RAM
> > test9 will waste time rolling between LRUs 384000 pages, while classzone
> > won't ever see 1 of those pages until you run low on fs cache.
> 
> What drugs are you on? Nobody looks at the LRU's until the system is low
> on memory. Sure, there's some background activity, but what are you

The system is low on memory when you run `free` and you see a value
< freepages_high*PAGE_SIZE in the "free" column first row.

> talking about? It's only when you're low on memory that _either_ approach
> starts looking at the LRU list.

The machine will run low on memory as soon as I read 200mbyte from disk.

Andrea



Re: [patch] vmfixes-2.4.0-test9-B2 - fixing deadlocks

2000-09-25 Thread Andrea Arcangeli

On Tue, Sep 26, 2000 at 12:30:28AM +0200, Juan J. Quintela wrote:
> Which is completely wrong if the program uses _any not completely_
> unusual locality of reference.  Think twice about that: it is more
> probable that you need more than 300MB of filesystem cache than that you
> have an application that references _randomly_ 1.5GB of data.  You need
> to balance that _always_ :((

The application doesn't reference 1.5GB of data randomly. Assume
there's a big executable, 2G large (and yes, I know there are such), and I
run it. After some hours its RSS is 1.5G. Ok?

So now this program also shmgets a 300 Mbyte shm segment.

Now this program starts reading and writing terabytes of data that
wouldn't fit in cache even if there were 300G of RAM (and
this is possible too). Or maybe the program itself uses rawio,
but then at a certain point you use the machine to run a tar somewhere.

Now tell me why this program needs more than 200Mbyte of fs cache
if the kernel doesn't waste time on the mapped pages (as in
classzone).

Andrea



Re: [patch] vmfixes-2.4.0-test9-B2 - fixing deadlocks

2000-09-25 Thread Rik van Riel

On Tue, 26 Sep 2000, Andrea Arcangeli wrote:
> On Mon, Sep 25, 2000 at 08:54:57PM +0100, Stephen C. Tweedie wrote:

> > basically the whole of memory is data cache, some of which is mapped
> > and some of which is not?
> 
> As I said in the last email, aging on the cache is supposed to do that.
> 
> Wasting CPU and increasing the complexity of the algorithm is a price
> that I won't pay just to get the information on when it's time
> to recall swap_out().

You must be joking. Page replacement should be tuned to
do good page replacement, not just to be easy on the CPU.
(though a heavily thrashing system /is/ easy on the cpu,
I'll have to admit that)

> If the cache has no age it means I'd better throw it out instead
> of swapping/unmapping out stuff, simple?

Simple, yes. But completely BOGUS if you don't age the cache
and the mapped pages at the same rate!

If I age your pages twice as much as my pages, is it still
only fair that your pages will be swapped out first? ;)

> > anything since last time.  Anything that only ages per-pte, not
> > per-page, is simply going to die horribly under such load, and any
> 
> The aging on the fs cache is done per-page.

And the same should be done for other pages as well.
If you don't do that, you'll have big problems keeping
page replacement balanced and making the system work well
under various loads.
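Rik's point about aging the cache and the mapped pages at the same rate can be illustrated with a toy user-space model. It is not the 2.4 VM code; the struct, the constants and the function names are hypothetical. Each page's age rises when it was referenced since the last scan and decays by one otherwise, and a page at age 0 becomes an eviction candidate. Scanning one class of pages twice per pass makes that class reach age 0 first, even when both classes are equally idle.

```c
#include <stdbool.h>

/* Illustrative constants only. */
#define AGE_START 2
#define AGE_ADV   3
#define AGE_MAX   64

struct toy_page {
    int  age;         /* higher = more recently/frequently used */
    bool referenced;  /* set on access, cleared by the scanner */
};

/* One aging step: reward a referenced page, decay an idle one. */
static void age_page(struct toy_page *p)
{
    if (p->referenced) {
        p->age += AGE_ADV;
        if (p->age > AGE_MAX)
            p->age = AGE_MAX;
        p->referenced = false;
    } else if (p->age > 0) {
        p->age--;
    }
}

/* A page with age 0 is a candidate for eviction or swap-out. */
static bool evictable(const struct toy_page *p)
{
    return p->age == 0;
}

/* Age a class of pages `rate` times per pass (unequal rates are the
 * unfairness Rik describes). Returns the pass on which the first page
 * of the class becomes evictable, assuming no new references. */
static int passes_until_evictable(struct toy_page *pages, int n, int rate)
{
    for (int pass = 1; pass < 1000; pass++) {
        for (int r = 0; r < rate; r++)
            for (int i = 0; i < n; i++)
                age_page(&pages[i]);
        for (int i = 0; i < n; i++)
            if (evictable(&pages[i]))
                return pass;
    }
    return -1;
}
```

With four idle pages per class starting at AGE_START, the class aged twice per pass becomes evictable after one pass and the other only after two, although neither was used more than the other.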

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
   -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/   http://www.surriel.com/




Re: [patch] vmfixes-2.4.0-test9-B2 - fixing deadlocks

2000-09-25 Thread Andrea Arcangeli

On Mon, Sep 25, 2000 at 07:26:56PM -0300, Rik van Riel wrote:
> IMHO this is a minor issue because:

I don't think it's a minor issue.

If you don't have a reschedule point in your equivalent of shrink_mmap, and these
1.5G happen to be consecutive in LRU order (quite probable if they were
paged in at a fast rate), then you may even hang in interruptible mode for
seconds as soon as somebody starts reading from disk. 2.4.x has to scale to
dozens of gigabytes of RAM, as there are archs supporting that amount of RAM.
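The reschedule point Andrea refers to can be illustrated outside the kernel. This is a user-space analogue only, assuming POSIX sched_yield(); in-kernel loops typically check whether a reschedule is pending rather than yielding unconditionally, and the function name and batch size here are hypothetical. A long scan over per-page referenced bits gives up the CPU every `batch` entries, so other runnable tasks are not starved while it walks hundreds of thousands of pages.

```c
#include <sched.h>
#include <stddef.h>

/* Count unreferenced entries (value 0) in an array of per-page
 * referenced bits, standing in for one long LRU scan. Every `batch`
 * entries the loop hits a "reschedule point" and yields the CPU.
 * The number of yields is stored in *yields. */
static size_t count_unreferenced(const unsigned char *referenced, size_t n,
                                 size_t batch, size_t *yields)
{
    size_t unreferenced = 0;

    *yields = 0;
    for (size_t i = 0; i < n; i++) {
        if (!referenced[i])
            unreferenced++;
        if (batch != 0 && (i + 1) % batch == 0) {
            sched_yield();          /* reschedule point */
            (*yields)++;
        }
    }
    return unreferenced;
}
```

The batch size trades scan throughput against latency for everyone else; a batch of 0 disables the reschedule points entirely, which is the behaviour Andrea warns about.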

> 2) you don't /want/ to run low on fs cache, you want

So I can't read more than the size that the fs cache can take? I must be
allowed to do that (that's 200 Mbyte of RAM, which can be more than enough
if the server mainly generates pollution anyway).

Andrea



Re: [RFC] Wine speedup through kernel module

2000-09-25 Thread Alexander Viro



On Mon, 25 Sep 2000, Albert D. Cahalan wrote:

> If you'd like to live without all /proc-using tools, much of /sbin,
> the X server, inetd, anything that uses sendfile(), and anything
> that uses RT-signals for IO events... go right ahead. You can give
> up on VFS enhancements too, since anything using them wouldn't be
> portable to AIX, Ultrix, or OpenServer.
> 
> Partial list of non-portable system calls for you to abstain from:
> 
> mkdir   have fun, because this isn't portable

Non-portable to _what_? v2?

> sigaction   you know what to do...

I can live with that. Supported by quite a few Unices.

> setreuid

Encapsulate unless you want a major PITA maintaining the code.

> symlink you can use the v7 filesystem too

Same as with sigaction, even wider support.

> iopl    obvious

_Yes_. It's non-portable within Linux, ferchrissake. And yes,
hardware-related work should be localized.

> clone

Can be implemented via rfork(), so that gives *BSD and Plan 9.

> setfsuid    didn't Ted come up with this one?

And you advocate using it? Wow.

> poll    can't use select either

Both are remarkably ugly. Yes, it's one of the cases when you may need
your own functions encapsulating that mess.

> prctl   comes from IRIX, not v7

And is ugly as hell, as damn near everything coming from SGI is.

> pread   SysV, isn't it? Must be bad!

I have yet to see where it is necessary. For all practical purposes it can be
emulated by lseek()+read()+lseek(), and you'd better provide such
emulation anyway if you want the thing to be portable.
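The lseek()+read()+lseek() emulation of pread() can be sketched as follows (the function name is hypothetical). Unlike a real pread(), it is not atomic: another thread sharing the descriptor can move the file offset between the calls, so it is only a drop-in replacement for single-threaded use of the descriptor.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Emulate pread(): read up to `count` bytes at `offset` and leave the
 * descriptor's file offset where it was.
 * NOT atomic with respect to other users of the same descriptor. */
static ssize_t emul_pread(int fd, void *buf, size_t count, off_t offset)
{
    off_t saved = lseek(fd, 0, SEEK_CUR);     /* remember current offset */
    if (saved == (off_t)-1)
        return -1;
    if (lseek(fd, offset, SEEK_SET) == (off_t)-1)
        return -1;

    ssize_t n = read(fd, buf, count);
    int saved_errno = errno;                  /* lseek below may clobber it */

    if (lseek(fd, saved, SEEK_SET) == (off_t)-1)
        return -1;
    errno = saved_errno;
    return n;
}
```

On a descriptor that cannot seek (a pipe, for example) the first lseek() fails with ESPIPE, just as pread() itself refuses such descriptors.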

> sendfile    just some benchmark hack

Bingo. Can be implemented via primitives, though.
 
> SCO has some UNIX source that might be more to your
> liking than Linux is.

Unlikely. GNU userland, as ugly as it is, is less obnoxious than
Missed'em'V one.




Re: [patch] vmfixes-2.4.0-test9-B2 - fixing deadlocks

2000-09-25 Thread Andrea Arcangeli

On Mon, Sep 25, 2000 at 08:54:57PM +0100, Stephen C. Tweedie wrote:
> OK, and here's another simple real life example.  A 2GB RAM machine
> running something like Oracle with a hundred client processes all
> shm-mapping the same shared memory segment.

Oracle takes the SHM locked, and it will never run on a machine without
enough memory.

> Oh, and you're also doing lots of file IO.  How on earth do you decide
> what to swap and what to page out in this sort of scenario, where
> basically the whole of memory is data cache, some of which is mapped
> and some of which is not?

As I said in the last email, aging on the cache is supposed to do that.

Wasting CPU and increasing the complexity of the algorithm is a price
that I won't pay just to get the information on when it's time
to recall swap_out().

If the cache has no age it means I'd better throw it out instead
of swapping/unmapping out stuff, simple?

> anything since last time.  Anything that only ages per-pte, not
> per-page, is simply going to die horribly under such load, and any

The aging on the fs cache is done per-page.

The per-pte issue arises once we have already taken the difficult decision (that it
is time to swap out), and you have the same problem because you don't know the
chain of ptes that point to the physical page (so you refresh the referenced
bit more often). Once we have the chain of ptes pointing to the page,
classzone will only need a real LRU for the mapped pages, to use instead of
walking pagetables.
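A chain from each physical page to the ptes that map it lets the VM age a page directly instead of discovering its users by walking page tables. A toy user-space model of that idea follows; all types and names are hypothetical, not kernel structures. Checking a page means test-and-clearing the referenced bit in every pte on its chain.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_MAPPERS 8   /* arbitrary bound for the toy model */

struct toy_pte {
    bool referenced;    /* "accessed" bit the MMU would set */
};

struct toy_page {
    struct toy_pte *chain[MAX_MAPPERS];  /* every pte mapping this page */
    size_t nr_mappers;
};

/* Record that `pte` maps `page` (would happen at fault/mmap time). */
static bool toy_page_add_rmap(struct toy_page *page, struct toy_pte *pte)
{
    if (page->nr_mappers == MAX_MAPPERS)
        return false;
    page->chain[page->nr_mappers++] = pte;
    return true;
}

/* Test-and-clear the referenced bit of every mapping of `page`.
 * Returns how many mappings referenced the page since the last check. */
static size_t toy_page_referenced(struct toy_page *page)
{
    size_t hits = 0;

    for (size_t i = 0; i < page->nr_mappers; i++) {
        if (page->chain[i]->referenced) {
            page->chain[i]->referenced = false;
            hits++;
        }
    }
    return hits;
}
```

The cost of checking one page is proportional to its number of mappers, not to the size of the page tables of every process that might map it.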

Andrea



Re: [patch] vmfixes-2.4.0-test9-B2 - fixing deadlocks

2000-09-25 Thread Linus Torvalds



On Tue, 26 Sep 2000, Andrea Arcangeli wrote:
> 
> I'm talking about the fact that if you have a file mmapped in 1.5G of RAM
> test9 will waste time rolling between LRUs 384000 pages, while classzone
> won't ever see 1 of those pages until you run low on fs cache.

What drugs are you on? Nobody looks at the LRU's until the system is low
on memory. Sure, there's some background activity, but what are you
talking about? It's only when you're low on memory that _either_ approach
starts looking at the LRU list.

Linus




Re: [patch] vmfixes-2.4.0-test9-B2 - fixing deadlocks

2000-09-25 Thread Juan J. Quintela

> "andrea" == Andrea Arcangeli <[EMAIL PROTECTED]> writes:

Hi

andrea> I'm talking about the fact that if you have a file mmapped in 1.5G of RAM
andrea> test9 will waste time rolling between LRUs 384000 pages, while classzone
andrea> won't ever see 1 of those pages until you run low on fs cache.

Which is completely wrong if the program uses _any not completely_
unusual locality of reference.  Think twice about that: it is more
probable that you need more than 300MB of filesystem cache than that you
have an application that references _randomly_ 1.5GB of data.  You need
to balance that _always_ :((

I think that there is no silver bullet here :(

Later, Juan.

-- 
In theory, practice and theory are the same, but in practice they 
are different -- Larry McVoy



Re: [patch] vmfixes-2.4.0-test9-B2 - fixing deadlocks

2000-09-25 Thread Rik van Riel

On Tue, 26 Sep 2000, Andrea Arcangeli wrote:
> On Mon, Sep 25, 2000 at 04:26:17PM -0300, Rik van Riel wrote:
> > > > It doesn't --- that is part of the design.  The vm scanner propagates
> > > 
> > > And that's the inferior part of the design IMHO.
> > 
> > Indeed, but physical page based aging is a definite
> > 2.5 thing ... ;(
> 
> I'm talking about the fact that if you have a file mmapped in
> 1.5G of RAM test9 will waste time rolling between LRUs 384000
> pages, while classzone won't ever see 1 of those pages until you
> run low on fs cache.

IMHO this is a minor issue because:
1) you need to do page replacement with shared pages
   right
2) you don't /want/ to run low on fs cache, you want
   to have a good balance between the cache(s) and
   the processes

OTOH, if you have a way to keep fair page aging and
fix the CPU time issue at the same time, I'd love
to see it.

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
   -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/   http://www.surriel.com/




Re: [RFC] Wine speedup through kernel module

2000-09-25 Thread Albert D. Cahalan

Alexander Viro writes:
> On Mon, 25 Sep 2000, Albert D. Cahalan wrote:

>> The list would be NULL most of the time. If Linux apps start
>> using this feature a lot, then it can be optimized.
>
> Then these apps are non-portable to other Unices and either get fixed or
> get rm'd. Period.

If you'd like to live without all /proc-using tools, much of /sbin,
the X server, inetd, anything that uses sendfile(), and anything
that uses RT-signals for IO events... go right ahead. You can give
up on VFS enhancements too, since anything using them wouldn't be
portable to AIX, Ultrix, or OpenServer.

Partial list of non-portable system calls for you to abstain from:

mkdir   have fun, because this isn't portable
sigaction   you know what to do...
setreuid
symlink you can use the v7 filesystem too
iopl    obvious
clone
setfsuid    didn't Ted come up with this one?
poll    can't use select either
prctl   comes from IRIX, not v7
pread   SysV, isn't it? Must be bad!
sendfile    just some benchmark hack

SCO has some UNIX source that might be more to your
liking than Linux is.



Re: [patch] vmfixes-2.4.0-test9-B2 - fixing deadlocks

2000-09-25 Thread Andrea Arcangeli

On Mon, Sep 25, 2000 at 04:26:17PM -0300, Rik van Riel wrote:
> > > It doesn't --- that is part of the design.  The vm scanner propagates
> > 
> > And that's the inferior part of the design IMHO.
> 
> Indeed, but physical page based aging is a definite
> 2.5 thing ... ;(

I'm talking about the fact that if you have a file mmapped in 1.5G of RAM
test9 will waste time rolling between LRUs 384000 pages, while classzone
won't ever see 1 of those pages until you run low on fs cache.

Andrea



DMA related IDE error

2000-09-25 Thread Moritz Schulte

Hi,
 I'm running Linux 2.2.17+ide.2.2.17.all.2904.patch and
just saw these messages in my log files:

hda: timeout waiting for DMA
ide_dmaproc: chipset supported ide_dma_timeout func only: 14
hda: irq timeout: status=0xd0 { Busy }
hda: DMA disabled
ide0: reset: success

What do they mean? Do I have to worry about them?

The system was not under heavy load when it happened; it just had some
(5-10?) seconds without activity. Then it seemed to continue
working without problems.

My computer's configuration:
AMD K6 III CPU, 500 MHz; 128 MB core; it has an ALI chipset
("Ali M15x3 Chipset" according to /proc/ide/ali); I have only one
hard disk: IBM-DTTA-350840 (/proc/ide/hda/model).

I need the IDE patch for UDMA support for this chipset.
Is there any more useful information I can give?

moritz
-- 
/* Moritz Schulte <[EMAIL PROTECTED]>
 * http://hp9001.fh-bielefeld.de/~moritz/
 * PGP-Key available, encrypted Mail is welcome.
 */



Re: [patch] vmfixes-2.4.0-test9-B2

2000-09-25 Thread Andrea Arcangeli

On Mon, Sep 25, 2000 at 11:28:55PM +0200, Jens Axboe wrote:
> q->plug_device_fn(q, ...);
> list_add(...)
> generic_unplug_device(q);
> 
> would suffice in scsi_lib for now.

It looks sane to me.

Andrea



Re: [patch] vmfixes-2.4.0-test9-B2

2000-09-25 Thread Andrea Arcangeli

On Mon, Sep 25, 2000 at 10:52:08PM +0200, Peter Osterlund wrote:
> Do you know why? Is it because the average seek distance becomes

Good question. No, I don't know why right now. I'll try again just to be 200%
sure and I'll let you know the results.

> smaller with your algorithm? (I later realized that request merging is
> done before the elevator function kicks in, so your algorithm should

Not sure what you mean. There are two cases: the bh is merged, or
the bh will be queued in a new request because merging isn't possible.

Your change deals only with the latter case and that should be
pretty orthogonal to what the merging stuff does.
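The two cases Andrea distinguishes can be sketched with a simplified request that covers a contiguous range of sectors. The struct and function names are hypothetical, not the 2.4 block layer's. A new buffer is either merged onto the back or front of an existing request when the sectors are adjacent, or it needs a fresh request; only in that last case does the elevator's ordering decide where it goes.

```c
struct toy_request {
    unsigned long sector;      /* first sector covered */
    unsigned long nr_sectors;  /* length in sectors */
};

enum merge_result { MERGE_BACK, MERGE_FRONT, NO_MERGE };

/* Try to merge a buffer of `count` sectors starting at `sector`
 * into `rq`; on success `rq` is extended in place. */
static enum merge_result try_merge(struct toy_request *rq,
                                   unsigned long sector, unsigned long count)
{
    if (rq->sector + rq->nr_sectors == sector) {
        rq->nr_sectors += count;            /* buffer follows the request */
        return MERGE_BACK;
    }
    if (sector + count == rq->sector) {
        rq->sector = sector;                /* buffer precedes the request */
        rq->nr_sectors += count;
        return MERGE_FRONT;
    }
    return NO_MERGE;  /* caller queues a new request; the elevator picks its slot */
}
```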

Andrea



Re: 1023rd thread crashes 2.4.0-test8 from non-root user (fwd)

2000-09-25 Thread Linus Torvalds


Duh. This was a really stupid bug.

In kernel/signal.c, collect_signal(), for the case where we don't find a
siginfo block, we need to clear the signal set.

In short, add the line

sigdelset(>signal, sig);

just before the first "return 1" in collect_signal(), and all should be
well (famous last words - it's untested, but I'm sure that's it).

If I'm right, the kernel didn't properly crash, but it would send the
signal on and on again forever, which would basically kill the machine if
something like init or X or a number of other important cases got stuck
doing nothing.

Linus




Re: DPT SmartRAID V and Linux 2.4-

2000-09-25 Thread Ricky Beam

On Mon, 25 Sep 2000, Nick Loman wrote:
>So as far as I can tell, the i2o stack in Linux 2.4 doesn't support the
>DPT SmartRAID V i2o controller.

"We know."  It never has. (And arguably never will.)

>Am I right in thinking then the only option is to combine DPT's drivers
>into the kernel by hand? Is this feasible/easy to do, or better, has
>someone already done it?

If "by hand" means with 'patch', then yes.  Some people have no problem and
others commit suicide as a result of the process.  (YMWV.)  Getting the
DPT supplied driver to work in 2.3 will require some work -- spinlock
semantics and the scsi host structure changed.

I'm running a SRV in my system with 2.4.0-test5 right now, so it is doable.
(http://chickenboo.bluetopia.net/~jfbeam/DPT/)

--Ricky





Re: (Fwd) CD-ROM (SCSI and IDE) not mounting disk

2000-09-25 Thread Jens Axboe

On Mon, Sep 25 2000, [EMAIL PROTECTED] wrote:
>   I am currently seeing the same behaviour. My machine has been up for
>   42 days now. Kernel 2.2.16-3 (RH 6.2). I am quite sure I could
>   play CDs a few weeks ago. But now, when I launch cdplay
>   or xplaycd, no CD is detected:
> 
> /home/danis/DISCOGRAPHIE/JethroTull/Stormwatch/mp3 > cdplay
> /dev/cdrom: Mauvais type de medium (wrong medium type)
> 
> /home/danis/DISCOGRAPHIE/JethroTull/Stormwatch/mp3 > dmesg
> ...
> VFS: Disk change detected on device ide1(22,0)
> cdrom: pid 15218 must open device O_NONBLOCK!
> cdrom: open failed.

echo "0" > /proc/sys/dev/cdrom/check_media

This should be the default though, unless you've changed it along
the way.

-- 
* Jens Axboe <[EMAIL PROTECTED]>
* SuSE Labs



Re: [patch] vmfixes-2.4.0-test9-B2

2000-09-25 Thread Jens Axboe

On Mon, Sep 25 2000, Andrea Arcangeli wrote:
> > The scsi layer currently "manually" does a list_add on the queue itself,
> > which doesn't look too healthy.
> 
> It's grabbing the io_request_lock so it looks healthy for now :)

It's safe alright, but if we want to do the generic_unplug_queue
instead of just hitting the request_fn (which might do anything
anyway), it would be nicer to expose this part of the block layer
(i.e. have a general way of queueing a request to the request_queue).
But I guess just

q->plug_device_fn(q, ...);
list_add(...)
generic_unplug_device(q);

would suffice in scsi_lib for now.
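The idea behind plugging, which the sequence above relies on, is that requests accumulate on a plugged queue and the driver's request function only sees them once the queue is unplugged. A toy model of that idea follows; all names are hypothetical, and the real calls take the arguments elided in the mail.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_QUEUE_MAX 32

struct toy_queue {
    bool   plugged;
    size_t pending[TOY_QUEUE_MAX];  /* queued request ids */
    size_t nr_pending;
    size_t dispatched;              /* how many the "driver" has taken */
};

/* Stand-in for the driver's request_fn: drain everything queued. */
static void toy_request_fn(struct toy_queue *q)
{
    q->dispatched += q->nr_pending;
    q->nr_pending = 0;
}

static void toy_plug(struct toy_queue *q) { q->plugged = true; }

/* Queue a request; an unplugged queue is serviced immediately. */
static bool toy_add_request(struct toy_queue *q, size_t id)
{
    if (q->nr_pending == TOY_QUEUE_MAX)
        return false;
    q->pending[q->nr_pending++] = id;
    if (!q->plugged)
        toy_request_fn(q);
    return true;
}

/* Unplug: clear the flag and let the driver see the whole batch. */
static void toy_unplug(struct toy_queue *q)
{
    q->plugged = false;
    toy_request_fn(q);
}
```

Going through plug/add/unplug rather than a bare list_add() means the driver is always kicked through one well-defined path, which is the "general way of queueing a request" Jens would like the block layer to expose.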

-- 
* Jens Axboe <[EMAIL PROTECTED]>
* SuSE Labs



Packet loss with Znyx 4port 10/100

2000-09-25 Thread Johathan Earle

Hi,

I'm running kernel 2.4.0-test9-pre4 on a Dell GX1 (PIII-500) with a Znyx 4-port 10/100 card (4 tulip 21143 ethernet controllers onboard).  With the ports locked at 10 Mbps full duplex, and traffic (64-byte UDP packets) from our generator running through the box, I am seeing a little under 2% packet loss in short-term stats gathering (< 10 minutes).  Decreasing the rate to about 5 Mbps seems to reduce the loss to 0.  Running traffic for extended periods (1 hr or more; I left it running over the weekend) shows a total loss of about 3-3.5%.

I've tried the stock tulip driver, as well as the modified tulip driver that takes advantage of hardware flow control.  No change between the two.

I originally thought the problem was due to the card assigning 1 or 2 IRQs to all of the ports, thus raising the possibility of IRQ sharing problems.  I then moved the card to the 1st slot, and it was then able to obtain a unique IRQ for each port.  After I disabled the system's built-in ethernet (which also claimed an IRQ used by the Znyx card) and restarted the traffic flow, I noticed that the loss decreased by about 0.2%.

I've also tried replacing the card with no change.

Any ideas?

Cheers!
Jon



Re: [patch] vmfixes-2.4.0-test9-B2

2000-09-25 Thread Alexander Viro




On Sun, 24 Sep 2000, Linus Torvalds wrote:

[directories in pagecache on ext2]

> > I'll do it and post the result tomorrow. I bet that there will be issues
> > I've overlooked (stuff that happens to work on UFS, but needs to be more
> > general for ext2), so it's going as "very alpha", but hey, it's pretty
> > straightforward, so there is a chance to debug it fast. Yes, famous last
> > words and all such...
> 
> Sure.

All right, I think I've got something that may work. Yes, there were issues:
UFS has a constant directory chunk size (1 sector), while ext2 makes it
equal to the fs block size. _Bad_ idea, since sector writes are atomic and
block ones... Oh well, so ext2 is slightly less robust. It required some
changes; I'll do the initial testing and post the patch once it passes
the trivial tests.

BTW, why on Earth did we do it that way? It has no noticeable effect
on directory fragmentation, it makes the code (both in the page- and buffer-cache
variants) more complex, and it's less robust (by definition, the directory layout
can be broken more easily)... What was the point?

Not that we could do anything about that now (although as a ro-compat feature
it would be nice), but I'm curious about the reasons...
Cheers,
Al




Re: the new VMt

2000-09-25 Thread yodaiken

On Mon, Sep 25, 2000 at 04:47:21PM -0400, Benjamin C.R. LaHaise wrote:
> On Mon, 25 Sep 2000 [EMAIL PROTECTED] wrote:
> 
> > On Mon, Sep 25, 2000 at 09:23:48PM +0100, Alan Cox wrote:
> > > > my prediction is that if you show me an example of 
> > > > DoS vulnerability,  I can show you fix that does not require bean counting.
> > > > Am I wrong?
> > > 
> > > I think so. Page tables are a good example
> > 
> > I'm not too sure of what you have in mind, but if it is
> >  "process creates vast virtual space to generate many page table
> >   entries -- using mmap"
> > the answer is, virtual address space quotas and mmap should kill 
> > the process on low mem for page tables.
> 
> No.  Page tables are not freed after munmap (and for good reason).  The
> counting of page table "beans" is critical.

I've seen the assertion before; the reasons would be interesting.


-- 
-
Victor Yodaiken 
Finite State Machine Labs: The RTLinux Company.
 www.fsmlabs.com  www.rtlinux.com




RE: PATCH 2.2.18.9: Backport /proc/pci from 2.4.x to 2.2.x

2000-09-25 Thread Dunlap, Randy

> From: Jeff Garzik [mailto:[EMAIL PROTECTED]]
> 
> On Mon, 25 Sep 2000, Dan Hollis wrote:
> > On Mon, 25 Sep 2000, Jeff Garzik wrote:
> > > I see your suggestion in the same way...  If we keep the PCI device name
> > > data around after boot, then we have a lot of kernel memory locked up
> > > on the off chance that a HotPlug PCI device might appear for which we
> > > need a name.
> > > I would much prefer a userspace solution for naming unnamed PCI devices
> > > after boot...
> 
> > How about the kernel calling lspci?
> 
> Kernel calling a proggie is no problem... CONFIG_KMOD does it, and Linus
> has suggested that hotplugging a device needs to fire off a script, like
> 
>   /sbin/hotplug-net eth0 # new eth0 just inserted
> 
> If hotplugging executes an action, then updating the PCI device name
> should become part of that.  That implies that the kernel won't do any
> of the executing...  /sbin/hotplug-net will initiate the device name
> update, so the kernel needs a way to update the device name at the
> request of userspace.  It could be something as simple as 
>   echo name > /proc/bus/pci/00/0a.0/name
> or something more complex...

The 2.4.0-testN kernel already calls /sbin/hotplug (for USB),
or whatever path is stored in /proc/sys/kernel/hotplug.

It takes several argv and envp parameters so that different
buses and interfaces can be supported.

~Randy




Re: the new VMt

2000-09-25 Thread yodaiken

On Mon, Sep 25, 2000 at 09:46:35PM +0100, Alan Cox wrote:
> > I'm not too sure of what you have in mind, but if it is
> >  "process creates vast virtual space to generate many page table
> >   entries -- using mmap"
> > the answer is, virtual address space quotas and mmap should kill 
> > the process on low mem for page tables.
> 
> Those quotas being exactly what beancounter is

But that is a function-specific counter, not a counter in the
alloc code.


-- 
-
Victor Yodaiken 
Finite State Machine Labs: The RTLinux Company.
 www.fsmlabs.com  www.rtlinux.com




Re: [patch] vmfixes-2.4.0-test9-B2

2000-09-25 Thread Peter Osterlund

Andrea Arcangeli <[EMAIL PROTECTED]> writes:

> The new elevator ordering algorithm returns me much better numbers
> than the CSCAN one with tiobench.

Do you know why? Is it because the average seek distance becomes
smaller with your algorithm? (I later realized that request merging is
done before the elevator function kicks in, so your algorithm should
not produce more seeks than a CSCAN algorithm. Unfortunately I didn't
realize this when I wrote my CSCAN patch.)
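One way to frame the seek-distance question is to compare total head travel for a batch of pending requests. Below is a minimal sketch of a C-SCAN-style ordering (strictly C-LOOK: service upward from the current head position, then jump back to the lowest pending sector and continue upward), counting the wrap-around jump as travel too, next to plain FIFO order. The request positions are made-up example values, and sector distance is only a proxy: whether less travel means less time depends on the disk's seek curve.

```c
#include <stdlib.h>

static int cmp_ulong(const void *a, const void *b)
{
    unsigned long x = *(const unsigned long *)a;
    unsigned long y = *(const unsigned long *)b;
    return (x > y) - (x < y);
}

static unsigned long dist(unsigned long a, unsigned long b)
{
    return a > b ? a - b : b - a;
}

/* Head travel (in sectors) when serving `req` in arrival order. */
static unsigned long fifo_travel(unsigned long head, const unsigned long *req, size_t n)
{
    unsigned long travel = 0, pos = head;

    for (size_t i = 0; i < n; i++) {
        travel += dist(pos, req[i]);
        pos = req[i];
    }
    return travel;
}

/* Sort `req` in place and return head travel for C-LOOK order: all
 * requests >= head ascending, then wrap to the lowest and go up. */
static unsigned long clook_travel(unsigned long head, unsigned long *req, size_t n)
{
    if (n == 0)
        return 0;
    qsort(req, n, sizeof *req, cmp_ulong);

    size_t split = 0;                     /* first request at or above head */
    while (split < n && req[split] < head)
        split++;

    unsigned long travel = 0, pos = head;
    for (size_t i = 0; i < n; i++) {
        unsigned long next = req[(split + i) % n];  /* upward sweep, then wrap */
        travel += dist(pos, next);
        pos = next;
    }
    return travel;
}
```

For a head at sector 50 and requests arriving as 90, 10, 60, 20, FIFO order travels 210 sectors, while C-LOOK serves 60, 90, 10, 20 for 130 sectors, including the 80-sector wrap from 90 back to 10.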

Btw, does anyone know how the seek time depends on the seek distance
on modern disk hardware?

-- 
Peter Österlund  Email: [EMAIL PROTECTED]
Sköndalsvägen 35        [EMAIL PROTECTED]
S-128 66 Sköndal Home page: http://home1.swipnet.se/~w-15919
Sweden   Phone: +46 8 942647




Re: mount -t bind gone?

2000-09-25 Thread H. Peter Anvin

Followup to:  <[EMAIL PROTECTED]>
By author:    Andries Brouwer <[EMAIL PROTECTED]>
In newsgroup: linux.dev.kernel
>
> On Mon, Sep 25, 2000 at 11:29:48AM -0700, H. Peter Anvin wrote:
> 
> > I guess mount -t bind is officially gone.  What is the new official
> > replacement?  New system call?
> 
> mount --bind
> 
> (use mount from util-linux 2.10o)
> 

Hmm... what do you think is the best way to do this in autofs?  Should
I call mount(8), or just mount(2)?  In particular, does:

a) umount(8) do the right thing with these, and
b) mount(8) record this in /etc/mtab?
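For the mount(2) route, a bind mount is requested by passing the MS_BIND flag; per the mount(2) man page, the filesystem type and data arguments are then ignored. A minimal sketch with a hypothetical wrapper name. It needs root privileges, and calling the system call directly does not touch /etc/mtab, which is maintained by the mount(8) tool rather than by the kernel.

```c
#include <stddef.h>
#include <sys/mount.h>

/* Bind-mount `src` onto `dst`, like `mount --bind src dst`.
 * Returns 0 on success, or -1 with errno set (e.g. EPERM when not
 * privileged, ENOENT when a path does not exist). */
static int bind_mount(const char *src, const char *dst)
{
    /* With MS_BIND the filesystemtype and data arguments are ignored. */
    return mount(src, dst, NULL, MS_BIND, NULL);
}
```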

Thanks,

-hpa
-- 
<[EMAIL PROTECTED]> at work, <[EMAIL PROTECTED]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt



Re: the new VMt

2000-09-25 Thread Alan Cox

> I'm not too sure of what you have in mind, but if it is
>  "process creates vast virtual space to generate many page table
>   entries -- using mmap"
> the answer is, virtual address space quotas and mmap should kill 
> the process on low mem for page tables.

Those quotas being exactly what beancounter is




Re: mount -t bind gone?

2000-09-25 Thread Andries Brouwer

On Mon, Sep 25, 2000 at 11:29:48AM -0700, H. Peter Anvin wrote:

> I guess mount -t bind is officially gone.  What is the new official
> replacement?  New system call?

mount --bind

(use mount from util-linux 2.10o)

Andries



Re: the new VMt

2000-09-25 Thread Benjamin C.R. LaHaise

On Mon, 25 Sep 2000 [EMAIL PROTECTED] wrote:

> On Mon, Sep 25, 2000 at 09:23:48PM +0100, Alan Cox wrote:
> > > my prediction is that if you show me an example of 
> > > DoS vulnerability,  I can show you fix that does not require bean counting.
> > > Am I wrong?
> > 
> > I think so. Page tables are a good example
> 
> I'm not too sure of what you have in mind, but if it is
>  "process creates vast virtual space to generate many page table
>   entries -- using mmap"
> the answer is, virtual address space quotas and mmap should kill 
> the process on low mem for page tables.

No.  Page tables are not freed after munmap (and for good reason).  The
counting of page table "beans" is critical.

-ben




Re: the new VMt

2000-09-25 Thread yodaiken

On Mon, Sep 25, 2000 at 09:23:48PM +0100, Alan Cox wrote:
> > my prediction is that if you show me an example of 
> > DoS vulnerability,  I can show you fix that does not require bean counting.
> > Am I wrong?
> 
> I think so. Page tables are a good example

I'm not too sure of what you have in mind, but if it is
 "process creates vast virtual space to generate many page table
  entries -- using mmap"
the answer is, virtual address space quotas and mmap should kill 
the process on low mem for page tables.


-- 
-
Victor Yodaiken 
Finite State Machine Labs: The RTLinux Company.
 www.fsmlabs.com  www.rtlinux.com




Re: 2.4.0t8 strangeness

2000-09-25 Thread Oliver Xymoron

On Mon, 25 Sep 2000, Oliver Xymoron wrote:

> On my /home partition, mkdir(2) is returning EIO on ext2fs for uid!=0.
> Creating files with touch still works though. Persists after reboot,
> forced e2fsck finds nothing wrong.
> 
> About to try test9-pre6 but thought I'd mention it.

Figured it out: I ran into the reserved block limit, which I believe used to
return ENOSPC. Any chance this could be changed back?
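The reserved-block limit can be inspected from user space with statvfs(): f_bfree counts all free blocks, f_bavail only those available to unprivileged users, and both are in units of f_frsize. Once f_bavail reaches zero, ordinary users cannot allocate even though f_bfree is still non-zero. A minimal sketch (the helper name is hypothetical):

```c
#include <sys/statvfs.h>

/* Return the number of bytes on the filesystem containing `path` that
 * are free but reserved for root (f_bfree - f_bavail), or -1 on error. */
static long long reserved_free_bytes(const char *path)
{
    struct statvfs sv;

    if (statvfs(path, &sv) != 0)
        return -1;
    if (sv.f_bavail > sv.f_bfree)     /* defensive: should not happen */
        return 0;
    return (long long)(sv.f_bfree - sv.f_bavail) * (long long)sv.f_frsize;
}
```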

--
 "Love the dolphins," she advised him. "Write by W.A.S.T.E.." 




Re: the new VMt

2000-09-25 Thread Stephen C. Tweedie

Hi,

On Mon, Sep 25, 2000 at 02:04:19PM -0600, [EMAIL PROTECTED] wrote:

> > Right, but if the alternative is spurious ENOMEM when we can satisfy
> 
> An ENOMEM is not spurious if there is not enough memory. UNIX does not ask the
> OS to do impossible tricks.

Yes, but the ENOMEM _is_ spurious if you actually meant EAGAIN, and if
the OS was perfectly capable of doing the retry itself.

> > all of the pending requests just as long as they are serialised, is
> > this a problem?
> 
> I think you are solving the wrong problem. On a small memory machine, the kernel,
> utilities, and applications should be configured to use little memory.  
> BusyBox is better than BeanCount. 

Any box is a small memory machine if you get the wrong workload on it,
and the DoS attacks which are possible without beancounting let any
user bring even a large system to its knees right now.  If solving
that problem also means that small memory machines do the right thing
on their own rather than requiring specific manual configuration, then
it sounds like a good aim.

> > However, you just can't escape from the fact that on low memory
> > machines, we *need* beancounter-style accounting of pinned pages or
> > we'll be in Deep Trouble (TM).  We already have nasty DoS situations
> 
> What we need is simple kernel code that does not hold resources
> into a  possible deadlock situation. 



> On general principles, I don't see any substitute for clean code in the kernel and
> my prediction is that if you show me an example of 
> DoS vulnerability,  I can show you fix that does not require bean counting.
> Am I wrong?

If you have a user forking multiple processes and exhausting some
resource, then at some point you have to do something about it.  Let's
say it's page tables, just for argument's sake, because those are
currently non-swappable, but even if you make those swappable there
are plenty of other resources it might be (eg. data shoved down unix
domain sockets if you want another example).

So you have run out of physical memory --- what do you do about it?
The important observation here is that in a multi-user environment,
simply denying further allocations isn't good enough --- unless you
revoke those existing allocations you have DoS.  And you can't fairly
revoke existing allocations without knowing WHICH user has exhausted
the memory (which requires beancounter-style resource tracking), AND
having mechanisms in place to revoke all of the possible resources
which might be involved (eg unix domain socket datagrams).  kill -9
might solve that latter problem but it doesn't help in identifying who
to kill.
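
The accounting half of this argument fits in a few lines. Below is a minimal userspace C sketch built on stated assumptions. The table, bc_charge(), bc_uncharge() and bc_pick_victim() are hypothetical names, not an existing kernel API. Revoking the resources, the second requirement above, is not shown:

```c
#include <stddef.h>
#include <sys/types.h>

#define BC_MAX_USERS 16

/* Hypothetical per-user "beancounter": unswappable pages (page tables,
 * socket buffers, ...) currently pinned on behalf of each uid. */
struct bean {
    uid_t uid;
    long  pinned;
    int   in_use;
};

static struct bean bc_table[BC_MAX_USERS];

/* Find the entry for uid; if create is set, allocate one if missing. */
static struct bean *bc_lookup(uid_t uid, int create)
{
    struct bean *free_slot = NULL;

    for (size_t i = 0; i < BC_MAX_USERS; i++) {
        if (bc_table[i].in_use && bc_table[i].uid == uid)
            return &bc_table[i];
        if (!bc_table[i].in_use && free_slot == NULL)
            free_slot = &bc_table[i];
    }
    if (create && free_slot != NULL) {
        free_slot->uid = uid;
        free_slot->pinned = 0;
        free_slot->in_use = 1;
    }
    return create ? free_slot : NULL;
}

/* Charge npages to uid; returns 0 on success, -1 if the table is full. */
int bc_charge(uid_t uid, long npages)
{
    struct bean *b = bc_lookup(uid, 1);

    if (b == NULL)
        return -1;
    b->pinned += npages;
    return 0;
}

void bc_uncharge(uid_t uid, long npages)
{
    struct bean *b = bc_lookup(uid, 0);

    if (b != NULL)
        b->pinned -= npages;
}

/* When memory runs out, the counters say WHICH user is responsible:
 * return the uid pinning the most pages, or (uid_t)-1 if none. */
uid_t bc_pick_victim(void)
{
    uid_t victim = (uid_t)-1;
    long most = 0;

    for (size_t i = 0; i < BC_MAX_USERS; i++) {
        if (bc_table[i].in_use && bc_table[i].pinned > most) {
            most = bc_table[i].pinned;
            victim = bc_table[i].uid;
        }
    }
    return victim;
}
```

Without the per-uid counters, the choice at OOM time is a guess. With them, kill -9 (or a finer-grained revocation) can be aimed at the user who actually caused the shortage.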

--Stephen
> 
> 
> 
> 
> 
> -- 
> -
> Victor Yodaiken 
> Finite State Machine Labs: The RTLinux Company.
>  www.fsmlabs.com  www.rtlinux.com



Re: the new VMt

2000-09-25 Thread Alan Cox

> my prediction is that if you show me an example of 
> DoS vulnerability, I can show you a fix that does not require bean counting.
> Am I wrong?

I think so. Page tables are a good example.





Re: mount -t bind gone?

2000-09-25 Thread Jasper Spaans

On Mon, Sep 25, 2000 at 11:29:48AM -0700, H. Peter Anvin wrote:

> I guess mount -t bind is officially gone.  What is the new official
> replacement?  New system call?

A simple solution: update your version of mount, and try 

mount --bind /foo /bar

Regards,

Jasper

PS. If you look at the code in fs/super.c at line 1338, you'll see there's
a new mount flag, and the old code at line 1348 has been deactivated.
-- 
Jasper Spaans  <[EMAIL PROTECTED]>



Re: the new VMt

2000-09-25 Thread yodaiken

On Mon, Sep 25, 2000 at 08:25:49PM +0100, Stephen C. Tweedie wrote:
> Hi,
> 
> On Mon, Sep 25, 2000 at 12:34:56PM -0600, [EMAIL PROTECTED] wrote:
> 
> > > > Process 1,2 and 3 all start allocating 20 pages
> > > > now 57 pages are locked up in non-swappable kernel space and the system deadlocks OOM.
> > > 
> > > Or go the beancounter route: process 1 asks "can I pin 20 pages", gets
> > > told "yes", and goes allocating them, blocking as necessary until it
> > 
> > So you have a "pre-allocation allocator"?  Leads to interesting and hard to detect
> > bugs with old code that does not pre-allocate or with code that incorrectly pre-allocates
> > or that blocks on something unrelated
> 
> Right, but if the alternative is spurious ENOMEM when we can satisfy

An ENOMEM is not spurious if there is not enough memory. UNIX does not ask the
OS to do impossible tricks.

> all of the pending requests just as long as they are serialised, is
> this a problem?

I think you are solving the wrong problem. On a small memory machine, the kernel,
utilities, and applications should be configured to use little memory.  
BusyBox is better than BeanCount. 


> However, you just can't escape from the fact that on low memory
> machines, we *need* beancounter-style accounting of pinned pages or
> we'll be in Deep Trouble (TM).  We already have nasty DoS situations

What we need is simple kernel code that does not hold resources
into a  possible deadlock situation. 

> which are embarrassingly easy to reproduce.  If we need such
> beancounter protection, AND such protection can prevent the situation
> you describe, then do we need to go looking for another way of
> achieving the same protection?


On general principles, I don't see any substitute for clean code in the kernel and
my prediction is that if you show me an example of 
DoS vulnerability, I can show you a fix that does not require bean counting.
Am I wrong?





-- 
-
Victor Yodaiken 
Finite State Machine Labs: The RTLinux Company.
 www.fsmlabs.com  www.rtlinux.com




Re: [patch] 2.4.0-test8: Alpha RTC clean-ups

2000-09-25 Thread Ralf Baechle

Reply #2 - the list's name changed to [EMAIL PROTECTED];
[EMAIL PROTECTED] is now bouncing.

  Ralf



Re: the new VMt [4MB+ blocks]

2000-09-25 Thread Stephen Williams


[EMAIL PROTECTED] said:
> Sometimes allocating such monster memory blocks could be supported,
>   but it should not be expected to be *fast*.  E.g. if doing it in
>   "reliable" way needs possibly moving currently allocated pages
>   away from memory to create such a hole(s), so be it.


[EMAIL PROTECTED] said:
> Anybody here who can describe those M$ API calls ?
>   Are they kernel/DDK-only, or userspace ones, or both ?

NT does indeed support allocating contiguous buffers of memory, which is
useful when the hardware in question doesn't do scatter-gather. I have
on occasion been compelled to use these routines. (Paradoxically, the
requirements in my case came from broken NT mmap support and not from the
hardware. Blech!)

Anyhow, these routines are indeed slow. And judging by the amount of disk
noise I hear when they are called, they do try to kick out pages to make
an allocation work. However, even so the M$ calls will eventually fail due
to lack of large enough holes, so fragmentation takes its toll.

So, they are both slow and unreliable under NT. But drivers that use them
tend to be loaded once at boot time, and that's it.
-- 
Steve Williams                "The woods are lovely, dark and deep.
[EMAIL PROTECTED]             But I have promises to keep,
[EMAIL PROTECTED]             and lines to code before I sleep,
http://www.picturel.com       And lines to code before I sleep."





Re: [patch] 2.4.0-test8: Alpha RTC clean-ups

2000-09-25 Thread Ralf Baechle

On Mon, Sep 25, 2000 at 12:50:06PM +0200, Jan-Benedict Glaw wrote:

> On Mon, Sep 25, 2000 at 11:35:35AM +0200, Maciej W. Rozycki wrote:
> > On Fri, 22 Sep 2000, Jan-Benedict Glaw wrote:
> > 
> > > Instead of having hard-coded values, we should maybe do something
> > > more variable like:
> > >  if (year >= (20 + YEARS_SINCE_2000) && year < (48 + YEARS_SINCE_2000))
> > >   ...
> > 
> >  This looks reasonable.
> > 
> > > This applies to other platforms using different epoch values as
> > > well, of course...
> > 
> >  Alpha appears to be the only one.
> 
> ./drivers/char/rtc.c:rtc_init()
> #if defined(__alpha__) || defined(__mips__)
> [...]
> 
> MIPS does that as well _in the wrong way_ compared to rtc.c:
> ./arch/mips/dev/time.c:time_init()
> /*
>  * The DECstation RTC is used as a TOY (Time Of Year).
>  * The PROM will reset the year to either '70, '71 or '72.
>  * This hack will only work until Dec 31 2001.
>  */
> year += 1928;

This has already been fixed.  In any case the DECstation RTC stuff is so
weird, don't try to explain it rationally ...
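
The parametrised window suggested earlier in the thread is an instance of the usual two-digit year windowing technique. The following is a minimal sketch of that technique only. rtc_full_year() and its pivot parameter are hypothetical, and this is not the Alpha or DECstation epoch logic in rtc.c or time.c:

```c
/* Hypothetical helper: expand a two-digit RTC year (0..99) to a full
 * year with a sliding window. Years below pivot are taken as 20xx,
 * the rest as 19xx. Returns -1 for out-of-range arguments. */
int rtc_full_year(int yy, int pivot)
{
    if (yy < 0 || yy > 99 || pivot < 0 || pivot > 100)
        return -1;
    return (yy < pivot) ? 2000 + yy : 1900 + yy;
}
```

Moving the pivot forward over time, rather than hard-coding it, is what keeps such a window from expiring the way the "until Dec 31 2001" hack does.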

> Admit your mistakes, show greatness: take back the spelling reform!!!

Spelling deformation ...

  Ralf



Re: [patch] vmfixes-2.4.0-test9-B2 - fixing deadlocks

2000-09-25 Thread Stephen C. Tweedie

Hi,

On Mon, Sep 25, 2000 at 09:32:42PM +0200, Andrea Arcangeli wrote:

> Having shrink_mmap browse the mapped page cache is as useless
> as having shrink_mmap browse kernel memory and anonymous pages,
> as it does in 2.2.x, as far as I can tell. It's an algorithmic
> complexity problem and it will waste lots of CPU.

It's a compromise between CPU cost and Getting It Right.  Ignoring the
mmap is not a good solution either.

> Now think about this simple real-life example. A 2G RAM machine running an executable
> image of 1.5G, 300M in shm and 200M in cache.

OK, and here's another simple real life example.  A 2GB RAM machine
running something like Oracle with a hundred client processes all
shm-mapping the same shared memory segment.

Oh, and you're also doing lots of file IO.  How on earth do you decide
what to swap and what to page out in this sort of scenario, where
basically the whole of memory is data cache, some of which is mapped
and some of which is not?

If you don't separate out the propagation of referenced bits from the
actual page aging, then every time you pass over the whole VM working
set, you're likely to find a handful of live references to some of the
shared memory, and a hundred or so references that haven't done
anything since last time.  Anything that only ages per-pte, not
per-page, is simply going to die horribly under such load, and any
imbalance between pure filesystem cache and VM pressure will be
magnified to the point where one dominates.
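
The separation described above can be sketched as two passes. The first only moves accessed bits from ptes onto the physical page. The second ages each physical page exactly once. Below is a minimal userspace model; the struct layouts and the AGE_MAX/AGE_BUMP constants are hypothetical and not taken from the 2.4 VM:

```c
#include <stddef.h>

#define AGE_MAX  20
#define AGE_BUMP 3

/* Hypothetical model of one physical page. */
struct page {
    int age;          /* higher = more recently useful */
    int referenced;   /* set if ANY mapping touched it since last pass */
};

/* One mapping of a page in some process (many may share a page). */
struct pte {
    struct page *page;
    int accessed;     /* hardware "accessed" bit for this mapping */
};

/* Pass 1: propagate only. Move each pte's accessed bit onto its page
 * and clear it; no aging happens here. */
void propagate_refs(struct pte *ptes, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (ptes[i].accessed) {
            ptes[i].page->referenced = 1;
            ptes[i].accessed = 0;
        }
    }
}

/* Pass 2: age every physical page once, mapped or pure cache alike. */
void age_pages(struct page *pages, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (pages[i].referenced) {
            pages[i].age += AGE_BUMP;
            if (pages[i].age > AGE_MAX)
                pages[i].age = AGE_MAX;
            pages[i].referenced = 0;
        } else if (pages[i].age > 0) {
            pages[i].age--;
        }
    }
}
```

With a hundred idle mappings and one busy one, the shared page is bumped once by age_pages(). Per-pte aging would instead decay it a hundred times.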

Hence my observation that it's really easy to find special cases where
certain optimisations make a ton of sense, but you often lose balance
in the process.  

Cheers,
 Stephen



Re: [patch] vmfixes-2.4.0-test9-B2 - fixing deadlocks

2000-09-25 Thread Rik van Riel

On Mon, 25 Sep 2000, Andrea Arcangeli wrote:
> On Mon, Sep 25, 2000 at 07:06:57PM +0100, Stephen C. Tweedie wrote:
> > Good.  One of the problems we always had in the past, though, was that
> > getting the relative aging of cache vs. vmas was easy if you had a
> > small set of test loads, but it was really, really hard to find a
> > balance that didn't show pathological behaviour in the worst cases.
> 
> Yep, that's not trivial.

It is. Just do physical-page based aging (so you age all the
pages in the system the same) and the problem is solved.

> > > I may be overlooking something but where do you notice when a page
> > > gets unmapped from the last mapping and put it back into a place
> > > that can be reached from shrink_mmap (or whatever the cache recycler is)?
> > 
> > It doesn't --- that is part of the design.  The vm scanner propagates
> 
> And that's the inferior part of the design IMHO.

Indeed, but physical page based aging is a definite
2.5 thing ... ;(

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
   -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/   http://www.surriel.com/




Re: the new VMt

2000-09-25 Thread Stephen C. Tweedie

Hi,

On Mon, Sep 25, 2000 at 12:34:56PM -0600, [EMAIL PROTECTED] wrote:

> > > Process 1,2 and 3 all start allocating 20 pages
> > > now 57 pages are locked up in non-swappable kernel space and the system deadlocks OOM.
> > 
> > Or go the beancounter route: process 1 asks "can I pin 20 pages", gets
> > told "yes", and goes allocating them, blocking as necessary until it
> 
> So you have a "pre-allocation allocator"?  Leads to interesting and hard to detect
> bugs with old code that does not pre-allocate or with code that incorrectly pre-allocates
> or that blocks on something unrelated

Right, but if the alternative is spurious ENOMEM when we can satisfy
all of the pending requests just as long as they are serialised, is
this a problem?

If you want, wrap it in a "get_free_pagev" call which returns a vector
of pointers to free pages, doing whatever accounting is needed.  You
don't have to push all of it to the callers.
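
Here is a minimal sketch of such a wrapper. It uses a simple fixed pool in place of the real free lists. get_free_pagev() follows the name suggested here, but its signature is hypothetical, as are pool_init() and free_pagev(). Locking is omitted:

```c
#include <stddef.h>

#define POOL_PAGES 64
#define PAGE_BYTES 4096

/* Hypothetical pool standing in for the kernel's free page lists. */
static unsigned char pool[POOL_PAGES][PAGE_BYTES];
static void *free_list[POOL_PAGES];
static size_t nr_free;

void pool_init(void)
{
    for (size_t i = 0; i < POOL_PAGES; i++)
        free_list[i] = pool[i];
    nr_free = POOL_PAGES;
}

/* All-or-nothing: either fill vec[0..n-1] with n free pages and return
 * 0, or take nothing and return -1. Because the check and the removal
 * happen together, no caller is ever left holding 19 of the 20 pages
 * it needs. A kernel version would hold a lock here and might sleep
 * instead of failing. */
int get_free_pagev(void **vec, size_t n)
{
    if (n > nr_free)
        return -1;
    for (size_t i = 0; i < n; i++)
        vec[i] = free_list[--nr_free];
    return 0;
}

/* Return n pages previously obtained from get_free_pagev(). */
void free_pagev(void **vec, size_t n)
{
    for (size_t i = 0; i < n; i++)
        free_list[nr_free++] = vec[i];
}
```

Callers then see a single success or failure point instead of a loop of single-page allocations that can stall halfway.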

However, you just can't escape from the fact that on low memory
machines, we *need* beancounter-style accounting of pinned pages or
we'll be in Deep Trouble (TM).  We already have nasty DoS situations
which are embarrassingly easy to reproduce.  If we need such
beancounter protection, AND such protection can prevent the situation
you describe, then do we need to go looking for another way of
achieving the same protection?

--Stephen



[PATCH] sound updates

2000-09-25 Thread Christoph Hellwig

Hi Linus,
I've attached two sound-related patches:
 - the first one moves all remaining sound drivers to the
   module_init/module_exit stuff. It's not really critical,
   but makes another subsystem clean of the old init stuff.
 - the second patch removes the softoss software synthesizer.
   It only works with the OSS/Free style sound drivers and
   has some ugly hooks in the sequencer core.
   Alan has accepted this change.

Christoph

-- 
Always remember that you are unique.  Just like everyone else.


diff -uNr linux.orig/drivers/char/mem.c linux/drivers/char/mem.c
--- linux.orig/drivers/char/mem.c   Sat Sep 23 09:04:14 2000
+++ linux/drivers/char/mem.c	Mon Sep 25 20:42:29 2000
@@ -28,12 +28,6 @@
 #ifdef CONFIG_I2C
 extern int i2c_init_all(void);
 #endif
-#ifdef CONFIG_SOUND
-void soundcore_init(void);
-#ifdef CONFIG_SOUND_OSS
-void soundcard_init(void);
-#endif
-#endif
 #ifdef CONFIG_SPARCAUDIO
 extern int sparcaudio_init(void);
 #endif
diff -uNr linux.orig/drivers/sound/cmpci.c linux/drivers/sound/cmpci.c
--- linux.orig/drivers/sound/cmpci.c	Sat Sep 23 09:03:09 2000
+++ linux/drivers/sound/cmpci.c Mon Sep 25 19:54:26 2000
@@ -2287,8 +2287,6 @@
 MODULE_PARM(spdif_loop, "i");
 MODULE_PARM(four_ch, "i");
 MODULE_PARM(rear_out, "i");
-
-int  __init init_module(void)
 #else
 #ifdef CONFIG_SOUND_CMPCI_SPDIFLOOP
 static int spdif_loop = 1;
@@ -2305,9 +2303,9 @@
 #else
 static int rear_out = 0;
 #endif
-
-int __init init_cmpci(void)
 #endif
+
+static int __init init_cmpci(void)
 {
struct cm_state *s;
struct pci_dev *pcidev = NULL;
@@ -2499,12 +2497,10 @@
 
 /* - */
 
-#ifdef MODULE
-
 MODULE_AUTHOR("ChenLi Tien, [EMAIL PROTECTED]");
 MODULE_DESCRIPTION("CMPCI Audio Driver");
 
-void cleanup_module(void)
+static void __exit cleanup_cmpci(void)
 {
struct cm_state *s;
 
@@ -2538,4 +2534,5 @@
printk(KERN_INFO "cmpci: unloading\n");
 }
 
-#endif /* MODULE */
+module_init(init_cmpci);
+module_exit(cleanup_cmpci);
diff -uNr linux.orig/drivers/sound/msnd_pinnacle.c linux/drivers/sound/msnd_pinnacle.c
--- linux.orig/drivers/sound/msnd_pinnacle.c	Sat Sep 23 09:03:09 2000
+++ linux/drivers/sound/msnd_pinnacle.c Mon Sep 25 19:54:26 2000
@@ -1610,10 +1610,6 @@
 static int fifosize __initdata =   DEFFIFOSIZE;
 static int calibrate_signal __initdata;
 
-/* If we're a module, this is just init_module */
-
-int init_module(void)
-
 #else /* not a module */
 
 static int write_ndelay __initdata =   -1;
@@ -1692,14 +1688,10 @@
 #endif
 static int
 calibrate_signal __initdata =  CONFIG_MSND_CALSIGNAL;
+#endif /* MODULE */
 
-#ifdef MSND_CLASSIC
-int __init msnd_classic_init(void)
-#else
-int __init msnd_pinnacle_init(void)
-#endif /* MSND_CLASSIC */
 
-#endif /* MODULE */
+static int __init msnd_init(void)
 {
int err;
 #ifndef MSND_CLASSIC
@@ -1875,11 +1867,12 @@
return 0;
 }
 
-#ifdef MODULE
-void cleanup_module(void)
+static void __exit msdn_cleanup(void)
 {
unload_multisound();
msnd_fifo_free();
msnd_fifo_free();
 }
-#endif
+
+module_init(msnd_init);
+module_exit(msdn_cleanup);
diff -uNr linux.orig/drivers/sound/sound_core.c linux/drivers/sound/sound_core.c
--- linux.orig/drivers/sound/sound_core.c   Sat Sep 23 09:03:09 2000
+++ linux/drivers/sound/sound_core.c	Mon Sep 25 19:54:26 2000
@@ -36,6 +36,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -62,9 +63,6 @@
 #ifdef CONFIG_SOUND_MSNDPIN
 extern int msnd_pinnacle_init(void);
 #endif
-#ifdef CONFIG_SOUND_CMPCI
-extern int init_cmpci(void);
-#endif
 
 /*
  * Low level list operator. Scan the ordered list, find a hole and
@@ -545,12 +543,11 @@
 extern int mod_firmware_load(const char *, char **);
 EXPORT_SYMBOL(mod_firmware_load);
 
-#ifdef MODULE
 
 MODULE_DESCRIPTION("Core sound module");
 MODULE_AUTHOR("Alan Cox");
 
-void cleanup_module(void)
+static void __exit cleanup_soundcore(void)
 {
/* We have nothing to really do here - we know the lists must be
   empty */
@@ -558,10 +555,7 @@
devfs_unregister (devfs_handle);
 }
 
-int init_module(void)
-#else
-int soundcore_init(void)
-#endif
+static int __init init_soundcore(void)
 {
if(devfs_register_chrdev(SOUND_MAJOR, "sound", _fops)==-1)
{
@@ -569,20 +563,9 @@
return -EBUSY;
}
devfs_handle = devfs_mk_dir (NULL, "sound", NULL);
-   /*
-*  Now init non OSS drivers
-*/
-#ifdef CONFIG_SOUND_CMPCI
-   init_cmpci();
-#endif
-#ifdef CONFIG_SOUND_MSNDCLAS
-   msnd_classic_init();
-#endif
-#ifdef CONFIG_SOUND_MSNDPIN
-   msnd_pinnacle_init();
-#endif
-#ifdef CONFIG_SOUND_VWSND
-   init_vwsnd();
-#endif
+
return 0;
 }
+
+module_init(init_soundcore);
+module_exit(cleanup_soundcore);
diff -uNr linux.orig/drivers/sound/soundcard.c linux/drivers/sound/soundcard.c
--- linux.orig/drivers/sound/soundcard.c

Re: the new VMt

2000-09-25 Thread Stephen C. Tweedie

Hi,

On Mon, Sep 25, 2000 at 08:09:31PM +0100, Alan Cox wrote:
> > > Indeed. But we won't fail the kmalloc with a NULL return
> > 
> > Isn't that the preferred behaviour, though?  If we are completely out
> > of VM on a no-swap machine, we should be killing one of the existing
> > processes rather than preventing any progress and keeping all of the
> > old tasks alive but deadlocked.
> 
> Unless I'm missing something we won't kill any task in that condition - even
> a SIGKILL will make no odds as everyone is asleep in kmalloc.

Right.  Eeek.




Re: Better than SYNcookies?

2000-09-25 Thread Alan Cox

> So is he right? Is his solution better than SYNcookies, and is there
> something to be learned from it? Or does someone need to take
> him to school on the issue?

He isn't preserving the negotiated TCP MSS.

Other issues:

- If his ISN is the IP address, then it's a constant, which is far worse than
random and also usable for replay attacks.

[i.e. I dial up, log the cookie, wait for you to get the same line, and I can
 collect cookies for the whole dialup rack over time]

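For comparison, a SYN cookie keeps the negotiated MSS by encoding an index into a small MSS table in the low bits of the ISN, next to a coarse time counter, so the server can rebuild it from the final ACK. Below is a minimal sketch modelled loosely on the Linux layout: an 8-bit counter above 24 hashed bits. The MSS table, the secrets and the mixing function are hypothetical, and the mixer is NOT cryptographic:

```c
#include <stdint.h>

/* Hypothetical MSS table, ascending; the cookie carries an index 0..3. */
static const uint16_t mss_table[] = { 536, 1024, 1440, 1460 };
#define MSS_COUNT   (sizeof(mss_table) / sizeof(mss_table[0]))
#define COOKIE_BITS 24
#define COOKIE_MASK ((1u << COOKIE_BITS) - 1)

/* Toy keyed mixer standing in for a real keyed cryptographic hash
 * over a secret and the connection tuple. NOT secure. */
static uint32_t cookie_hash(uint32_t saddr, uint32_t daddr, uint16_t sport,
                            uint16_t dport, uint32_t count, int which)
{
    /* Hypothetical per-boot secrets; a real stack picks them randomly. */
    static const uint32_t secret[2] = { 0x9e3779b9u, 0x7f4a7c15u };
    uint32_t words[4] = { saddr, daddr, ((uint32_t)sport << 16) | dport, count };
    uint32_t h = 2166136261u ^ secret[which];

    for (int i = 0; i < 4; i++) {
        h ^= words[i];
        h *= 16777619u;
        h ^= h >> 13;
    }
    return h;
}

/* ISN = H0(tuple) + client_isn + (count << 24)
 *       + ((H1(tuple, count) + mss_index) mod 2^24) */
uint32_t make_cookie(uint32_t saddr, uint32_t daddr, uint16_t sport,
                     uint16_t dport, uint32_t client_isn, uint32_t count,
                     uint16_t client_mss)
{
    uint32_t idx = 0;

    /* Largest table entry not above the client's MSS (entry 0 is the floor). */
    for (uint32_t i = 0; i < MSS_COUNT; i++)
        if (mss_table[i] <= client_mss)
            idx = i;

    return cookie_hash(saddr, daddr, sport, dport, 0, 0) + client_isn
         + (count << COOKIE_BITS)
         + ((cookie_hash(saddr, daddr, sport, dport, count, 1) + idx) & COOKIE_MASK);
}

/* cookie = ack_seq - 1 and client_isn = seq - 1 from the final ACK.
 * Returns the encoded MSS, or 0 if the cookie is too old or invalid. */
uint16_t check_cookie(uint32_t saddr, uint32_t daddr, uint16_t sport,
                      uint16_t dport, uint32_t client_isn, uint32_t cookie,
                      uint32_t now, uint32_t max_age)
{
    uint32_t c = cookie - (cookie_hash(saddr, daddr, sport, dport, 0, 0) + client_isn);
    /* The top 8 bits hold the low 8 bits of the counter used at SYN time. */
    uint32_t diff = (now - (c >> COOKIE_BITS)) & 0xffu;
    uint32_t idx;

    if (diff >= max_age)
        return 0;
    idx = (c - cookie_hash(saddr, daddr, sport, dport, now - diff, 1)) & COOKIE_MASK;
    return idx < MSS_COUNT ? mss_table[idx] : 0;
}
```

Because the tuple and the counter are hashed with a secret, a cookie logged from one dialup session is useless once the counter has moved on. With a real keyed hash, a forged tuple passes the range check only by chance.
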
Alan








Re: the new VMt

2000-09-25 Thread Alan Cox

> > Indeed. But we won't fail the kmalloc with a NULL return
> 
> Isn't that the preferred behaviour, though?  If we are completely out
> of VM on a no-swap machine, we should be killing one of the existing
> processes rather than preventing any progress and keeping all of the
> old tasks alive but deadlocked.

Unless I'm missing something we won't kill any task in that condition - even
a SIGKILL will make no odds as everyone is asleep in kmalloc.





Re: the new VMt [4MB+ blocks]

2000-09-25 Thread Matti Aarnio

 [Chopped the recipient list radically]

On Mon, Sep 25, 2000 at 06:06:11PM +0100, Alan Cox wrote:
> > > > Stupidity has no limits...
> > > Unfortunately its frequently wired into the hardware to save a few cents on
> > > scatter gather logic.
> > 
> > Since when did hardware folks become exempt from the rule above? 128K is
> > almost tolerable, there were requests for 64 _mega_bytes...
> 
> Most cheap ass PCI hardware is built on the basis you can do linear 4Mb 
> allocations. There is a reason for this. You can do that 4Mb allocation on
> NT or Windows 9x.

Sure, but Intel processors have this neat 4 MB "super-page"
feature in the MMU...  (as we all well know)

Sometimes allocating such monster memory blocks could be supported,
but it should not be expected to be *fast*.  E.g. if doing it in
"reliable" way needs possibly moving currently allocated pages
away from memory to create such hole(s), so be it.

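With 4 KB pages, a 4 MB block is 1024 contiguous frames, and fragmentation is why such requests eventually fail. Below is a minimal sketch of the search itself: a first-fit scan for an aligned run of free frames over a hypothetical frame bitmap. It does not implement the page moving suggested above, which would be needed to create a hole when none exists:

```c
#include <stddef.h>

/* frame_used[i] != 0 means physical frame i is allocated. Find the
 * first run of n free frames that starts on a multiple of align
 * (both counted in frames). Returns the first frame index, or -1 if
 * fragmentation leaves no suitable hole. */
long find_contig(const unsigned char *frame_used, size_t nframes,
                 size_t n, size_t align)
{
    if (n == 0 || align == 0)
        return -1;
    for (size_t start = 0; start + n <= nframes; start += align) {
        size_t i;

        for (i = 0; i < n; i++)
            if (frame_used[start + i])
                break;          /* one busy frame spoils this candidate */
        if (i == n)
            return (long)start;
    }
    return -1;
}
```

A single busy frame inside each aligned window is enough to make the search fail, which is why any "reliable" version has to be willing to move pages.
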

Anybody here who can describe those M$ API calls ?
Are they kernel/DDK-only, or userspace ones, or both ?

/Matti Aarnio



Better than SYNcookies?

2000-09-25 Thread Dan Hollis

I don't know how many here read /., but recently someone has been going round
touting a new SYN defense system that he claims is better than SYNcookies.

http://grc.com/r/NoMoreDoS2.htm

Specifically, Steve Gibson <[EMAIL PROTECTED]> claims:

I followed those links and read about SYN Cookies yesterday after
Torinak mentioned them for the first time. The SYN Cookie system does,
indeed, have a similar concept to what I came up with.

From my reading of everything I've been able to find in archived
four-year-old threads, they appear to have over-complicated, and thus
broken, their solution and ended up deciding that it was a bad idea.
They incorporate a whole unnecessary handful of junk into the
formation of their "cookie" which is derived from an MD5 (message
digest) hash ... and they talk about incrementing a "secret" every
minute ... which is not necessary if the IP is encrypted as with my
system.

(Note that by incorporating the source port into their "cookie" they
make their "hash" algorithm easily explorable by anyone simply by
using different apparent source ports.  My solution has no such
vulnerability and weakness.)

And they have the problem of producing non-monotonically increasing
outbound initial sequence numbers in their SYN/ACK since the output of
the MD5 hash is deliberately non-monotonic. AND -- most significantly
-- the original discussion threads I found discuss and acknowledge at
least some of these limitations.  So they knew their system had these
flaws.

Their solution -- because it was unstable and broke some important
aspects of TCP protocol (which mine does not) was NOT to use the
system all the time, but rather only to "switch it on" dynamically
when they detected that their server was under SYN flooding attack.
Someone mentioned that "switching it off and on" would thus create
another problem of essentially changing the "ISN" generation, thus
potentially creating non-unique sequences on the fly.

The bottom line is, I think that the previous efforts sort of suffered
from "committee design syndrome" and never really got off the ground
because they realized that the result would break too many other
things.


So is he right? Is his solution better than SYNcookies, and is there
something to be learned from it? Or does someone need to take
him to school on the issue?

-Dan




Re: the new VMt

2000-09-25 Thread yodaiken

On Mon, Sep 25, 2000 at 07:24:53PM +0100, Stephen C. Tweedie wrote:
> Hi,
> 
> On Mon, Sep 25, 2000 at 12:13:15PM -0600, [EMAIL PROTECTED] wrote:
> 
> > > Definitely not.  GFP_ATOMIC is reserved for things that really can't
> > > swap or schedule right now.  Use GFP_ATOMIC indiscriminately and you'll
> > > have to increase the number of atomic-allocatable pages.
> > 
> > Process 1,2 and 3 all start allocating 20 pages
> >   process 1 stalls after allocating 19
> >   some memory is freed and process 2 runs and stalls after allocating 19
> >   some memory is freed and process 3 runs and stalls after allocating 19
> >  
> > now 57 pages are locked up in non-swappable kernel space and the system deadlocks OOM.
> 
> Or go the beancounter route: process 1 asks "can I pin 20 pages", gets
> told "yes", and goes allocating them, blocking as necessary until it

So you have a "pre-allocation allocator"?  Leads to interesting and hard-to-detect
bugs with old code that does not pre-allocate, with code that incorrectly
pre-allocates, or with code that blocks on something unrelated:

   preallocate 20 pages
   get first
   ask for an inode -- block waiting for an inode


or
   preallocate 20 pages
   if(checkuserpath())return -ENOWAY; /* stranding my pre-allocate */
   else get them pages


What's nice about these is they don't cause errors on test and seem more 
difficult to spot than looking for cases where allocated memory gets stranded.
Doesn't the alloc_vec method seem simpler to you?

> gets them.  Process 2 asks "can *I* pin 20 pages" and the answer is
> either "not right now", in which case it waits for process 1 to
> release its reservation, or "no, you've exceeded your user quota" in

Or for someone else to free more pages ... 

> which case it fails with ENOMEM.  (That latter case can protect us
> against a lot of DoS attacks from local users.)

I like ENOMEM anyways.

> 
> The same accounting really needs to be done for page tables, as that
> represents one of the biggest sources of unaccounted, unswappable
> pages which user processes can cause to be created right now.



-- 
-
Victor Yodaiken 
Finite State Machine Labs: The RTLinux Company.
 www.fsmlabs.com  www.rtlinux.com




Re: 2.4 kernels do not boot on UX (Alpha)

2000-09-25 Thread Richard Henderson

On Mon, Sep 25, 2000 at 04:22:38AM -0500, Jeff Garzik wrote:
> To give us a knowledge jump start... what is broken?

As far as I can tell, everything wrt actually configuring bridges.

If a bridge is completely uninitialized, then it won't be properly
added to the bus hierarchy, and neither will its children.  More usual
is for the bridge to be partially initialized, with MEM for the bridge
and the device set, but IO completely disabled.

Either way we get the same result -- a device not properly rooted in
the bus hierarchy with its io and/or memory disabled.  Which tends to
piss off most of the drivers we have.

The solution, in my opinion, is to be more aggressive with bus layout
and initialization in drivers/pci/setup.c.  I think a proper implementation
would

  * Depth-first, sort the memory and io ranges for the bus by required
alignment.  This should give us better packing than we have currently.
For now, each bus starts assigning io and mem ranges at zero.  Note
that this is in the kernel data structures only.  Propagate up the
size and required alignment of the bus to its device counterpart in
the parent bus.

  * In the root bus, in some arch specific way, choose some non-zero
address in which to place the block we've collected for its children.

  * Depth-last, propagate down the base addresses plus offset for the buses.

  * Diddle the hardware.
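
The layout steps above can be sketched as two tree walks, one up and one down. This is a minimal sketch under stated assumptions: it handles one resource type (mem or io) at a time and uses power-of-two alignments. It ignores bridge window granularity (PCI-to-PCI bridges round memory windows to 1 MB and I/O windows to 4 KB). All names are hypothetical; this is not drivers/pci/setup.c:

```c
#include <stdint.h>
#include <stdlib.h>

/* A node is either a device resource (no children; size and align
 * given) or a bus whose window must cover all of its children. */
struct res_node {
    uint64_t size;            /* bytes; computed for buses */
    uint64_t align;           /* power of two; computed for buses */
    uint64_t offset;          /* offset inside the parent's window */
    uint64_t base;            /* final address, set by assign_down() */
    struct res_node **child;
    size_t nchild;
};

static uint64_t align_up(uint64_t x, uint64_t a)
{
    return (x + a - 1) & ~(a - 1);
}

/* qsort comparator: largest alignment first. */
static int by_align_desc(const void *a, const void *b)
{
    const struct res_node *x = *(struct res_node *const *)a;
    const struct res_node *y = *(struct res_node *const *)b;
    return (x->align < y->align) - (x->align > y->align);
}

/* Depth-first: lay out children from offset 0, largest alignment first,
 * then report this bus's size and alignment to its parent. */
void layout_up(struct res_node *bus)
{
    if (bus->nchild == 0)
        return;                           /* a device: already known */
    for (size_t i = 0; i < bus->nchild; i++)
        layout_up(bus->child[i]);
    qsort(bus->child, bus->nchild, sizeof bus->child[0], by_align_desc);

    uint64_t off = 0, max_align = 1;
    for (size_t i = 0; i < bus->nchild; i++) {
        struct res_node *c = bus->child[i];
        off = align_up(off, c->align);
        c->offset = off;
        off += c->size;
        if (c->align > max_align)
            max_align = c->align;
    }
    bus->align = max_align;
    bus->size = align_up(off, max_align);
}

/* Once the root has a base (chosen in some arch-specific way),
 * push base + offset down the tree. */
void assign_down(struct res_node *n, uint64_t base)
{
    n->base = base;
    for (size_t i = 0; i < n->nchild; i++)
        assign_down(n->child[i], base + n->child[i]->offset);
}
```

Sorting by alignment is what gives the tighter packing: with power-of-two sizes, placing the most-aligned items first leaves no gaps except at the end of each window.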

> The latest test9 pre-patches include some bridge cleanup.

I've not seen that yet...



r~



mount -t bind gone?

2000-09-25 Thread H. Peter Anvin

I guess mount -t bind is officially gone.  What is the new official
replacement?  New system call?

-=hpa

-- 
<[EMAIL PROTECTED]> at work, <[EMAIL PROTECTED]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt



Re: Bonding Driver Questions

2000-09-25 Thread Thomas Davis

Constantine Gavrilov wrote:
> 
> 1) How can I check for the link status from the user space?
> 2) Could enslaved interface be released without bringing the master
> interface down? If yes, how? Could we have ifunslave?
> 

Link status is not used at all in v2.2 (and using it would mean a rewrite of
the drivers).

Link status is used in v2.4.  Not all drivers support link status.  In
fact, I don't know of any that do - but it's possible now to do it.

Simply taking down the interface should be enough to remove it from
enslavement.

-- 
+--
Thomas Davis| PDSF Project Leader
[EMAIL PROTECTED] | 
(510) 486-4524  | "Only a petabyte of data this year?"



Re: the new VMt

2000-09-25 Thread Stephen C. Tweedie

Hi,

On Mon, Sep 25, 2000 at 12:13:15PM -0600, [EMAIL PROTECTED] wrote:

> > Definitely not.  GFP_ATOMIC is reserved for things that really can't
> > swap or schedule right now.  Use GFP_ATOMIC indiscriminately and you'll
> > have to increase the number of atomic-allocatable pages.
> 
> Process 1,2 and 3 all start allocating 20 pages
>   process 1 stalls after allocating 19
>   some memory is freed and process 2 runs and stalls after allocating 19
>   some memory is freed and process 3 runs and stalls after allocating 19
>  
> now 57 pages are locked up in non-swappable kernel space and the system deadlocks OOM.

Or go the beancounter route: process 1 asks "can I pin 20 pages", gets
told "yes", and goes allocating them, blocking as necessary until it
gets them.  Process 2 asks "can *I* pin 20 pages" and the answer is
either "not right now", in which case it waits for process 1 to
release its reservation, or "no, you've exceeded your user quota" in
which case it fails with ENOMEM.  (That latter case can protect us
against a lot of DoS attacks from local users.)
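
The admission step described above can be sketched as a single check with three outcomes. This is a minimal userspace model; the names and structs are hypothetical, and the locking and wait queue a real version would need are left out. Applied to the 57-page scenario, the first two requests are granted in full and the third is told to wait. Nobody ends up holding 19 pages:

```c
/* Hypothetical reservation layer for pinned pages: a caller asks for
 * the whole amount it will need before allocating any of it. */
enum reserve_result { RESERVE_OK, RESERVE_WAIT, RESERVE_ENOMEM };

struct pin_pool {
    long free;          /* pages not yet promised to anyone */
};

struct pin_user {
    long reserved;      /* pages this user currently holds in reserve */
    long quota;         /* per-user ceiling on pinned pages */
};

enum reserve_result reserve_pages(struct pin_pool *pool,
                                  struct pin_user *u, long n)
{
    if (u->reserved + n > u->quota)
        return RESERVE_ENOMEM;   /* over quota: fail now, never wait */
    if (n > pool->free)
        return RESERVE_WAIT;     /* "not right now": sleep until a release */
    pool->free -= n;
    u->reserved += n;
    return RESERVE_OK;
}

void release_pages(struct pin_pool *pool, struct pin_user *u, long n)
{
    pool->free += n;
    u->reserved -= n;
}
```

The per-user quota check comes first so that a local user over the limit gets ENOMEM immediately instead of joining the wait.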

The same accounting really needs to be done for page tables, as that
represents one of the biggest sources of unaccounted, unswappable
pages which user processes can cause to be created right now.

--Stephen



Re: st0 errors - 2.2.16

2000-09-25 Thread Richard B. Johnson

On Mon, 25 Sep 2000, Michael J. Dikkema wrote:

> 
> I get these errors whenever I try to read data off of a new tape drive
> that we got. (Onstream ADR-50)
> 
> st0: Error 2603 (sugg. bt 0x20, driver bt 0x26, host bt 0x3).
> st0: Error on write filemark.

You should not get a write error when reading a tape drive. Are you
sure you got the 'Error on write filemark' while reading?

Otherwise, this just looks like the kind of error you get if the
tape has gotten (logically) damaged. If it's one of those tapes that
doesn't require a format, just reuse it during your next backup. If it's
one of those that requires a 'factory format' (bummer), throw it
away.


Cheers,
Dick Johnson

Penguin : Linux version 2.2.15 on an i686 machine (797.90 BogoMips).

"Memory is like gasoline. You use it up when you are running. Of
course you get it all back when you reboot..."; Actual explanation
obtained from the Micro$oft help desk.





2.4.0-test9-pre6 SMP detected LOCKUP, mm code

2000-09-25 Thread Petr Vandrovec

Hi,
  2.4.0-test9-pre6 just blew up on me :-( Oops typed by hand...
I have no idea how CPUs are numbered now, but couple of months ago
they were numbered CPU0 and CPU1 ;-) I have only two CPUs...

NMI Watchdog detected LOCKUP on CPU 12
CPU: 12
EIP: 0010:[]   <- out-of-line lock code for wq_write_lock_irqsave in add_wait_queue_exclusive
EFLAGS: 0086
EAX: CCD58104  EBX: CCD580FC  ECX: 0086  EDX: CCD59F70
ESI: CCD58104  EDI: CCD59F70  EBP: CCD58000  ESP: CCD59F54
DS: 0018  ES: 0018  SS: 0018
Process  (pid: 14, stackpage = ccd59000)   <- there is no process 14 this boot...
I do not think that it ever was...
Stack: ccd580fc ccd59f70 ccd58000 c0109b95 ccd580fc d289b000 0054 01234567
   ccd58000   c0109d58 ccd580fc 0180  c01da473
   ccd5a000 d289b000 0054 ccd5a000 5f636973 ccd580e0 5a33676e 5a313058
Call Trace: [] __down + 65/190
[]
[] __down_failed + 8/11
[] outofline lock code for __down_failed in do_page_fault+90
(down(>mmap_sem))
[]

Ideas? I have the impression that it already happened to me on Friday
with 2.4.0-test9-pre5, but as it was in the middle of a presentation,
I had no time to write the oops down (it was certainly some "NMI watchdog..."
message, but I'm not sure it was this one).

At the time of the oops I was reading the 'fbtv' manpage...

Hardware: Gigabyte 6BXDS, 2x PentiumIII/450, 256MB of RAM, 18GB IDE, watching 
  TV (olympics) with Bt848, G400 dualhead
Thanks,
Petr Vandrovec
[EMAIL PROTECTED]



Re: the new VMt

2000-09-25 Thread Alan Cox

> there is no swap.  If there is truly nothing kswapd can do to recover
> here, then we are truly OOM.  Otherwise, kswapd should be able to free

Indeed. But we won't fail the kmalloc with a NULL return.




Re: the new VMt

2000-09-25 Thread yodaiken

On Mon, Sep 25, 2000 at 08:04:54PM +0200, Jamie Lokier wrote:
> [EMAIL PROTECTED] wrote:
> > > [EMAIL PROTECTED] wrote:
> > > >walk = out;
> > > > while(nfds > 0) {
> > > > poll_table *tmp = (poll_table *) __get_free_page(GFP_KERNEL);
> > > > if (!tmp) {
> > > 
> > > Shouldn't this be GFP_USER?  (Which would also conveniently fix the
> > > problem Victor's pointing out...)
> > 
> > It should probably be GFP_ATOMIC, if I understand the mm right. 
> 
> Definitely not.  GFP_ATOMIC is reserved for things that really can't
> swap or schedule right now.  Use GFP_ATOMIC indiscriminately and you'll
> have to increase the number of atomic-allocatable pages.

Process 1, 2 and 3 all start allocating 20 pages:
  process 1 stalls after allocating 19
  some memory is freed and process 2 runs and stalls after allocating 19
  some memory is freed and process 3 runs and stalls after allocating 19
 
now 57 pages are locked up in non-swappable kernel space and the system deadlocks
in OOM.
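Victor's scenario can be reproduced with a toy model (an assumption for illustration, not kernel code): three processes each need 20 pages, pages are handed out one at a time, and a process releases nothing until it holds all 20. `hold_and_wait()` and `all_or_nothing()` are hypothetical helpers that count how many processes finish for a given number of free pages.

```c
#include <assert.h>
#include <stdbool.h>

#define NPROC 3
#define NEED  20  /* pages each process needs before it can finish */

/*
 * Hold-and-wait: hand out free pages one at a time, round-robin.  A
 * process keeps every page it gets until it holds NEED, then finishes
 * and returns them to the pool.  Returns how many processes finished;
 * *stuck receives the pages still held once no progress is possible.
 */
static int hold_and_wait(int free_pages, int *stuck)
{
	int held[NPROC] = { 0 };
	bool done[NPROC] = { false };
	int finished = 0;
	bool progress = true;

	while (progress) {
		progress = false;
		for (int p = 0; p < NPROC; p++) {
			if (done[p] || free_pages == 0)
				continue;
			free_pages--;           /* take one page and keep it */
			held[p]++;
			progress = true;
			if (held[p] == NEED) {  /* finished: release everything */
				free_pages += held[p];
				held[p] = 0;
				done[p] = true;
				finished++;
			}
		}
	}
	*stuck = 0;
	for (int p = 0; p < NPROC; p++)
		*stuck += held[p];
	return finished;
}

/*
 * All-or-nothing: a process gets all NEED pages in one step or none.
 * One that succeeds runs to completion and gives them back, so the pool
 * is never split across stalled processes.
 */
static int all_or_nothing(int free_pages)
{
	int finished = 0;

	for (int p = 0; p < NPROC; p++) {
		if (free_pages < NEED)
			break;          /* nobody can ever be satisfied */
		finished++;         /* took NEED, finished, returned NEED */
	}
	return finished;
}
```

With 57 free pages every process ends up holding 19 and none can finish, even though 20 pages would have been enough for all three to complete one after another; the all-or-nothing policy finishes all three with the same 57 pages.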



> > The algorithm for requesting a collection of resources and freeing all
> > of them on failure is simple, fast, and robust.
> 
> Allocation is just as fast with GFP_KERNEL/USER, just less likely to

It's not speed, it's deadlock avoidance. 

> fail and less likely to break something else that really needs
> GFP_ATOMIC allocations.

My point here is simply that error returns from memory allocation allow
higher-level kernel operations to safely marshal a collection of resources, using
an algorithm that is optimized for the case where there is no memory shortage
and that only falls back to the slow path when the system is already stalling due to
memory shortages anyway.
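The "allocate everything, or free everything on failure" pattern defended here (the fs/select.c loop quoted in this thread follows it) can be sketched in userspace. `try_alloc_page()` is a hypothetical stand-in for `__get_free_page()` failing under memory pressure; it is backed by `malloc()` with an artificial budget so that the failure path can be exercised.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical allocator that fails once 'budget' is exhausted. */
static int budget;
static int live_pages;          /* pages currently allocated */

static void *try_alloc_page(void)
{
	if (budget <= 0)
		return NULL;
	void *p = malloc(4096);
	if (p) {
		budget--;
		live_pages++;
	}
	return p;
}

static void release_page(void *p)
{
	free(p);
	live_pages--;
}

struct chunk {
	struct chunk *next;
	/* payload would follow */
};

/*
 * Allocate 'n' chunks as a linked list.  On the first failure, free
 * every chunk obtained so far and return NULL, so the caller never sits
 * on a partial set of resources while waiting for more memory.
 */
static struct chunk *alloc_chain(int n)
{
	struct chunk *head = NULL;

	while (n-- > 0) {
		struct chunk *c = try_alloc_page();
		if (!c) {
			while (head) {          /* back out completely */
				struct chunk *next = head->next;
				release_page(head);
				head = next;
			}
			return NULL;
		}
		c->next = head;
		head = c;
	}
	return head;
}

static void free_chain(struct chunk *head)
{
	while (head) {
		struct chunk *next = head->next;
		release_page(head);
		head = next;
	}
}
```

Unlike the select.c loop, this version prepends to the list instead of appending through a `walk` pointer; the back-out logic is the same either way.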



> 
> -- Jamie

-- 
-
Victor Yodaiken 
Finite State Machine Labs: The RTLinux Company.
 www.fsmlabs.com  www.rtlinux.com




Re: lvm in 2.4.0-test9pre5

2000-09-25 Thread Jeff Garzik

On Mon, 25 Sep 2000, Jan Niehusmann wrote:
> > But I don't think there is anything wrong with grouping RAID and LVM under
> > the title "md", and just leaving it as such. 

> It seems that the current setup makes it impossible to compile lvm without
> compiling md.c. But md.c is not needed for lvm, is it?
> 
> I think we need two different config options now: One to enable the 
> drivers/md/ directory, and one to compile md.c. 

We don't need a config option just to jump into another directory.
Probably just a makefile or config bug.

Jeff






Re: [patch] vmfixes-2.4.0-test9-B2 - fixing deadlocks

2000-09-25 Thread Stephen C. Tweedie

Hi,

On Mon, Sep 25, 2000 at 07:03:47PM +0200, Andrea Arcangeli wrote:
> 
> > This really seems to be the biggest difference between the two
> > approaches right now.  The FreeBSD folks believe fervently that one of
> > [ aging cache and mapped pages in the same cycle ]
> 
> Right.
> 
> And since you move the page into the active list only once you reach it from
> the cache recycler and you find it with page->age != 0, you also spend time
> putting those pages back and forth between those LRU lists, while in my approach the
> mapped pages are never seen by the cache recycler and no cycles are spent on
> them. This means that in a pure fs read test with cache pollution going on, there's
> _no_way_ that classzone touches or notices _any_ mapped page in its path.

The "age==0" pages are basically just "pages we are ready to get rid
of right away".  The alternative to having that inactive list is to do
what we do today --- which is to throw away the pages immediately.
Having that extra list is simply giving pages a last chance before
evicting them.  It allows us to run reliably with fewer physically
free pages --- we can reap inactive pages with no IO so those pages
are as good as free for most purposes.

The alternative to moving pages to the inactive list would be freeing
them completely.  Moving a page back to the active list from inactive
is equivalent to avoiding a disk IO to pull in the page from backing
store.  It's supposed to be an optimisation to save physically
freeing things unless we really, really need to.  It is _not_ a
transition which recently referenced pages encounter.

> > the main reasons that their VM rocks is that it ages cache pages and
> > mapped pages at the same rate.  Having both on the same aging list
> > achieves that.  Separating the two raises the question of how to
> > balance the aging of cache vs. swap in a fair manner.
> 
> I believe increasing the aging in the unmapped cache should take care of that
> fine. (It was working pretty well even with only 1 bit of most-frequently-used
> aging plus the LRU order of the list.)

Good.  One of the problems we always had in the past, though, was that
getting the relative aging of cache vs. vmas was easy if you had a
small set of test loads, but it was really, really hard to find a
balance that didn't show pathological behaviour in the worst cases.

> > > In classzone the aging exists too but it's _completly_ orthogonal to how
> > > rest of the VM works.
> > 
> > Umm, that applies to Rik's stuff too!
> 
> I may be overlooking something but where do you notice when a page
> gets unmapped from the last mapping and put it back into a place
> that can be reached from shrink_mmap (or whatever the cache recycler is)?

It doesn't --- that is part of the design.  The vm scanner propagates
referenced bits to the struct page, so the new shrink_mmap can do its
aging based on whether a page has been referenced at all recently, not
caring whether the reference was a VM reference or a page cache
reference.  That is done specifically to address the balance issue
between VM and filesystem memory pressure.

> Since no mapped page can in any way be freed by the cache recycler
> (at the moment you first need to unmap it from swap_out), if you
> reach those pages from the cache recycler somehow, it means
> you're wasting CPU (I couldn't reach any mapped page from the
> cache recycler in classzone; in fact, the mapped pages weren't
> linked into any LRU at all, to save even more CPU).

That's not how the current VM is supposed to work.  The cache scanner
isn't meant to reclaim pages --- it is meant to update the age
information on pages, which is not quite the same job.  If it finds
pages whose age becomes zero, those are shifted to the inactive list,
and once that list is large enough (i.e. we have enough freeable
pages), it can give up.  The inactive list then gets physically freed
on demand.
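The division of labour described above (the scanner only updates ages and deactivates pages whose age reaches zero; the inactive list is freed on demand, with a last chance for pages referenced again) can be sketched as a toy model. The `struct page` fields, the `AGE_ADVANCE`/`AGE_MAX` values and both functions are illustrative assumptions, not the 2.4 kernel's data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum page_list { ACTIVE, INACTIVE, FREE };

struct page {
	int age;
	bool referenced;        /* propagated from ptes or cache lookups */
	enum page_list list;
};

#define AGE_ADVANCE 3       /* illustrative constants, not kernel values */
#define AGE_MAX     20

/*
 * One pass of the toy aging scanner.  Referenced pages gain age,
 * unreferenced ones lose it; pages whose age reaches zero move to the
 * inactive list.  The pass stops early once 'target' inactive pages
 * exist.  Returns the number of inactive pages.
 */
static size_t age_pages(struct page *pages, size_t n, size_t target)
{
	size_t inactive = 0;

	for (size_t i = 0; i < n; i++)
		if (pages[i].list == INACTIVE)
			inactive++;

	for (size_t i = 0; i < n && inactive < target; i++) {
		struct page *pg = &pages[i];
		if (pg->list != ACTIVE)
			continue;
		if (pg->referenced) {
			pg->referenced = false;
			pg->age += AGE_ADVANCE;
			if (pg->age > AGE_MAX)
				pg->age = AGE_MAX;
		} else if (pg->age > 0) {
			pg->age--;
		}
		if (pg->age == 0) {
			pg->list = INACTIVE;
			inactive++;
		}
	}
	return inactive;
}

/*
 * Free one inactive page on demand.  A page referenced since it was
 * deactivated gets its last chance: it goes back to the active list
 * (no I/O needed) instead of being freed.  Returns the freed page's
 * index, or -1 if nothing could be freed.
 */
static long reclaim_one(struct page *pages, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		struct page *pg = &pages[i];
		if (pg->list != INACTIVE)
			continue;
		if (pg->referenced) {
			pg->referenced = false;
			pg->age = AGE_ADVANCE;
			pg->list = ACTIVE;
			continue;
		}
		pg->list = FREE;
		return (long)i;
	}
	return -1;
}
```

Reactivating a page from the inactive list here costs nothing but a list change, which mirrors the point that moving a page back from inactive to active replaces a disk read.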

The fact that we have a common loop in the VM for updating all age
information is central to the design, and requires the cache recycler
to pass over all those pages.  By doing it that way, rather than from
the VM scan, we can avoid one of the really bad properties of the old
2.0 aging code --- it means that for shared pages, we only do the
aging once per walk over the pages regardless of how many ptes refer
to the page.  This avoids the nasty worst-case behaviour of having a
recently-referenced page thrown out of memory just because there also
happened to be a lot of old, unused references to it too. 
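The difference between aging once per pte and once per page can be shown with a toy model of a shared page that has one recent reference and nine stale ones. The constants and both functions are assumptions made for illustration; they are not the 2.0 or 2.4 code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define AGE_ADVANCE 3   /* illustrative, not the kernel's constants */
#define AGE_DECLINE 1

static int clamp_age(int age)
{
	return age < 0 ? 0 : age;
}

/* Toy 2.0-style aging: the page is aged once for every pte mapping it. */
static int age_per_pte(int age, const bool *pte_young, size_t nptes)
{
	for (size_t i = 0; i < nptes; i++)
		age = clamp_age(pte_young[i] ? age + AGE_ADVANCE
					     : age - AGE_DECLINE);
	return age;
}

/*
 * Toy per-page aging: first OR the referenced bits of all ptes into
 * the page, then age the page exactly once per pass.
 */
static int age_per_page(int age, const bool *pte_young, size_t nptes)
{
	bool referenced = false;

	for (size_t i = 0; i < nptes; i++)
		referenced |= pte_young[i];
	return clamp_age(referenced ? age + AGE_ADVANCE : age - AGE_DECLINE);
}
```

Per pte, the nine stale mappings drive the age down to zero and the page becomes an eviction candidate; per page, the single recent reference keeps it young.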

Cheers,
 Stephen



Re: the new VMt

2000-09-25 Thread Andrea Arcangeli

On Mon, Sep 25, 2000 at 11:51:39AM -0600, [EMAIL PROTECTED] wrote:
> It should probably be GFP_ATOMIC, if I understand the mm right. 

poll_wait is called from the f_op->poll callback from select just before
a sleep and since it's allowed to sleep too it should be a GFP_KERNEL
(not ATOMIC). Using GFP_ATOMIC where GFP_KERNEL can be used is a bug
and it can lead to failed allocations even while there's huge amount
of freeable/recyclable cache.

The reason it isn't GFP_USER but it's a GFP_KERNEL is because the memory
isn't allocated in userspace.

On a solid VM the only difference between GFP_USER and GFP_KERNEL happens to be
when the machine runs truly out of memory. In 2.4.x GFP_KERNEL should probably
be changed not to short the PF_MEMALLOC atomic queue when memory balancing
fails (then they would be equal).

> The algorithm for requesting a collection of resources and freeing all of them
>  on failure is simple, fast, and robust. 

Yes, I tend to like that style too because it's obviously safe and it obviously
can't deadlock during OOM.

Andrea



Re: lvm in 2.4.0-test9pre5

2000-09-25 Thread Andrea Arcangeli

On Mon, Sep 25, 2000 at 08:04:36PM +0200, Jan Niehusmann wrote:
> compiling md.c. But md.c is not needed for lvm, is it?

It is not needed, correct.

Andrea



Re: lvm in 2.4.0-test9pre5

2000-09-25 Thread Jan Niehusmann

> But I don't think there is anything wrong with grouping RAID and LVM under
> the title "md", and just leaving it as such. 

It seems that the current setup makes it impossible to compile lvm without
compiling md.c. But md.c is not needed for lvm, is it?

I think we need two different config options now: One to enable the 
drivers/md/ directory, and one to compile md.c. 

Jan




Re: the new VMt

2000-09-25 Thread Jamie Lokier

[EMAIL PROTECTED] wrote:
> > [EMAIL PROTECTED] wrote:
> > >walk = out;
> > > while(nfds > 0) {
> > > poll_table *tmp = (poll_table *) __get_free_page(GFP_KERNEL);
> > > if (!tmp) {
> > 
> > Shouldn't this be GFP_USER?  (Which would also conveniently fix the
> > problem Victor's pointing out...)
> 
> It should probably be GFP_ATOMIC, if I understand the mm right. 

Definitely not.  GFP_ATOMIC is reserved for things that really can't
swap or schedule right now.  Use GFP_ATOMIC indiscriminately and you'll
have to increase the number of atomic-allocatable pages.

> The algorithm for requesting a collection of resources and freeing all
> of them on failure is simple, fast, and robust.

Allocation is just as fast with GFP_KERNEL/USER, just less likely to
fail and less likely to break something else that really needs
GFP_ATOMIC allocations.

-- Jamie



Re: the new VMt

2000-09-25 Thread yodaiken

On Mon, Sep 25, 2000 at 07:18:29PM +0200, Jamie Lokier wrote:
> [EMAIL PROTECTED] wrote:
> >walk = out;
> > while(nfds > 0) {
> > poll_table *tmp = (poll_table *) __get_free_page(GFP_KERNEL);
> > if (!tmp) {
> 
> Shouldn't this be GFP_USER?  (Which would also conveniently fix the
> problem Victor's pointing out...)

It should probably be GFP_ATOMIC, if I understand the mm right. 

The algorithm for requesting a collection of resources and freeing all of them
on failure is simple, fast, and robust.


  

-- 
-
Victor Yodaiken 
Finite State Machine Labs: The RTLinux Company.
 www.fsmlabs.com  www.rtlinux.com




Re: the new VMt

2000-09-25 Thread Jeff Garzik

On Mon, 25 Sep 2000, Oliver Xymoron wrote:
> Sure about that? It's been a while, but I seem to recall NT enforcing a
> scatter-gather framework on all drivers because it only gave them virtual
> allocations. For the cheaper cards, the s-g was done by software issuing
> single span requests to the card.

The Matrox framegrabber guys use some API under NT to allocate
megabytes upon megabytes of contiguous memory for DMA.

Jeff






Re: the new VMt

2000-09-25 Thread Stephen C. Tweedie

Hi,

On Mon, Sep 25, 2000 at 05:51:49PM +0100, Alan Cox wrote:
> > > 2 active processes, no swap
> > > 
> > > #1            #2
> > > kmalloc 32K   kmalloc 16K
> > > OK            OK
> > > kmalloc 16K   kmalloc 32K
> > > block         block
> > > 
> > 
> > ... and we get two wakeup_kswapd()s.  kswapd has PF_MEMALLOC and so is
> > able to eat memory which processes #1 and #2 are not allowed to touch.
> 
> 'no swap'

kswapd is perfectly capable of evicting clean pages and triggering any
necessary writeback of dirty filesystem data at this point, even if
there is no swap.  If there is truly nothing kswapd can do to recover
here, then we are truly OOM.  Otherwise, kswapd should be able to free
the required memory, providing that the PF_MEMALLOC flag allows it to
eat into a reserved set of free pages which nobody else can allocate
once the number of physically free pages drops below a certain threshold.
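The reserve described here can be modelled as a per-caller floor on free pages: ordinary allocations must leave the reserve untouched, while a task running with PF_MEMALLOC (such as kswapd) may dip into it so that it can go on to free memory. `struct pool` and `pool_alloc()` are hypothetical; the real allocator tracks several per-zone watermarks rather than one counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical free-page pool with an emergency reserve. */
struct pool {
	unsigned long free;      /* physically free pages */
	unsigned long reserve;   /* pages only memory-freeing tasks may use */
};

/*
 * Try to allocate one page.  'memalloc' models a task with PF_MEMALLOC
 * set: it may use the reserve, because it needs a little memory in
 * order to free a lot.  Any other task is refused (and would have to
 * wait for kswapd) once free pages fall to the reserve threshold.
 */
static bool pool_alloc(struct pool *p, bool memalloc)
{
	unsigned long floor = memalloc ? 0 : p->reserve;

	if (p->free <= floor)
		return false;
	p->free--;
	return true;
}
```

Processes #1 and #2 in the example above would both be refused at the threshold, while kswapd can still obtain the few pages it needs to write out dirty data and discard clean pages.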

--Stephen 



Re: refill_inactive()

2000-09-25 Thread Stephen C. Tweedie

Hi,

On Mon, Sep 25, 2000 at 09:17:54AM -0700, Linus Torvalds wrote:
> 
> On Mon, 25 Sep 2000, Rik van Riel wrote:
> > 
> > Hmmm, doesn't GFP_BUFFER simply imply that we cannot
> > allocate new buffer heads to do IO with??
> 
> No.
> 
> New buffer heads would be ok - recursion is fine in theory, as long as it
> is bounded, and we might bound it some other way (I don't think we
> _should_ do recursion here due to the stack limit, but at least it's not
> a fundamental problem).

Right, but we still need to be careful --- we _were_ getting stack
overflows occasionally before the GFP_BUFFER semantics were set up to
prevent that recursion.

--Stephen



Re: the new VMt

2000-09-25 Thread Oliver Xymoron

On Mon, 25 Sep 2000, Alan Cox wrote:

> > > > Stupidity has no limits...
> > > 
> > > Unfortunately its frequently wired into the hardware to save a few cents on
> > > scatter gather logic.
> > 
> > Since when hardware folks became exempt from the rule above? 128K is
> > almost tolerable, there were requests for 64 _mega_bytes...
> 
> Most cheap ass PCI hardware is built on the basis you can do linear 4Mb 
> allocations. There is a reason for this. You can do that 4Mb allocation on
> NT or Windows 9x

Sure about that? It's been a while, but I seem to recall NT enforcing a
scatter-gather framework on all drivers because it only gave them virtual
allocations. For the cheaper cards, the s-g was done by software issuing
single span requests to the card.

--
 "Love the dolphins," she advised him. "Write by W.A.S.T.E.." 




DPT SmartRAID V and Linux 2.4-

2000-09-25 Thread Nick Loman


So as far as I can tell, the i2o stack in Linux 2.4 doesn't support the
DPT SmartRAID V i2o controller.

Am I right in thinking then the only option is to combine DPT's drivers
into the kernel by hand? Is this feasible/easy to do, or better, has
someone already done it?

Thanks for your time,

Nick.





Re: the new VMt

2000-09-25 Thread Stephen C. Tweedie

Hi,

On Mon, Sep 25, 2000 at 06:05:00PM +0200, Andrea Arcangeli wrote:
> On Mon, Sep 25, 2000 at 04:42:49PM +0100, Stephen C. Tweedie wrote:
> > Progress is made, clean pages are discarded and dirty ones queued for
> 
> How can you make progress if there isn't swap available and all the
> freeable page/buffer cache has just been freed? The deadlock happens
> in the OOM condition (not when we can make progress).

Agreed --- this assumes that all pinned, nonswappable pages are
subject to resource limiting to prevent them from exhausting the whole
of memory.  For things like page tables, that means we need
beancounter in place for us to be 100% safe.  For the no-swap case,
that requires an OOM killer.

The problem of avoiding filling memory with pinned pages is orthogonal
to the problem of managing the unpinned memory.  Both are obviously
required for a stable system.

Cheers,
 Stephen



Re: [patch] vmfixes-2.4.0-test9-B2 - fixing deadlocks

2000-09-25 Thread Andrea Arcangeli

On Mon, Sep 25, 2000 at 07:21:48PM +0200, bert hubert wrote:
> Ok, sorry. Kernel development is proceeding at a furious pace and I sometimes
> lose track. 

No problem :).

> I seem to remember that people were impressed by classzone, but that the
> implementation was very non-trivial and hard to grok. One of the reasons

Yes. Classzone is certainly more complex.

> There is no such thing as 'under swap'. There are lots of load patterns that
> will generate different kinds of memory pressure. Just calling it 'under
> swap' gives entirely the wrong impression. 

Sorry for not being precise. I meant one of those load patterns.

> 'rivaling virtual memory' code. Energies spent on Rik's VM will yield far
> higher differential improvement. 

I've spent effort on classzone as well, and since I think it's a far superior
approach, I'll at least port it on top of 2.4.0-test9 as soon as time
permits, to generate some numbers.

Andrea



Re: [patch] vmfixes-2.4.0-test9-B2 - fixing deadlocks

2000-09-25 Thread bert hubert

> We're talking about shrink_[id]cache_memory change. That have _nothing_ to do
> with the VM changes that happened anywhere between test8 and test9-pre6.
> 
> You were talking about a different thing.

Ok, sorry. Kernel development is proceeding at a furious pace and I sometimes
lose track. 

> I consider the current approach the wrong way to go and for this reason I
> prefer to spend time porting/improving classzone.

I seem to remember that people were impressed by classzone, but that the
implementation was very non-trivial and hard to grok. One of the reasons
Rik's vm made it (so far) is that it is pretty straightforward, with all the
marks of the right amount of simplicity. 

> In the meantime if you want to go back to 2.4.0-test1-ac22-class++ to give
> it a try under swap to see the difference in the behaviour and compare
> (Mike said it's still an order of magnitude faster with his "make -j30
> bzImage" testcase and he's always very reliable in his reports).

There is no such thing as 'under swap'. There are lots of load patterns that
will generate different kinds of memory pressure. Just calling it 'under
swap' gives entirely the wrong impression. 

Although Mike's compile is a relevant benchmark, every VM has cases for
which it excels, and cases for which it sucks. This appears to be a general
property of VM design. 

Given knowledge of the algorithms used, you can always dream up a situation
where it will fail. It's a bit like writing the halting problem algorithm.
Same goes the other way around, every VM will have a 'shining benchmark' -
hence the invention of benchmarketing.

We used to have a bad virtual memory implementation that was sometimes well
tuned, so lots of ordinary cases showed acceptable performance. We now have
an elegant VM that works reasonably well, but needs more tweaking.

What is the point of all this ranting? Think twice before embarking on
'rivaling virtual memory' code. Energies spent on Rik's VM will yield far
higher differential improvement. 

Regards,


bert hubert

-- 
PowerDNS Versatile DNS Services  
Trilab   The Technology People   
'SYN! .. SYN|ACK! .. ACK!' - the mating call of the internet




Re: the new VMt

2000-09-25 Thread Jamie Lokier

[EMAIL PROTECTED] wrote:
>walk = out;
> while(nfds > 0) {
> poll_table *tmp = (poll_table *) __get_free_page(GFP_KERNEL);
> if (!tmp) {

Shouldn't this be GFP_USER?  (Which would also conveniently fix the
problem Victor's pointing out...)

-- Jamie



Re: the new VMt

2000-09-25 Thread Andrea Arcangeli

On Mon, Sep 25, 2000 at 02:10:07PM -0300, Rik van Riel wrote:
> Not really. We could fix this by making the page freeing
> functions smarter and only free the pages we need.

That's what I proposed in the first place, in fact.

To free a large chunk of memory you may have to throw away lots of cache. We're
not freeing contiguous cache as we do in 2.2.x.
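The alternative argued for here (free only the contiguous pages a high-order allocation needs, instead of shrinking the cache until a large enough free block happens to appear) can be sketched as a search for the cheapest naturally aligned block. The page-state map, `best_block()`, and the assumption that cache pages are clean and can be dropped without I/O are all illustrative.

```c
#include <assert.h>
#include <stddef.h>

enum pstate { P_FREE, P_CACHE, P_PINNED };  /* P_CACHE: clean, discardable */

/*
 * Find the naturally aligned block of 2^order pages that can be made
 * entirely free by discarding the fewest cache pages, skipping blocks
 * that contain pinned pages.  Returns the block's first page index, or
 * -1 if none exists; *evict receives the number of cache pages that
 * would have to be dropped.
 */
static long best_block(const enum pstate *map, size_t npages,
		       unsigned order, size_t *evict)
{
	size_t size = (size_t)1 << order;
	long best = -1;
	size_t best_cost = 0;

	for (size_t start = 0; start + size <= npages; start += size) {
		size_t cost = 0;
		int usable = 1;

		for (size_t i = start; i < start + size; i++) {
			if (map[i] == P_PINNED) {
				usable = 0;
				break;
			}
			if (map[i] == P_CACHE)
				cost++;
		}
		if (usable && (best < 0 || cost < best_cost)) {
			best = (long)start;
			best_cost = cost;
		}
	}
	*evict = best_cost;
	return best;
}
```

A pure LRU cache shrinker has no notion of which pages sit next to each other, so it may discard far more cache than the few pages a targeted search like this would evict.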

Andrea



kernel BUG at ll_rw_blk.c:711!

2000-09-25 Thread Alvaro Lopes

Just got this one:

Sep 25 18:02:01 thecrypt kernel: kernel BUG at ll_rw_blk.c:711!
Sep 25 18:02:01 thecrypt kernel: invalid operand: 
Sep 25 18:02:01 thecrypt kernel: CPU:0
Sep 25 18:02:01 thecrypt kernel: EIP:0010:[__make_request+161/1444]
Sep 25 18:02:01 thecrypt kernel: EFLAGS: 00210282
Sep 25 18:02:01 thecrypt kernel: eax: 001f   ebx: c0f1cb00   ecx: c39224a0   edx: 0006
Sep 25 18:02:01 thecrypt kernel: esi: c0f1cb00   edi: c02d0c20   ebp: 0001   esp: c4323ea8
Sep 25 18:02:01 thecrypt kernel: ds: 0018   es: 0018   ss: 0018
Sep 25 18:02:01 thecrypt kernel: Process communicator-sm (pid: 2815, stackpage=c4323000)
Sep 25 18:02:01 thecrypt kernel: Stack: c021f1a5 c021f442 02c7 c0f1cb00 0001 000c  001e8480 
Sep 25 18:02:01 thecrypt kernel:        c02d0c38 c02d0c30  0002   c01563c2 00fe 
Sep 25 18:02:01 thecrypt kernel:        c0156f81 c02d0c20 0001 c0f1cb00 c0f1cb00  0001 c4323f38 
Sep 25 18:02:01 thecrypt kernel: Call Trace: [tvecs+49309/72568] [tvecs+49978/72568] [blk_get_queue+50/64] [generic_make_request+257/272] [ll_rw_block+337/448] [writeout_one_page+57/80] [do_buffer_fdatasync+72/124] 
Sep 25 18:02:01 thecrypt kernel:        [generic_buffer_fdatasync+29/56] [writeout_one_page+0/80] [ext2_sync_file+47/164] [sys_write+139/160] [sys_fsync+73/104] [system_call+51/56] 
Sep 25 18:02:01 thecrypt kernel: Code: 0f 0b 83 c4 0c 0f b6 46 15 0f b7 4e 14 8b 14 85 a0 52 2c c0 

Kernel is 2.4.0-test8



Re: the new VMt

2000-09-25 Thread yodaiken

On Mon, Sep 25, 2000 at 04:42:49PM +0100, Stephen C. Tweedie wrote:
> Hi,
> 
> On Mon, Sep 25, 2000 at 04:16:56PM +0100, Alan Cox wrote:
> > 
> > Unless Im missing something here think about this case
> > 
> > 2 active processes, no swap
> > 
> > #1            #2
> > kmalloc 32K   kmalloc 16K
> > OK            OK
> > kmalloc 16K   kmalloc 32K
> > block         block
> > 
> 
> ... and we get two wakeup_kswapd()s.  kswapd has PF_MEMALLOC and so is
> able to eat memory which processes #1 and #2 are not allowed to touch.
> Progress is made, clean pages are discarded and dirty ones queued for
> write, memory becomes free again and the world is a better place.
> 
> Or so goes the theory, at least.

from fs/select.c

	walk = out;
	while(nfds > 0) {
		poll_table *tmp = (poll_table *) __get_free_page(GFP_KERNEL);
		if (!tmp) {
			while(out != NULL) {
				tmp = out->next;
				free_page((unsigned long)out);
				out = tmp;
			}
			return NULL;
		}
		tmp->nr = 0;
		tmp->entry = (struct poll_table_entry *)(tmp + 1);
		tmp->next = NULL;
		walk->next = tmp;
		walk = tmp;
		nfds -= __MAX_POLL_TABLE_ENTRIES;
	}


> 
> --Stephen

-- 
-
Victor Yodaiken 
Finite State Machine Labs: The RTLinux Company.
 www.fsmlabs.com  www.rtlinux.com




Re: [patch] vmfixes-2.4.0-test9-B2

2000-09-25 Thread Andrea Arcangeli

On Mon, Sep 25, 2000 at 07:05:02PM +0200, Ingo Molnar wrote:
> yep - and Jens i'm sorry about the outburst. Until a bug is found it's
> unrealistic to blame anything.

I think the only bug that may be to blame in the elevator is the EXCLUSIVE wakeup
thing (I've not benchmarked it alone to see if it makes any real-world
performance difference, but its behaviour certainly wasn't intentional). Everything
else related to the elevator internals should perform better than the old
elevator (aka the 2.2.15 one). The new elevator ordering algorithm gives me
much better numbers than the CSCAN one with tiobench. Also consider that the latency
control is currently completely disabled by default, so there are no barriers
unless you change that with elvtune.

Also, I'm using -r 250 and -w 500 and it doesn't really change anything in the
numbers compared to very large values (but it fixes the starvation problem).
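The barrier mentioned above can be illustrated with a simplified model of latency-bounded sorted insertion: each queued request may be overtaken only a limited number of times, after which newer requests must queue behind it. This is not the 2.4 elevator code; `struct req`, its `budget` field and `queue_insert()` are hypothetical, and `latency` simply plays the role of the elvtune -r/-w settings.

```c
#include <assert.h>
#include <stddef.h>

#define QMAX 64

struct req {
	unsigned long sector;
	int budget;     /* how many more times it may be overtaken */
};

struct queue {
	struct req r[QMAX];
	size_t n;
};

/*
 * Insert a request in ascending sector order, but never ahead of a
 * request whose budget is used up: such a request acts as a barrier.
 * Every request the newcomer overtakes loses one unit of budget, so
 * with a finite latency no request can be passed over forever.
 */
static void queue_insert(struct queue *q, unsigned long sector, int latency)
{
	size_t pos = q->n;

	assert(q->n < QMAX);
	/* walk back from the tail looking for the sorted position */
	while (pos > 0 && q->r[pos - 1].sector > sector &&
	       q->r[pos - 1].budget > 0)
		pos--;
	/* charge every request the newcomer overtakes */
	for (size_t i = pos; i < q->n; i++)
		q->r[i].budget--;
	/* make room and insert */
	for (size_t i = q->n; i > pos; i--)
		q->r[i] = q->r[i - 1];
	q->r[pos].sector = sector;
	q->r[pos].budget = latency;
	q->n++;
}
```

With a budget of 1, the request for sector 100 is overtaken once and then acts as a barrier, so the later request for sector 20 queues behind it; with a very large budget the queue stays purely sector-sorted, which is the "no barriers" behaviour.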

Andrea



Re: the new VMt

2000-09-25 Thread Alan Cox

> > > Stupidity has no limits...
> > 
> > Unfortunately its frequently wired into the hardware to save a few cents on
> > scatter gather logic.
> 
> Since when hardware folks became exempt from the rule above? 128K is
> almost tolerable, there were requests for 64 _mega_bytes...

Most cheap ass PCI hardware is built on the basis you can do linear 4Mb 
allocations. There is a reason for this. You can do that 4Mb allocation on
NT or Windows 9x





Re: the new VMt

2000-09-25 Thread Andrea Arcangeli

On Mon, Sep 25, 2000 at 07:03:46PM +0200, Ingo Molnar wrote:
> [..] __GFP_SOFT solves this all very nicely [..]

s/very nicely/throwing away lots of useful cache for no one good reason/

Andrea



Re: the new VMt

2000-09-25 Thread Andrea Arcangeli

On Mon, Sep 25, 2000 at 09:49:46AM -0700, Linus Torvalds wrote:
> [..] I
> don't think the balancing has to take the order of the allocation into
> account [..]

Why do you prefer to throw away most of the cache (potentially at fork time)
instead of freeing only the few contiguous bits that we need?

Andrea



Re: the new VMt

2000-09-25 Thread Alexander Viro



On Mon, 25 Sep 2000, Alan Cox wrote:

> > > yep, i agree. I'm not sure what the biggest allocation is, some drivers
> > > might use megabytes or contiguous RAM?
> > 
> > Stupidity has no limits...
> 
> Unfortunately its frequently wired into the hardware to save a few cents on
> scatter gather logic.

Since when hardware folks became exempt from the rule above? 128K is
almost tolerable, there were requests for 64 _mega_bytes...




Re: the new VMt

2000-09-25 Thread Alan Cox

> > yep, i agree. I'm not sure what the biggest allocation is, some drivers
> > might use megabytes or contiguous RAM?
> 
> Stupidity has no limits...

Unfortunately it's frequently wired into the hardware to save a few cents on
scatter gather logic.

We need 128K blocks for sound DMA buffers, and for most sound cards they need to
be linear (but not the newer ones, thankfully). Some video capture hardware
needs 4Mb, but that needs to use bootmem (in 2.2 they use bigmem hacks).
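These sizes translate directly into buddy-allocator orders. A small helper, assuming 4 KiB pages and a `MAX_ORDER` of 10 (orders 0 to 9, so at most 2 MiB contiguous) as on i386 kernels of this era; both constants are assumptions here, not taken from the message.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SHIFT 12           /* assumed: 4 KiB pages, as on i386 */
#define MAX_ORDER  10           /* assumed: orders 0..MAX_ORDER-1 exist */

/* Smallest order such that (page size << order) >= size. */
static unsigned order_for_size(size_t size)
{
	unsigned order = 0;
	size_t pages = (size + ((size_t)1 << PAGE_SHIFT) - 1) >> PAGE_SHIFT;

	while (((size_t)1 << order) < pages)
		order++;
	return order;
}

/* Could the buddy allocator hand out a contiguous block this big at all? */
static int buddy_can_satisfy(size_t size)
{
	return order_for_size(size) < MAX_ORDER;
}
```

Under those assumptions a 128K buffer is an order-5 request, which the page allocator can satisfy if fragmentation allows, while a 4Mb buffer (order 10) is beyond it, consistent with it having to come from bootmem.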




Re: [patch] vmfixes-2.4.0-test9-B2

2000-09-25 Thread Ingo Molnar


On Mon, 25 Sep 2000, Linus Torvalds wrote:

> Blaming the elevator is unfair and unrealistic. [...]

yep - and Jens, I'm sorry about the outburst. Until a bug is found it's
unrealistic to blame anything.

Ingo



