date:20050823

On Mon, Aug 22, 2005 at 06:20:56PM +0200, Adrian Bunk wrote:
 I didn't find any modular usage in the kernel.

And there shouldn't be one either.  This is really just for some syscalls,
everything else should use get_super based on a struct block_device. If
there's any caller using this wrongly in out of tree modules they can
be switched to bdget + get_super trivially (fixing their code would be
even better).

 Signed-off-by: Adrian Bunk [EMAIL PROTECTED]
 
 ---
 
 This patch was already sent on:
 - 30 May 2005
 - 13 May 2005
 - 1 May 2005
 - 23 Apr 2005
 
 --- linux-2.6.12-rc2-mm3-full/fs/super.c.old  2005-04-23 02:45:59.0 
 +0200
 +++ linux-2.6.12-rc2-mm3-full/fs/super.c  2005-04-23 02:46:07.0 
 +0200
 @@ -467,8 +467,6 @@
   return NULL;
  }
  
 -EXPORT_SYMBOL(user_get_super);
 -
  asmlinkage long sys_ustat(unsigned dev, struct ustat __user * ubuf)
  {
  struct super_block *s;
 
 -
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/
---end quoted text---

Re: [PATCH 0/2] external interrupts

On Mon, Aug 22, 2005 at 02:43:30PM -0700, Andrew Morton wrote:
  Laughter was not wholly unexpected, though I wasn't joking.  I'm trying
  to be realistic about the lifetime of any given hardware, and IOC4 is
  several years old at this point.  Couple that with a sincere desire to
  preserve application source compatibility when (not if) new hardware
  appears, and an abstraction layer seemed to be a logical choice.  I'm
  more than happy to discuss problems in the abstraction layer's interface
  and make appropriate changes -- I'm nothing if not obliging.
 
 Having an abstraction layer for a single client driver does seem a bit
 pointless.  It would become more pointful if other client drivers were to
 pop up.

The Octane port will hopefully soon support external interrupts on the
ioc3, so this does make sense.


Re: Linux AIO status todo

2005-08-23 Thread Jakub Jelinek

On Tue, Aug 23, 2005 at 01:14:38PM +0530, Suparna Bhattacharya wrote:

   2. No support for propagating IO completion events to user space
  threads using RT signals. User threads need to poll the completion
  queue using io_getevents. POSIX specifies that when an AIO
  request completes, a signal can be delivered to the application
  to indicate the completion of the IO.

POSIX AIO needs to handle SIGEV_NONE, SIGEV_SIGNAL and SIGEV_THREAD
notification.  Obviously kernel shouldn't create threads for SIGEV_THREAD
itself, as kernel shouldn't hardcode all the implementation details how a
thread can be created.  But it would be good if AIO signalling e.g. handled
both SIGEV_SIGNAL and SIGEV_SIGNAL | SIGEV_THREAD_ID, with the same usage as
e.g. timer_* syscalls.  If kernel makes sure SI_ASYNCIO si_code is set in
the notification signal siginfos, glibc could even use just one helper
thread for timer_*/[al]io_* and maybe in the future other SIGEV_THREAD 
notification.
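
As a rough userspace illustration of the single-helper-thread idea above: if AIO completion signals carried SI_ASYNCIO the way timer signals carry SI_TIMER, one handler could tell the two apart by si_code alone. The enum and both helper names below are hypothetical and are not part of glibc or the kernel.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <signal.h>
#include <string.h>

/* Hypothetical names: this enum and both helpers are illustrative only. */
enum notify_source { SRC_TIMER, SRC_AIO, SRC_OTHER };

/* Decide which subsystem queued a notification signal from its si_code. */
static enum notify_source classify_siginfo(const siginfo_t *si)
{
    switch (si->si_code) {
    case SI_TIMER:
        return SRC_TIMER;   /* POSIX timer expiry (timer_create) */
    case SI_ASYNCIO:
        return SRC_AIO;     /* asynchronous I/O completion */
    default:
        return SRC_OTHER;
    }
}

/* Fill a sigevent asking for plain SIGEV_SIGNAL delivery with a cookie. */
static void init_signal_sigevent(struct sigevent *sev, int signo, void *cookie)
{
    memset(sev, 0, sizeof(*sev));
    sev->sigev_notify = SIGEV_SIGNAL;
    sev->sigev_signo = signo;
    sev->sigev_value.sival_ptr = cookie;
}
```

A helper thread would wait for the chosen signal, call classify_siginfo() on the received siginfo_t, and then run the user's SIGEV_THREAD callback found through the cookie.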

Jakub

Re: IRQ problem with PCMCIA

2005-08-23 Thread Alan Cox

On Tue, 2005-08-23 at 09:49 +0200, Erik Mouw wrote:
 Is there any place where we can get your current patches?

Which ones - the PATA IDE ones are in 2.6.11-ac, a subset in Fedora
(other changes in the core IDE code make forward porting stuff for
hotplug really tricky past 2.6.11).

The SATA ones I can certainly put up if there is interest. I don't want
to put them somewhere too available yet because this right now is stuff
you only want to use under controlled circumstances for development
until both they and the core SATA layer have some improvements.

Alan


Re: mass tulip_stop_rxtx() failed, network stops

2005-08-23 Thread Tomasz Chmielewski


jerome lacoste wrote:

On 8/23/05, Tomasz Chmielewski [EMAIL PROTECTED] wrote:


(...)


We are running four more machines like that, the only difference is the
kernel they are running (2.6.11.4).

On some of them, there are serious problems with a network, and they
usually happen when the traffic is bigger than usual (i.e., some big
software deployment to several workstations, remote backup, etc.).

The syslog is then full of entries like that:

Aug 21 04:04:30 SERVER-B-HS kernel: NETDEV WATCHDOG: eth0: transmit
timed out
Aug 21 04:04:30 SERVER-B-HS kernel: :00:06.0: tulip_stop_rxtx() failed



I am seeing thousands of tulip_stop_rxtx() failed messages as well
with 2.6.11. No regular network failure though.

See http://kerneltrap.org/mailarchive/1/message/110291/flat


This may have something to do with this patch, introduced with 2.6.10 
(see the ChangeLog-2.6.10).
It would explain why I had no problems on ~20 machines with 2.6.8.1 
kernel, and I have this issue on the machines with 2.6.11.5 kernel.




[PATCH] tulip: make tulip_stop_rxtx() wait for DMA to fully stop

From: John W. Linville [EMAIL PROTECTED]

tulip_stop_rxtx() doesn't wait for DMA to fully stop like the function
call name implies.

This was submitted through my employer -- I am not the original author
of this patch.  However, I passed it by Jeff Garzik and he expressed
interest in having it upstream.



--
Tomek
http://wpkg.org

Re: [Alsa-devel] [2.6 patch] sound/core/memalloc.c: fix PROC_FS=n compilation

2005-08-23 Thread Takashi Iwai

At Tue, 23 Aug 2005 03:24:25 +0200,
Adrian Bunk wrote:
 
 On Mon, Aug 22, 2005 at 02:41:07PM +0200, Takashi Iwai wrote:
 ...
  I think the below is simpler.
 
 Looks good.

OK, it's now on ALSA tree.

Thanks.


Takashi

Re: [PATCH] Add MCE resume under ia32

Hi!

 It's widely seen a MCE non-fatal error reported after resume. It seems
 MCE resume is lacked under ia32. This patch tries to fix the gap.

Well, your patch seems like a missing piece of the puzzle, but:

a) we probably want to do it for x86-64, too, and 

b)

 diff -puN arch/i386/power/cpu.c~mcheck_resume arch/i386/power/cpu.c
 --- linux-2.6.13-rc6/arch/i386/power/cpu.c~mcheck_resume  2005-08-23 
 09:32:13.054008584 +0800
 +++ linux-2.6.13-rc6-root/arch/i386/power/cpu.c   2005-08-23 
 09:41:54.992540480 +0800
 @@ -104,6 +104,8 @@ static void fix_processor_context(void)
  
  }
  
 +extern void mcheck_init(struct cpuinfo_x86 *c);
 +
  void __restore_processor_state(struct saved_context *ctxt)
  {
   /*


this should go to some header file and most importantly

 @@ -138,6 +140,9 @@ void __restore_processor_state(struct sa
   fix_processor_context();
   do_fpu_end();
   mtrr_ap_init();
 +#ifdef CONFIG_X86_MCE
 + mcheck_init(&boot_cpu_data);
 +#endif
  }

c) can't we register MCEs as some kind of system device so that these
kinds of hooks are not necessary?
Pavel
-- 
if you have sharp zaurus hardware you don't need... you know my address

Re: APIC version and 8-bit APIC IDs

2005-08-23 Thread Maciej W. Rozycki

On Mon, 22 Aug 2005, Martin Wilck wrote:

 It's a scalable system where multiple boards may be combined. Anyway, I see
 nothing in the specs that says you must start counting CPUs from zero.

 Well, Intel's Multiprocessor Specification mandates that (see section 
3.6.1 and also the compliance list in Appendix C).  It does not mandate 
local APIC IDs to be consecutive though.

  Maciej

Re: kernel module seg fault

2005-08-23 Thread linux-os \(Dick Johnson\)


On Tue, 23 Aug 2005, manomugdha biswas wrote:

 Hi,
 I have written a kernel module and I can load (insmod)
 it without any error. But when i run my module it gets
 seg fault at interruptible_sleep_on_timeout();

 I have used this function in the following way:

 DECLARE_WAIT_QUEUE_HEAD(wq);
 init_waitqueue_head(wq);
 interruptible_sleep_on_timeout(wq, 2);

 I am using redhat version 9.0 and kernel version
 2.4.20-8.
 Could you please give some light on this issue?

 Manomugdha Biswas

seg fault??  You mean you get a kernel panic? Please
show us what it says. Note that you can't sleep with a spin-lock
held.


Cheers,
Dick Johnson
Penguin : Linux version 2.6.12.5 on an i686 machine (5537.79 BogoMips).
Warning : 98.36% of all statistics are fiction.
.
I apologize for the following. I tried to kill it with the above dot :


The information transmitted in this message is confidential and may be 
privileged.  Any review, retransmission, dissemination, or other use of this 
information by persons or entities other than the intended recipient is 
prohibited.  If you are not the intended recipient, please notify Analogic 
Corporation immediately - by replying to this message or by sending an email to 
[EMAIL PROTECTED] - and destroy all copies of this information, including any 
attachments, without reading or disclosing them.

Thank you.

patch for compiling ppc without pmu

2005-08-23 Thread Johannes Berg

Hi,

This patch seems to be required to compile 2.6.13-rc6 for ppc configured
without PMU.

Apologies if it is already known, I haven't found anything like this
quickly.

Signed-Off-By: Johannes Berg [EMAIL PROTECTED]

--- linux-2.6.13-rc6.orig/arch/ppc/platforms/pmac_time.c	2005-08-23 12:14:37.689485664 +0200
+++ linux-2.6.13-rc6/arch/ppc/platforms/pmac_time.c	2005-08-23 12:14:37.689485664 +0200
@@ -251,7 +251,7 @@
 	struct device_node *cpu;
 	unsigned int freq, *fp;
 
-#ifdef CONFIG_PM && CONFIG_ADB_PMU
+#if defined(CONFIG_PM) && defined(CONFIG_ADB_PMU)
 	pmu_register_sleep_notifier(&time_sleep_notifier);
 #endif /* CONFIG_PM */
 



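
A small standalone sketch of why the guard in the patch above matters: "#ifdef A && B" is not a conjunction. Compilers such as GCC typically warn about the extra tokens and test only the first identifier. The DEMO_* macros and both function names are hypothetical stand-ins, not kernel symbols.

```c
/* Hypothetical stand-ins for the kernel config symbols. */
#define DEMO_CONFIG_PM 1
/* DEMO_CONFIG_ADB_PMU is deliberately left undefined. */

/* What "#ifdef A && B" effectively tests: only the first symbol. */
static int guard_first_symbol_only(void)
{
#ifdef DEMO_CONFIG_PM
    return 1;
#else
    return 0;
#endif
}

/* What the patch switches to: both symbols must be defined. */
static int guard_both_symbols(void)
{
#if defined(DEMO_CONFIG_PM) && defined(DEMO_CONFIG_ADB_PMU)
    return 1;
#else
    return 0;
#endif
}
```

With only the first symbol defined, the first guard still compiles its body in, which matches the build failure when PMU support is configured out.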

Re: skge missing ifdefs.

2005-08-23 Thread Roman Zippel

Hi,

On Tue, 23 Aug 2005, Al Viro wrote:

 As for your s/thread_info/stack/ - I don't believe it's doable in mainline
 right now.  It's definitely separate from m68k merge and should not be
 mixed into it.  Moreover, mandatory changes to every platform arch-specific
 code over basically cosmetic issue (renaming a field of task_struct) at
 this point are going to be gratuitous PITA for every architecture with
 out-of-tree development.  And m68k folks, of all people, should know what
 fun it is.

No, I don't know it. Sometimes merging can be tricky, but then I check the 
original diff and apply it manually. What I'm planning involves no logical 
changes, so it would be an absolute no-brainer to merge. It's the logical 
changes, which may even compile normally, that can be a real PITA.

 When folks start using task_thread_info() in arch/* (i.e. by 2.6.1[45]) the
 size of that delta will go down big way and it will be less painful.  Until
 then...  Not a good idea.

I already did the complete conversion (and I did it forward and backward 
to be sure the result is the same), so I don't see a problem with merging it 
in 2.6.13. The final removal of the thread_info field can happen in 2.6.14 
and any missed changes in external trees are trivially fixable.

bye, Roman

Re: sched_yield() makes OpenLDAP slow

2005-08-23 Thread linux-os \(Dick Johnson\)


On Mon, 22 Aug 2005, Robert Hancock wrote:

 linux-os (Dick Johnson) wrote:
 I reported that sched_yield() wasn't working (at least as expected)
 back in March of 2004.

  for(;;)
  sched_yield();

 ... takes 100% CPU time as reported by `top`. It should take
 practically 0. Somebody said that this was because `top` was
 broken, others said that it was because I didn't know how to
 code. Nevertheless, the problem was not fixed, even after
 scheduler changes were made for the current version.

 This is what I would expect if run on an otherwise idle machine.
 sched_yield just puts you at the back of the line for runnable
 processes, it doesn't magically cause you to go to sleep somehow.


When a kernel build is occurring??? Plus `top` itself? It had damn
well better sleep while giving up the CPU. If it doesn't, it's broken.
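
For reference, a minimal userspace sketch of the two behaviours being argued about, assuming a POSIX system. The helper names are hypothetical. sched_yield() leaves the caller runnable, whereas nanosleep() actually blocks it.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <sched.h>
#include <time.h>

/* Yield the CPU n times; returns how many calls succeeded.
 * The task stays runnable throughout, so on an otherwise idle
 * machine it is simply scheduled again and accounts as busy. */
static long yield_n_times(long n)
{
    long ok = 0;

    for (long i = 0; i < n; i++)
        if (sched_yield() == 0)
            ok++;
    return ok;
}

/* Actually block for ms milliseconds; returns 0 on success. */
static int sleep_ms(long ms)
{
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };

    return nanosleep(&ts, NULL);
}
```

A loop built on yield_n_times() gives other runnable tasks a turn but never leaves the run queue, so a program that wants to stop consuming CPU time would use something like sleep_ms() instead.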

 --

 Robert Hancock  Saskatoon, SK, Canada
 To email, remove nospam from [EMAIL PROTECTED]
 Home Page: http://www.roberthancock.com/



Cheers,
Dick Johnson
Penguin : Linux version 2.6.12.5 on an i686 machine (5537.79 BogoMips).
Warning : 98.36% of all statistics are fiction.

Re: [PATCH 1/2] New system call, unshare

2005-08-23 Thread Al Viro

On Mon, Aug 08, 2005 at 03:46:06PM +0100, Alan Cox wrote:
 On Mon, 2005-08-08 at 09:33 -0400, Janak Desai wrote:
  
  [PATCH 1/2] unshare system call: System Call handler function sys_unshare
 
 
 Given the complexity of the kernel code involved and the obscurity of
 the functionality why not just do another clone() in userspace to
 unshare the things you want to unshare and then _exit the parent ?

Because you want to keep children?  Because you don't want to deal with
the implications for sessions/groups/etc.?

FWIW, syscall makes sense.  It is a valid primitive and the only reason
to keep it out of clone() (i.e. not making it just another flag to clone())
is that clone() is already cluttered _and_ uses bad calling conventions
for that stuff (an "I want to retain" list rather than an "I want private" list).

Re: [PATCH 0/3] New system call, unshare

2005-08-23 Thread Al Viro

On Wed, Aug 10, 2005 at 04:08:31PM +0200, Florian Weimer wrote:
 * Janak Desai:
 
  With unshare, namespace setup can be done using PAM session
  management functions without patching individual commands.
 
 I don't think it's a good idea to use security-critical code well
 without its original specification.  Clearly the current situation
 sucks, but this is mainly a lack of PAM functionality, IMHO.

Eh?  We are talking about a primitive that has far more uses than
PAM.  This is a missing piece of the stuff done by clone() and fork():
each task is a virtual machine with sharable components.  We can
get a copy of the machine with an arbitrary set of components replaced
with private copies.  That's what clone() and fork() do.  The thing missing
from that set is taking a component (VM, descriptors, etc.) of process
itself and making it private.  The same thing we do on fork(), but
without creating a new process.

FWIW, I'm OK with that.  IIRC, Linus ACKed the concept some time ago.
PAM is one obvious use, but there are other situations where the lack
of that primitive is inconvenient...
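
A minimal sketch of what the primitive looks like from userspace, assuming a Linux system where the call is available as unshare(2) (it was merged after this thread). The helper name is hypothetical, and the raw syscall is used only so the sketch does not depend on a particular libc wrapper.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <linux/sched.h>   /* CLONE_FS */
#include <sys/syscall.h>   /* SYS_unshare */
#include <unistd.h>        /* syscall() */

/* Hypothetical helper: give the calling task a private copy of its
 * filesystem context (cwd, root, umask) without creating a new task.
 * Returns 0 on success, or -1 with errno set. */
static int make_fs_context_private(void)
{
    return (int)syscall(SYS_unshare, CLONE_FS);
}
```

This is the same copy that fork() makes of the filesystem context, but applied to the current task, so children and session or group membership are left alone.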

Re: [PATCH] fix send_sigqueue() vs thread exit race

2005-08-23 Thread Thomas Gleixner

On Mon, 2005-08-22 at 20:45 +0400, Oleg Nesterov wrote:
 Thomas Gleixner wrote:
 Ok, exit_itimers()->itimer_delete() is called when the last thread exits
 or does exec.
 
 kernel/posix-timers.c:common_timer_del() calls del_timer_sync(), after
 that nobody can access this timer, so we don't need to lock timer->it_lock
 at all in this case. No lock - no deadlock.

It still deadlocks:

CPU 0                                   CPU 1
write_lock(&tasklist_lock);
__exit_signal()
                                        timer expires
                                        base->running_timer = timer
                                          send_group_sigqueue()
                                            read_lock(&tasklist_lock);
exit_itimers()
  del_timer_sync(timer)
    waits for ever because              waits for ever on tasklist_lock
    base->running_timer == timer


I still think the last patch I sent is necessary.

 But I know nothing about kernel/posix-cpu-timers.c, I doubt it will work
 for posix_cpu_timer_del(). I don't have time to study posix-cpu-timers now.
 However, I see that __exit_signal() calls posix_cpu_timers_exit_xxx(), so
 may be it can work?
 
int posix_cpu_timer_del(struct k_itimer *timer)
{
        struct task_struct *p = timer->it.cpu.task;

        if (timer->it.cpu.firing)
                return TIMER_RETRY;

        if (unlikely(p == NULL))
                return 0;

        if (!list_empty(&timer->it.cpu.entry)) {
                read_lock(&tasklist_lock);
 
 Surely this cannot happen when the process exits; otherwise it would
 deadlock immediately, since we did write_lock(tasklist).
 
 Thomas, do you know something about posix-cpu-timers.c?

Not much. I'll look into this.

tglx



Re: [PATCH 2.6.13-rc6] cpu_exclusive sched domains on partial nodes temp fix


* Paul Jackson [EMAIL PROTECTED] wrote:

   /*
 +  * Hack to avoid 2.6.13 partial node dynamic sched domain bug.
 +  * Require the 'cpu_exclusive' cpuset to include all (or none)
 +  * of the CPUs on each node, or return w/o changing sched domains.
 +  * Remove this hack when dynamic sched domains fixed.
 +  */
 + {
 + int i, j;
 +
 + for_each_cpu_mask(i, cur->cpus_allowed) {
 + for_each_cpu_mask(j, node_to_cpumask(cpu_to_node(i))) {
 + if (!cpu_isset(j, cur->cpus_allowed))
 + return;
 + }
 + }
 + }
 +

certainly looks acceptable from a scheduler POV.

Acked-by: Ingo Molnar [EMAIL PROTECTED]

Ingo

[PATCH 2.6.13-rc6] cpu_exclusive sched domains on partial nodes temp fix

2005-08-23 Thread Paul Jackson

If Dinakar, Hawkes and Nick concur (and no one else complains too
loud) then the following should go into 2.6.13, to avoid the potential
kernel oops that Hawkes reported in Dinakar's feature to allow user
control of dynamic sched domain placement using cpu_exclusive cpusets.

This patch keeps the kernel/cpuset.c routine update_cpu_domains()
from invoking the sched.c routine partition_sched_domains() if the
cpuset in question doesn't fall on node boundaries.
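
The all-or-none rule can be modelled in userspace as a sketch, under the assumption that CPU sets fit in a 64-bit mask and that node_masks[n] lists the CPUs of node n. The function name and this representation are hypothetical; the kernel patch itself works on cpumask_t with for_each_cpu_mask().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model: CPU sets are 64-bit masks and
 * node_masks[n] lists the CPUs that belong to node n. */
static bool cpus_node_aligned(uint64_t cpus, const uint64_t *node_masks,
                              int nr_nodes)
{
    for (int n = 0; n < nr_nodes; n++) {
        uint64_t overlap = cpus & node_masks[n];

        /* A node must be either untouched or fully covered. */
        if (overlap != 0 && overlap != node_masks[n])
            return false;
    }
    return true;
}
```

With two four-CPU nodes (masks 0x0F and 0xF0), the sets 0x0F and 0xFF pass, while 0x03 is rejected because it covers only part of the first node.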

I have boot tested this on an SN2, and with the help of a couple of
ad hoc printk's, determined that it does indeed avoid calling the
partition_sched_domains() routine on partial nodes.

I did not directly verify that this avoids setting up bogus sched
domains or avoids the oops that Hawkes saw.

Obviously, if the above named parties decide to take some other path,
then this patch should be discarded.  I submit this patch under the
expectation that Hawkes' and others' fixes to support sched domains not
on node boundaries will go into *-mm and 2.6.14.  Do not include the
following patch in *-mm or 2.6.14 versions which have the real sched
domain fixes.

This patch imposes a silent artificial constraint on which cpusets
can be used to define dynamic sched domains.

This patch should allow proceeding with this new feature in 2.6.13 for
the configurations in which it is useful (node-aligned sched domains)
while avoiding trying to setup sched domains in the less useful cases
that can cause the kernel corruption and oops.

Signed-off-by: Paul Jackson [EMAIL PROTECTED]

Index: linux-2.6.13-cpuset-mempolicy-migrate/kernel/cpuset.c
===
--- linux-2.6.13-cpuset-mempolicy-migrate.orig/kernel/cpuset.c
+++ linux-2.6.13-cpuset-mempolicy-migrate/kernel/cpuset.c
@@ -636,6 +636,23 @@ static void update_cpu_domains(struct cp
return;
 
/*
+* Hack to avoid 2.6.13 partial node dynamic sched domain bug.
+* Require the 'cpu_exclusive' cpuset to include all (or none)
+* of the CPUs on each node, or return w/o changing sched domains.
+* Remove this hack when dynamic sched domains fixed.
+*/
+	{
+		int i, j;
+
+		for_each_cpu_mask(i, cur->cpus_allowed) {
+			for_each_cpu_mask(j, node_to_cpumask(cpu_to_node(i))) {
+				if (!cpu_isset(j, cur->cpus_allowed))
+					return;
+			}
+		}
+	}
+
+   /*
 * Get all cpus from parent's cpus_allowed not part of exclusive
 * children
 */

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson [EMAIL PROTECTED] 1.650.933.1373

Re: [PATCH] fix send_sigqueue() vs thread exit race

2005-08-23 Thread Thomas Gleixner

On Mon, 2005-08-22 at 20:45 +0400, Oleg Nesterov wrote:
 But I know nothing about kernel/posix-cpu-timers.c, I doubt it will work
 for posix_cpu_timer_del(). I don't have time to study posix-cpu-timers now.
 However, I see that __exit_signal() calls posix_cpu_timers_exit_xxx(), so
 may be it can work?

timer->it.cpu.task is set to NULL by posix_cpu_timers_exit(), so the
code in posix_cpu_timer_del() returns before accessing tasklist_lock.


The exit functions do not take any locks, but it is not necessary
there. 

posix_run_cpu_timers(p) is called with p=current() and we have
interrupts disabled, so the timer interrupt can not run on this CPU. The
current exiting process can not run at the same time on a different CPU,
so no race and lockup possible here.

tglx




usb oops in 2.6.13-rc6-mm2

2005-08-23 Thread Jens Axboe

Hi,

usbcore: deregistering driver usb-storage
usb 1-1: USB disconnect, address 3
Unable to handle kernel NULL pointer dereference at 
RIP: 
803cf140{_spin_lock+0}
PGD 1c303067 PUD 1c304067 PMD 0 
Oops: 0002 [1] SMP 
CPU 0 
Modules linked in: nls_iso8859_1 nls_cp437 vfat fat nls_base ide_cd
cdrom
Pid: 80, comm: khubd Not tainted 2.6.13-rc6-mm2
RIP: 0010:[803cf140] 803cf140{_spin_lock+0}
RSP: 0018:81001fc75d80  EFLAGS: 00010296
RAX: 81001c08cdb0 RBX: 810019f5f8f8 RCX: 81001c4b14e8
RDX: 0070 RSI: 8040cfcc RDI: 
RBP: 810019f5f8a0 R08:  R09: 
R10: 0001 R11: 8018ad27 R12: 
R13: 810001a23c20 R14: 810001a23c00 R15: 0100
FS:  2ade8b00() GS:80612880()
knlGS:61ad4bb0
CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
CR2:  CR3: 1c302000 CR4: 06e0
Process khubd (pid: 80, threadinfo 81001fc74000, task
8100019f4e80)
Stack: 803cd130 80500aa0 810019f5f980
80500aa0 
   802a3263 810019f5f9e8 810019f5f8a0
810019f5f8a0 
   802a34f2 80500880 
Call Trace:803cd130{klist_remove+21}
802a3263{__device_release_driver+75}
   802a34f2{device_release_driver+39}
802a2db7{bus_remove_device+146}
   802a1f75{device_del+55}
802a1fbc{device_unregister+9}
   802ff51c{hub_thread+900}
80145e70{autoremove_wake_function+0}
   802ff198{hub_thread+0}
80145a70{keventd_create_kthread+0}
   80145c9e{kthread+203}
8012e3ae{schedule_tail+57}
   8010e6ce{child_rip+8}
80145a70{keventd_create_kthread+0}
   80145bd3{kthread+0} 8010e6c6{child_rip+0}
   

Code: f0 fe 0f 79 09 f3 90 80 3f 00 7e f9 eb f2 c3 f0 ff 0f 8b 07 
RIP 803cf140{_spin_lock+0} RSP 81001fc75d80
CR2: 

Just got this oops removing a usb-storage managed usb device.
usb-storage had been manually removed (as you can see from the kernel
message), a few seconds later I removed power from the device and the
oops happened right then.

-- 
Jens Axboe


new qla2xxx driver breaks SAN setup with 2 controllers

2005-08-23 Thread Frederik Schueler

hello,

we are experiencing problems with the new qlogic driver in 2.6.12 on
a set of servers with qla2310 HBAs.

The problem is as follows:

The Infotrend storage array we are using has two controllers, each
of them has two virtual discs with a couple of partitions exported
as shared storage.

The controllers are linked inside of the storage box, each controller
has one qlogic fabric switch attached, and half of the servers are
connected to the lefthand switch, the other half is connected to the
righthand switch.

Now, with the qlogic driver in 2.6.11.12, we can access all shares
on both controllers from every server, while the new driver only allows
access to the controller whose switch the server is directly attached
to, thus depriving the servers of half of their shared
storage devices.

Example: on server s05, we have a boot device (lun 3 on primary
controller), and 2 shared storages (lun 9 on primary, lun 10 on
secondary controller).

With 2.6.11.12, this looks as follows:

s05:~# cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 03
  Vendor: IFT  Model: A16F-R1211   Rev: 334B
  Type:   Direct-AccessANSI SCSI revision: 03
Host: scsi0 Channel: 00 Id: 00 Lun: 09
  Vendor: IFT  Model: A16F-R1211   Rev: 334B
  Type:   Direct-AccessANSI SCSI revision: 03
Host: scsi0 Channel: 00 Id: 01 Lun: 10
  Vendor: IFT  Model: A16F-R1211   Rev: 334B
  Type:   Direct-AccessANSI SCSI revision: 03


and the driver sees everything:

s05:~# cat /proc/scsi/qla2xxx/0
QLogic PCI to Fibre Channel Host Adapter for QLA2310:
Firmware version 3.03.08 IPX, Driver version 8.00.02b4-k
ISP: ISP2300, Serial# R74545
Request Queue = 0xcf94, Response Queue = 0xcf98
Request Queue count = 2048, Response Queue count = 512
Total number of active commands = 0
Total number of interrupts = 1117762
Device queue depth = 0x20
Number of free request entries = 964
Number of mailbox timeouts = 0
Number of ISP aborts = 0
Number of loop resyncs = 0
Number of retries for empty slots = 0
Number of reqs in pending_q= 0, retry_q= 0, done_q= 0, scsi_retry_q= 0
Host adapter:loop state = READY, flags = 0x1a03
Dpc flags = 0x0
MBX flags = 0x0
Link down Timeout = 030
Port down retry = 030
Login retry count = 030
Commands retried with dropped frame(s) = 0
Product ID = 4953 5020 2020 0001


SCSI Device Information:
scsi-qla0-adapter-node=20e08b1bd113;
scsi-qla0-adapter-port=21e08b1bd113;
scsi-qla0-target-0=21d02382;
scsi-qla0-target-1=21d02362;

SCSI LUN Information:
(Id:Lun)  * - indicates lun is not registered with the OS.
( 0: 0): Total reqs 2, Pending reqs 0, flags 0x0*, 0:0:81 00
( 0: 3): Total reqs 470693, Pending reqs 0, flags 0x0, 0:0:81 00
( 0: 9): Total reqs 227717, Pending reqs 0, flags 0x0, 0:0:81 00
( 0:11): Total reqs 0, Pending reqs 0, flags 0x0*, 0:0:81 00
( 0:13): Total reqs 0, Pending reqs 0, flags 0x0*, 0:0:81 00
( 1: 0): Total reqs 2, Pending reqs 0, flags 0x0*, 0:0:82 00
( 1:10): Total reqs 12, Pending reqs 0, flags 0x0, 0:0:82 00
( 1:12): Total reqs 0, Pending reqs 0, flags 0x0*, 0:0:82 00
( 1:14): Total reqs 0, Pending reqs 0, flags 0x0*, 0:0:82 00


while on 2.6.12.5 and 2.6.13-rc6 it looks like this:

sm05:~# scsiadd -a 0 0 0 9
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 03
  Vendor: IFT  Model: A16F-R1211   Rev: 334B
  Type:   Direct-AccessANSI SCSI revision: 03
Host: scsi0 Channel: 00 Id: 00 Lun: 09
  Vendor: IFT  Model: A16F-R1211   Rev: 334B
  Type:   Direct-AccessANSI SCSI revision: 03


sm05:~# scsiadd -a 0 0 1 10
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 03
  Vendor: IFT  Model: A16F-R1211   Rev: 334B
  Type:   Direct-AccessANSI SCSI revision: 03
Host: scsi0 Channel: 00 Id: 00 Lun: 09
  Vendor: IFT  Model: A16F-R1211   Rev: 334B
  Type:   Direct-AccessANSI SCSI revision: 03


unfortunately, the proc interface was removed:

s05:/sys/devices/pci:00/:00:02.0/:01:00.0/:02:02.0/host0#
find .
.
./rport-0:0-1
./rport-0:0-1/power
./rport-0:0-1/power/state
./rport-0:0-0
./rport-0:0-0/target0:0:0
./rport-0:0-0/target0:0:0/0:0:0:9
./rport-0:0-0/target0:0:0/0:0:0:9/ioerr_cnt
./rport-0:0-0/target0:0:0/0:0:0:9/iodone_cnt
./rport-0:0-0/target0:0:0/0:0:0:9/iorequest_cnt
./rport-0:0-0/target0:0:0/0:0:0:9/iocounterbits
./rport-0:0-0/target0:0:0/0:0:0:9/timeout
./rport-0:0-0/target0:0:0/0:0:0:9/state
./rport-0:0-0/target0:0:0/0:0:0:9/delete
./rport-0:0-0/target0:0:0/0:0:0:9/rescan
./rport-0:0-0/target0:0:0/0:0:0:9/rev
./rport-0:0-0/target0:0:0/0:0:0:9/model
./rport-0:0-0/target0:0:0/0:0:0:9/vendor
./rport-0:0-0/target0:0:0/0:0:0:9/scsi_level
./rport-0:0-0/target0:0:0/0:0:0:9/type
./rport-0:0-0/target0:0:0/0:0:0:9/queue_type
./rport-0:0-0/target0:0:0/0:0:0:9/queue_depth
./rport-0:0-0/target0:0:0/0:0:0:9/device_blocked

Re: 2.6.13-rc6-mm2

2005-08-23 Thread Reuben Farrelly


Hi,

On 23/08/2005 4:30 p.m., Andrew Morton wrote:

ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.13-rc6/2.6.13-rc6-mm2/

- Various updates.  Nothing terribly noteworthy.


Yup, seems to be generally good...

Noticed this in the log earlier tonight:

Aug 23 19:44:51 tornado kernel: hub 5-0:1.0: port 1 disabled by hub (EMI?), 
re-enabling...

Aug 23 19:44:51 tornado kernel: usb 5-1: USB disconnect, address 2
Aug 23 19:44:51 tornado kernel: drivers/usb/class/usblp.c: usblp0: removed
Aug 23 19:44:51 tornado kernel: Unable to handle kernel NULL pointer 
dereference at virtual address 0004

Aug 23 19:44:51 tornado kernel:  printing eip:
Aug 23 19:44:51 tornado kernel: c01ccef2
Aug 23 19:44:51 tornado kernel: *pde = 
Aug 23 19:44:51 tornado kernel: Oops:  [#1]
Aug 23 19:44:51 tornado kernel: SMP
Aug 23 19:44:51 tornado kernel: last sysfs file: /devices/pci:00/:00:1f.3/i2c-0/name
Aug 23 19:44:51 tornado kernel: Modules linked in: nfsd exportfs lockd eeprom sunrpc ipv6 iptable_filter binfmt_misc reiser4 zlib_deflate zlib_inflate dm_mod video thermal processor fan button ac tpm_nsc i2c_i801 sky2 e100 sr_mod
Aug 23 19:44:51 tornado kernel: CPU:1
Aug 23 19:44:51 tornado kernel: EIP:0060:[c01ccef2]Not tainted VLI
Aug 23 19:44:51 tornado kernel: EFLAGS: 00010286   (2.6.13-rc6-mm2)
Aug 23 19:44:51 tornado kernel: EIP is at _raw_spin_lock+0x7/0x73
Aug 23 19:44:51 tornado kernel: eax:    ebx:    ecx: c1a60658   edx: c1a63e24
Aug 23 19:44:51 tornado kernel: esi:    edi: c0382400   ebp: f7c55e98   esp: f7c55e90
Aug 23 19:44:51 tornado kernel: ds: 007b   es: 007b   ss: 0068
Aug 23 19:44:51 tornado kernel: Process khubd (pid: 109, threadinfo=f7c54000 task=c192b030)
Aug 23 19:44:51 tornado kernel: Stack: f7c58a8c  f7c55ea0 c0312219 f7c55eb0 c030feb7 f7c58ae8 f7c58a48
Aug 23 19:44:51 tornado kernel:f7c55ec4 c0217e73 f7c58a48 f7d134ec 0040 f7c55ed0 c0217ec0 f7c58a48
Aug 23 19:44:51 tornado kernel:f7c55edc c0217814 f7c58a48 f7c55eec c0216ad2 f7c58a48 f7c58a14 f7c55ef8

Aug 23 19:44:51 tornado kernel: Call Trace:
Aug 23 19:44:51 tornado kernel:  [c01039c3] show_stack+0x94/0xca
Aug 23 19:44:51 tornado kernel:  [c0103b6c] show_registers+0x15a/0x1ea
Aug 23 19:44:51 tornado kernel:  [c0103d8a] die+0x108/0x183
Aug 23 19:44:51 tornado kernel:  [c031295a] do_page_fault+0x1ea/0x63d
Aug 23 19:44:51 tornado kernel:  [c0103693] error_code+0x4f/0x54
Aug 23 19:44:51 tornado kernel:  [c0312219] _spin_lock+0x8/0xa
Aug 23 19:44:51 tornado kernel:  [c030feb7] klist_remove+0x10/0x2c
Aug 23 19:44:51 tornado kernel:  [c0217e73] __device_release_driver+0x41/0x65
Aug 23 19:44:51 tornado kernel:  [c0217ec0] device_release_driver+0x29/0x39
Aug 23 19:44:51 tornado kernel:  [c0217814] bus_remove_device+0x52/0x60
Aug 23 19:44:51 tornado kernel:  [c0216ad2] device_del+0x2e/0x5d
Aug 23 19:44:51 tornado kernel:  [c0216b0c] device_unregister+0xb/0x15
Aug 23 19:44:51 tornado kernel:  [c0275d67] usb_disconnect+0x115/0x15c
Aug 23 19:44:51 tornado kernel:  [c0276b85] hub_port_connect_change+0x54/0x399
Aug 23 19:44:51 tornado kernel:  [c027713e] hub_events+0x274/0x3b2
Aug 23 19:44:51 tornado kernel:  [c0277296] hub_thread+0x1a/0xdf
Aug 23 19:44:51 tornado kernel:  [c012fba7] kthread+0x99/0x9d
Aug 23 19:44:51 tornado kernel:  [c01010b5] kernel_thread_helper+0x5/0xb
Aug 23 19:44:51 tornado kernel: Code: 00 00 00 8b 0d a8 62 36 c0 e9 61 ff ff ff f3 90 31 c0 86 07 84 c0 0f 8e 79 ff ff ff 83 c4 18 5b 5e 5f 5d c3 55 89 e5 56 53 89 c3 81 78 04 ad 4e ad de 75 2d be 00 e0 ff ff 21 e6 8b 06 39 43 0c


reuben


Re: [PATCH] mm: return ENOBUFS instead of ENOMEM in generic_file_buffered_write

On Tue, Aug 23, 2005 at 11:46:33AM +0300, Pekka J Enberg wrote:
 As noticed by Dmitry Torokhov, write() can not return ENOMEM:
 
 http://www.opengroup.org/onlinepubs/95399/functions/write.html
 
 Therefore fixup generic_file_buffered_write() in mm/filemap.c (pointed out by
 Nathan Scott).

We had this discussion before, for EACCES then.  We've always been returning
more errnos than SuS mentions, and Linus declared that to be fine.


Re: sched_yield() makes OpenLDAP slow

2005-08-23 Thread Denis Vlasenko

On Tuesday 23 August 2005 14:17, linux-os (Dick Johnson) wrote:
 
 On Mon, 22 Aug 2005, Robert Hancock wrote:
 
  linux-os (Dick Johnson) wrote:
  I reported that sched_yield() wasn't working (at least as expected)
  back in March of 2004.
 
 for(;;)
   sched_yield();
 
  ... takes 100% CPU time as reported by `top`. It should take
  practically 0. Somebody said that this was because `top` was
  broken, others said that it was because I didn't know how to
  code. Nevertheless, the problem was not fixed, even after
  scheduler changes were made for the current version.
 
  This is what I would expect if run on an otherwise idle machine.
  sched_yield just puts you at the back of the line for runnable
  processes, it doesn't magically cause you to go to sleep somehow.
 
 
 When a kernel build is occurring??? Plus `top` itself. It damn
 well better sleep while giving up the CPU. If it doesn't it's broken.

top doesn't run all the time:

# strace -o top.strace -tt top

14:52:19.407958 write(1,   758 root  16   0   104   2..., 79) = 79
14:52:19.408318 write(1,   759 root  16   0   100   1..., 79) = 79
14:52:19.408659 write(1,   760 root  16   0   100   1..., 79) = 79
14:52:19.409001 write(1,   761 root  18   0  2604  39..., 74) = 74
14:52:19.409342 write(1,   763 daemon17   0   108   1..., 78) = 78
14:52:19.409672 write(1,   773 root  16   0   104   2..., 79) = 79
14:52:19.410010 write(1,   774 root  16   0   104   2..., 79) = 79
14:52:19.410362 write(1,   775 root  16   0   100   1..., 79) = 79
14:52:19.410692 write(1,   776 root  16   0   104   2..., 79) = 79
14:52:19.411136 write(1,   777 daemon17   0   108   1..., 86) = 86
14:52:19.411505 select(1, [0], NULL, NULL, {5, 0}) = 0 (Timeout)
hrrr. ps...
14:52:24.411744 time([1124797944])  = 1124797944
14:52:24.411883 lseek(4, 0, SEEK_SET)   = 0
14:52:24.411957 read(4, "24822.01 18801.28\n", 1023) = 18
14:52:24.412082 access("/var/run/utmpx", F_OK) = -1 ENOENT (No such file or directory)
14:52:24.412224 open("/var/run/utmp", O_RDWR) = 8
14:52:24.412328 fcntl64(8, F_GETFD) = 0
14:52:24.412399 fcntl64(8, F_SETFD, FD_CLOEXEC) = 0
14:52:24.412467 _llseek(8, 0, [0], SEEK_SET) = 0
14:52:24.412556 alarm(0)= 0
14:52:24.412643 rt_sigaction(SIGALRM, {0x4015a57c, [], SA_RESTORER, 0x40094ae8}, {SIG_DFL}, 8) = 0
14:52:24.412747 alarm(1)= 0

However, kernel compile shouldn't.

I suggest stracing the "for(;;) yield();" test program with -tt, both with
and without a kernel compile running in parallel, and comparing the output...

Hmm... actually, knowing that you will argue to death instead...

# cat t.c
#include <sched.h>

int main() {
for(;;) sched_yield();
return 0;
}
# gcc t.c
# strace -tt ./a.out
...
15:03:41.211324 sched_yield()   = 0
15:03:41.211673 sched_yield()   = 0
15:03:41.212034 sched_yield()   = 0
15:03:41.212400 sched_yield()   = 0
15:03:41.212749 sched_yield()   = 0
15:03:41.213126 sched_yield()   = 0
15:03:41.213486 sched_yield()   = 0
15:03:41.213835 sched_yield()   = 0
15:03:41.214220 sched_yield()   = 0
15:03:41.214577 sched_yield()   = 0
15:03:41.214939 sched_yield()   = 0
(I start "while true; do true; done" on another console...)
15:03:43.314645 sched_yield()   = 0
15:03:43.847644 sched_yield()   = 0
15:03:43.954635 sched_yield()   = 0
15:03:44.063798 sched_yield()   = 0
15:03:44.171596 sched_yield()   = 0
15:03:44.282624 sched_yield()   = 0
15:03:44.391632 sched_yield()   = 0
15:03:44.498609 sched_yield()   = 0
15:03:44.605584 sched_yield()   = 0
15:03:44.712538 sched_yield()   = 0
15:03:44.819557 sched_yield()   = 0
15:03:44.928594 sched_yield()   = 0
15:03:45.040603 sched_yield()   = 0
15:03:45.148545 sched_yield()   = 0
15:03:45.259311 sched_yield()   = 0
15:03:45.368563 sched_yield()   = 0
15:03:45.476482 sched_yield()   = 0
15:03:45.583568 sched_yield()   = 0
15:03:45.690491 sched_yield()   = 0
15:03:45.797512 sched_yield()   = 0
15:03:45.906534 sched_yield()   = 0
15:03:46.013545 sched_yield()   = 0
15:03:46.120505 sched_yield()   = 0
Ctrl-C

# uname -a
Linux firebird 2.6.12-r4 #1 SMP Sun Jul 17 13:51:47 EEST 2005 i686 unknown 
unknown GNU/Linux
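
The difference between the two halves of the trace is easier to judge as a number. A minimal sketch in C, assuming a made-up helper name `mean_gap_us` (it is not part of strace or the kernel), which averages the gap between the first and last `sched_yield()` timestamps of a run, with the timestamps taken from the trace above and expressed in microseconds within the minute:

```c
/* Hypothetical helper, for illustration only: average gap in microseconds
 * between consecutive samples, given the first and last timestamps (in
 * microseconds) and the number of samples. Returns -1 when there are
 * fewer than two samples, because no gap exists then. */
long mean_gap_us(long first_us, long last_us, int samples)
{
    if (samples < 2)
        return -1;
    return (last_us - first_us) / (samples - 1);
}
```

For the idle part of the run (11 calls, 41.211324 to 41.214939) this gives about 361 us per call, most of which is probably strace overhead. With the busy loop running (22 calls, 43.847644 to 46.120505) it gives about 108231 us, roughly 108 ms between yields, which is consistent with the yielding process being put behind the other runnable task each time.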
--
vda


Re: [RFC - 0/9] Generic timekeeping subsystem (v. B5)

2005-08-23 Thread Roman Zippel

Hi,

On Mon, 22 Aug 2005, john stultz wrote:

 The reason why we calculate the interval_length in the continuous
 timesource case is because we are not assuming anything about the
 frequency that the timekeeping_periodic_hook() is called.

The problem with your patch is that it doesn't allow making such 
assumptions.
Anyway, it's rather simple, if you want to update the time asynchronously:

cycle_offset = get_cycles() - last_update;

while (cycle_offset >= update_cycles) {
    cycle_offset -= update_cycles;
    last_update += update_cycles;
    // at init: system_update = update_cycles * mult;
    system_time += system_update;
    xtime += [tick_nsec, time_adj];
}

error = system_time - (xtime.tv_nsec << shift);

if (abs(error) > update_cycles/2) {
    mult_adj = (error +- update_cycles/2) / update_cycles;
    mult += mult_adj;
    system_update += mult_adj * update_cycles;
    system_time -= mult_adj * cycle_offset;
    error -= mult_adj * cycle_offset;
}

if (xtime.tv_nsec + (error >> shift) > NSEC_PER_SEC) {
    system_time -= NSEC_PER_SEC << shift;
    second_overflow();
}

Since we usually don't have to adjust for the error all at once, it should 
be possible to precalculate some of it in adjtimex/second_overflow and 
turn mult_adj into a mult_adj_shift.
I didn't really check the math here in detail, so there should be enough 
errors left :), but I hope it's enough to show the idea (especially how to 
do it without mult/divide).
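
To make the accumulation loop in the pseudocode above concrete, here is a minimal user-space sketch in C. It covers only the catch-up loop and the seconds rollover, and leaves out the error correction and `time_adj`. The struct `tk_state`, the function `tk_update` and all field names are hypothetical, chosen for illustration; they are not taken from the patch set under discussion, and the rollover uses `>=` as a simplification.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000L

/* Hypothetical state for the sketch; not the layout used by any patch. */
struct tk_state {
    uint64_t last_update;   /* cycle count at the last accounted interval */
    uint64_t update_cycles; /* cycles per update interval */
    long     tick_nsec;     /* nanoseconds added to xtime per interval */
    long     xtime_sec;     /* seconds part of the wall clock */
    long     xtime_nsec;    /* nanoseconds part of the wall clock */
};

/* Account for every full interval that has elapsed up to cycle count
 * `now`, using only additions and subtractions, and return the leftover
 * cycle_offset that is smaller than one interval. */
uint64_t tk_update(struct tk_state *s, uint64_t now)
{
    uint64_t cycle_offset = now - s->last_update;

    while (cycle_offset >= s->update_cycles) {
        cycle_offset -= s->update_cycles;
        s->last_update += s->update_cycles;
        s->xtime_nsec += s->tick_nsec;
        if (s->xtime_nsec >= NSEC_PER_SEC) {
            /* stands in for second_overflow() */
            s->xtime_nsec -= NSEC_PER_SEC;
            s->xtime_sec++;
        }
    }
    return cycle_offset;
}
```

Called asynchronously with an interval of 1000 cycles and a cycle count of 3500, the loop runs three times and leaves an offset of 500 cycles for the next call. Called from a regular interrupt, it would normally run exactly once.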

There are now variations of this possible, the initial cycle_offset can be 
constant, this happens if it's regularly  called from an interrupt (and 
it's sufficient for UP systems). We could also completely ignore the 
error, so that the core calculation of the above results in the familiar:

xtime += [tick_nsec, time_adj];
if (xtime.tv_nsec > NSEC_PER_SEC)
    second_overflow();

Another variation would be useful for ppc64 (or maybe any 64bit arch, but 
ppc64 has already the matching gettimeofday). In this case we don't use a 
timespec based xtime and don't scale it to ns, but use 64bit values 
instead scaled to seconds.
The last one may become a bit of a challenge to keep as much as possible 
code common without abusing the preprocessor too much. In any case some 
functions will differ completely anyway, especially gettimeofday will be 
optimized differently depending on the arch/clock requirements, OTOH
introducing a common gettimeofday (that would even require a 64bit 
divide) would be a huge mistake.

bye, Roman

RE: kernel module seg fault

2005-08-23 Thread bunnans

Hi Biswas,

You need to post the complete kernel dump message and body of your
source code.

-Bunnan
 
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of manomugdha
biswas
Sent: Tuesday, August 23, 2005 3:13 PM
To: linux-kernel@vger.kernel.org
Subject: kernel module seg fault

Hi,
I have written a kernel module and I can load (insmod)
it without any error. But when i run my module it gets
seg fault at interruptible_sleep_on_timeout();

I have used this function in the following way:

DECLARE_WAIT_QUEUE_HEAD(wq);
init_waitqueue_head(&wq);
interruptible_sleep_on_timeout(&wq, 2);

I am using redhat version 9.0 and kernel version
2.4.20-8.
Could you please give some light on this issue?

Manomugdha Biswas








[PATCH] blk queue io tracing support

2005-08-23 Thread Jens Axboe

Hi,

This is a little something I have played with. It allows you to see
exactly what is going on in the block layer for a given queue. Currently
it can log request queueing and building, dispatches, requeues, and
completions. I've uploaded a little silly app to do dumps here:

http://www.kernel.org/pub/linux/kernel/people/axboe/tools/blktrace.c

Sample output looks like this:

wiggum:~ # ./blktrace /dev/sda
relay name: /relay/sda0
   0  3765 Q R 192-200
   5  3765 G R
  13  3765 M R [200-208]
  15  3765 M R [208-216]
  17  3765 M R [216-224]
  18  3765 M R [224-232]
  19  3765 M R [232-240]
  20  3765 M R [240-248]
  21  3765 M R [248-256]
 154  3765 M R [256-264]
 156  3765 M R [264-272]
 157  3765 M R [272-280]
 159  3765 M R [280-288]
 160  3765 M R [288-296]
 161  3765 M R [296-304]
 162  3765 M R [304-312]
 163  3765 M R [312-320]
 164  3765 M R [320-328]
 170  3765 M R [328-336]
 171  3765 M R [336-344]
 172  3765 M R [344-352]
 173  3765 M R [352-360]
 174  3765 M R [360-368]
 175  3765 M R [368-376]
 177  3765 M R [376-384]
 178  3765 M R [384-392]
 179  3765 Q R 392-400
 180  3765 G R
 181  3765 M R [400-408]
 182  3765 M R [408-416]
 183  3765 M R [416-424]
 184  3765 M R [424-432]
 185  3765 M R [432-440]
 186  3765 M R [440-448]
 187  3765 M R [448-456]
 189  3765 M R [456-464]
 190  3765 M R [464-472]
 191  3765 M R [472-480]
 193  3765 M R [480-488]
 194  3765 M R [488-496]
 196  3765 M R [496-504]
 197  3765 M R [504-512]
 228  3765 D R 192-392
 245  3765 D R 392-512
   14049 0 C R 192-392 [0]
   14067 0 D R 392-512
   14807 0 C R 392-512 [0]
Reads:  Queued:   2,  160KiB
Completed:2,  160KiB
Merges:  38
Writes: Queued:   0,0KiB
Completed:0,0KiB
Merges:   0
Events: 47
Missed events: 0

This is a log of a dd if=/dev/sda of=/dev/null bs=64k count=2 and it
shows queueing (Q) and allocation (G) of two requests, along with the
merges (M) that happen there. Finally you see dispatch (D) and
completion (C) of them as well. When sigint is received, blktrace dumps
stats of the current run.
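
As a cross-check of how the per-event lines relate to the summary, here is a minimal sketch in C that tallies the action letters described above. The names `trace_tally`, `tally_actions` and `tally_total` are hypothetical and are not taken from blktrace.c; the input is simply a string with one action letter per logged event.

```c
#include <stddef.h>

/* Hypothetical counters, one per action letter used in the sample log. */
struct trace_tally {
    unsigned queued;     /* Q */
    unsigned allocated;  /* G */
    unsigned merged;     /* M */
    unsigned dispatched; /* D */
    unsigned completed;  /* C */
    unsigned other;      /* anything else */
};

/* Count each action letter in a string such as "QGMMDC". */
struct trace_tally tally_actions(const char *actions)
{
    struct trace_tally t = {0};

    for (size_t i = 0; actions[i] != '\0'; i++) {
        switch (actions[i]) {
        case 'Q': t.queued++;     break;
        case 'G': t.allocated++;  break;
        case 'M': t.merged++;     break;
        case 'D': t.dispatched++; break;
        case 'C': t.completed++;  break;
        default:  t.other++;      break;
        }
    }
    return t;
}

/* Total number of events seen, regardless of type. */
unsigned tally_total(const struct trace_tally *t)
{
    return t->queued + t->allocated + t->merged +
           t->dispatched + t->completed + t->other;
}
```

Applied by hand to the sample output, the log holds 2 Q, 2 G, 38 M, 3 D and 2 C lines, which adds up to the reported "Merges: 38" and "Events: 47".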

It will work for scsi commands as well, so you can see what is going on
when cdrecord is talking to the device (the cdb is dumped, not the
data). The final integer printed in [] after a completion is the error,
0 for correct completion.

You can register interest in various events, see blktrace.c (grep for
buts and BLKSTARTTRACE).
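
The kernel side of this filtering is a single mask test in `__blk_add_trace()`: an event is dropped unless its category bit is set in the trace's `act_mask`. A rough user-space sketch of that test follows. The numeric values of `BLK_TC_SHIFT`, `BLK_TC_READ` and `BLK_TC_WRITE` are assumptions made for illustration (the real definitions live in the patch's linux/blktrace.h, which is not shown here), and `trace_wanted` is a hypothetical name.

```c
#include <stdint.h>

/* Assumed values, for illustration only; see linux/blktrace.h in the
 * patch for the real definitions. */
#define BLK_TC_SHIFT  16
#define BLK_TC_ACT(c) ((uint32_t)(c) << BLK_TC_SHIFT)
#define BLK_TC_READ   (1u << 0)
#define BLK_TC_WRITE  (1u << 1)

/* Return 1 if an event whose action word is `what` belongs to a
 * category the user registered interest in via `act_mask`. The low
 * bits of `what` carry the action itself and are ignored here. */
int trace_wanted(uint32_t act_mask, uint32_t what)
{
    return ((act_mask << BLK_TC_SHIFT) & what) != 0;
}
```

With a mask of only `BLK_TC_READ`, a read event passes and a write event is filtered out before anything is written to the relay channel.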

Patch is against 2.6.13-rc6-mm2. I'm attaching a relayfs update from Tom
Zanussi as well, which is required to handle sub-buffer wrapping
correctly. You need to apply both patches to play with this - and make
sure to enable CONFIG_BLK_DEV_IO_TRACE in your .config, of course. And
blktrace.c relies on relayfs being mounted on /relay, add something ala

none /relay   relayfsdefaults 0 0

to your /etc/fstab to accomplish that (or do it manually, only
mentioning it for completeness).

-- 
Jens Axboe

diff -urpN -X /home/axboe/cdrom/exclude 
/opt/kernel/linux-2.6.13-rc6-mm2/drivers/block/blktrace.c 
linux-2.6.13-rc6-mm2/drivers/block/blktrace.c
--- /opt/kernel/linux-2.6.13-rc6-mm2/drivers/block/blktrace.c   1970-01-01 
01:00:00.0 +0100
+++ linux-2.6.13-rc6-mm2/drivers/block/blktrace.c   2005-08-23 
13:34:17.0 +0200
@@ -0,0 +1,119 @@
+#include <linux/config.h>
+#include <linux/kernel.h>
+#include <linux/blkdev.h>
+#include <linux/blktrace.h>
+#include <asm/uaccess.h>
+
+void __blk_add_trace(struct blk_trace *bt, sector_t sector, int bytes,
+                     int rw, u32 what, int error, int pdu_len, char *pdu_data)
+{
+   struct blk_io_trace t;
+   unsigned long flags;
+
+   if (rw == WRITE)
+      what |= BLK_TC_ACT(BLK_TC_WRITE);
+   else
+      what |= BLK_TC_ACT(BLK_TC_READ);
+
+   if (((bt->act_mask << BLK_TC_SHIFT) & what) == 0)
+      return;
+
+   t.magic = BLK_IO_TRACE_MAGIC | BLK_IO_TRACE_VERSION;
+   t.sequence  = atomic_add_return(1, &bt->sequence);
+   t.time  = sched_clock() / 1000;
+   t.sector= sector;
+   t.bytes = bytes;
+   t.action= what;
+   t.pid   = current->pid;
+   t.error = error;
+   t.pdu_len   = pdu_len;
+
+   local_irq_save(flags);
+   __relay_write(bt->rchan, &t, sizeof(t));
+   if (pdu_len)
+      __relay_write(bt->rchan, pdu_data, pdu_len);
+   local_irq_restore(flags);
+}
+
+int blk_stop_trace(struct block_device *bdev)
+{
+   request_queue_t *q =

Re: 2.6.13-rc6-rt9


* Steven Rostedt [EMAIL PROTECTED] wrote:

 Ingo, can't you get rt.c to be more confusing. I mean it is too 
 simple. We need to add a few more underscores here and there :-) 
 Seriously, that rt.c is mind boggling. It was nice before, now it is 
 just screaming for a cleanup (come now, do we really need the four 
 underscores?). Same with latency.c.

i agree that it's ugly, but some of that ugliness is to achieve the 
7-instructions fail-through codepath for the common acquire (and 
release) codepath:

 c03a5320 <__down_mutex>:
 c03a5320:   89 c1                mov    %eax,%ecx
 c03a5322:   8b 15 08 76 3a c0    mov    0xc03a7608,%edx
 c03a5328:   31 c0                xor    %eax,%eax
 c03a532a:   0f b1 51 14          cmpxchg %edx,0x14(%ecx)
 c03a532e:   85 c0                test   %eax,%eax
 c03a5330:   75 01                jne    c03a5333 <__down_mutex+0x13>
 c03a5332:   c3                   ret

that's how much it takes to acquire an RT lock, and i worked hard to get 
there. As long as the fastpath is kept this tight, feel free to do 
cleanups. But i really want to avoid having to write mutex_down/up in 
assembly for 24 architectures ...
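
For readers who do not read x86 assembly, the fastpath above boils down to one compare-and-exchange on the lock's owner field: if the field is zero, store the current task and return; otherwise fall through to the slow path. A minimal user-space sketch of that idea with C11 atomics follows. The struct `rt_lock_sketch` and both function names are hypothetical, not the names used in rt.c, and the release side is an assumption about what a symmetric fastpath could look like.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical lock for the sketch; owner == 0 means unlocked. */
struct rt_lock_sketch {
    _Atomic uintptr_t owner;
};

/* Fastpath acquire: succeed only if nobody owns the lock. Returns 1 on
 * success, 0 if the caller would have to take the slow path. */
int fast_acquire(struct rt_lock_sketch *lock, uintptr_t me)
{
    uintptr_t expected = 0;

    return atomic_compare_exchange_strong(&lock->owner, &expected, me);
}

/* Fastpath release (assumed symmetric): succeed only if `me` is the
 * owner. Returns 1 on success, 0 if the slow path would be needed. */
int fast_release(struct rt_lock_sketch *lock, uintptr_t me)
{
    uintptr_t expected = me;

    return atomic_compare_exchange_strong(&lock->owner, &expected,
                                          (uintptr_t)0);
}
```

Written this way the compiler emits the cmpxchg for each architecture, which is the point of keeping the fastpath in C rather than in hand-written assembly.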

Ingo

Re: 2.6.13-rc6-rt9

2005-08-23 Thread Steven Rostedt

On Tue, 2005-08-23 at 14:36 +0200, Ingo Molnar wrote:
 * Steven Rostedt [EMAIL PROTECTED] wrote:
 
  Ingo, can't you get rt.c to be more confusing. I mean it is too 
  simple. We need to add a few more underscores here and there :-) 
  Seriously, that rt.c is mind boggling. It was nice before, now it is 
  just screaming for a cleanup (come now, do we really need the four 
  underscores?). Same with latency.c.
 
 i agree that it's ugly, but some of that ugliness is to achieve the 
 7-instructions fail-through codepath for the common acquire (and 
 release) codepath:
 
  c03a5320 <__down_mutex>:
  c03a5320:   89 c1                mov    %eax,%ecx
  c03a5322:   8b 15 08 76 3a c0    mov    0xc03a7608,%edx
  c03a5328:   31 c0                xor    %eax,%eax
  c03a532a:   0f b1 51 14          cmpxchg %edx,0x14(%ecx)
  c03a532e:   85 c0                test   %eax,%eax
  c03a5330:   75 01                jne    c03a5333 <__down_mutex+0x13>
  c03a5332:   c3                   ret
 

Impressive!

 that's how much it takes to acquire an RT lock, and i worked hard to get 
 there. As long as the fastpath is kept this tight, feel free to do 
 cleanups. But i really want to avoid having to write mutex_down/up in 
 assembly for 24 architectures ...

Warning! I'm hacking hard to get rid of the global pi_lock, and I'm not
worrying now about efficiency.  I figure that if I can get it to work,
then we can speed it up afterwards.  Since it's complex enough keeping
all the locks straight, I just want it to work without deadlocking. 

Once I get it to work, I'll let you figure out how to get it back down to
7-instructions :-)

-- Steve



Re: [patch] suspend: update warnings

Hi!

  + * If you have unsupported (*) devices using DMA, you may have some
  + * problems. If your disk driver does not support suspend... (IDE does),
  + * it may cause some problems, too. If you change kernel command line 
  + * between suspend and resume, it may do something wrong. If you change 
  + * your hardware while system is suspended... well, it was not good idea;
  + * but it wil probably only crash.
 
 The most common driver issues I see involve:
 - USB being built in or as modules that are still loaded while
 suspending (getting better, but not there yet)
 - DRI being used in X where the drivers don't properly support
 suspend/resume (NVidia esp)
 - Firewire
 - CPU Freq  (improving too)
 
 It might be good to mention these areas too.

Well, right; but those 'only' cause system to crash during suspend. I
was talking about really dangerous stuff.

Both usb and cpufreq seem to work okay here.

I've added FAQ entry at the end:

Q: What information is useful for debugging suspend-to-disk problems?

A: Well, last messages on the screen are always useful. If something
is broken, it is usually some kernel driver, therefore trying with as
little as possible modules loaded helps a lot. I also prefer people to
suspend from console, preferably without X running. Booting with
init=/bin/bash, then swapon and starting suspend sequence manually
usually does the trick. Then it is good idea to try with latest
vanilla kernel.

Known problematic modules are; be sure to unload them before
suspend:
- DRI being used in X where the drivers don't properly support
suspend/resume (NVidia esp)
- Firewire
- SCSI


 Perhaps the 'changing your hardware' could mention that replacing faulty
 hardware may be safe.

I do not want to encourage people to do that. Yep, its probably safe,
no, I do not want them to know.

Pavel
-- 
if you have sharp zaurus hardware you don't need... you know my address

Re: 2.6.13-rc6-mm2 (hangs on non-SMP x86-64 and oopses)

2005-08-23 Thread Rafael J. Wysocki

On Tuesday, 23 of August 2005 06:30, Andrew Morton wrote:
 
 ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.13-rc6/2.6.13-rc6-mm2/
 
 - Various updates.  Nothing terribly noteworthy.

It hangs solid during boot (after starting kjournald) on Asus L5D (non-SMP
x86-64), which is caused by this patch:

8250-serial-console-locking-bug-spelling-fix.patch

(from binary search).

If this patch is reverted, it oopses like in the following trace.

At the same time it works fine on an SMP box (dual-core Athlon 64).

Greetings,
Rafael


ACPI: PCI Interrupt Link [LUS2] enabled at IRQ 5
PCI: setting IRQ 5 as level-triggered
ACPI: PCI Interrupt :00:02.2[C] - Link [LUS2] - GSI 5 (level, low) - IRQ 
5
PCI: Setting latency timer of device :00:02.2 to 64
ehci_hcd :00:02.2: EHCI Host Controller
ehci_hcd :00:02.2: debug port 1
ehci_hcd :00:02.2: new USB bus registered, assigned bus number 3
ehci_hcd :00:02.2: irq 5, io mem 0xfebfdc00
PCI: cache line size of 64 is not supported by device :00:02.2
ehci_hcd :00:02.2: park 0
ehci_hcd :00:02.2: USB 2.0 initialized, EHCI 1.00, driver 10 Dec 2004
hub 3-0:1.0: USB hub found
usb 2-2: string descriptor 0 read error: -110
hub 3-0:1.0: 6 ports detected
usb 2-2: string descriptor 0 read error: -110
usb 2-2: can't set config #1, error -110
Unable to handle kernel NULL pointer dereference at 0004 RIP:
8024373b{_raw_spin_lock+27}
PGD 2ca73067 PUD 2ca46067 PMD 0
Oops:  [1] PREEMPT
CPU 0
Modules linked in: ehci_hcd ohci_hcd sk98lin evdev joydev sg st sr_mod sd_mod 
scsi_mod ide_cd cdrom dm_mod parport_pc lp parport
Pid: 108, comm: khubd Not tainted 2.6.13-rc6-mm2
RIP: 0010:[8024373b] 8024373b{_raw_spin_lock+27}
RSP: :81002fc7dcc8  EFLAGS: 00010282
RAX: 810001ce20d0 RBX:  RCX: 81002d586530
RDX:  RSI: 81002d586540 RDI: 
RBP: 81002fc7dce8 R08:  R09: 81002d586410
R10:  R11:  R12: 
R13: 803f06a0 R14: 81002d5557f8 R15: 0002
FS:  2b28fe80() GS:804f8840() knlGS:
CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
CR2: 0004 CR3: 2ca61000 CR4: 06e0
Process khubd (pid: 108, threadinfo 81002fc7c000, task 810001ce20d0)
Stack:   803f06a0 81002d5557f8
   81002fc7dd08 8035612e 81002d555918 81002d555870
   81002fc7dd28 80353b2f
Call Trace:8035612e{_spin_lock+30} 80353b2f{klist_remove+31}
   802ad11d{__device_release_driver+93} 
802ad254{device_release_driver+52}
   802ac994{bus_remove_device+180} 
802ab7f8{device_del+56}
   802d657f{usb_new_device+495} 
802d7419{hub_thread+1961}
   80354b6f{thread_return+187} 
8014a710{autoremove_wake_function+0}
   8014a710{autoremove_wake_function+0} 
802d6c70{hub_thread+0}
   8014a583{kthread+211} 8010f5e6{child_rip+8}
   8014a4b0{kthread+0} 8010f5de{child_rip+0}

BUG: spinlock trylock failure on UP on CPU#0, khubd/108
 lock: 803bf020, .magic: dead4ead, .owner: khubd/108, .owner_cpu: 0

Call Trace:802439f9{add_preempt_count+105} 
80243623{spin_bug+211}
   8011004b{show_trace+571} 
8024370e{_raw_spin_trylock+62}
   80355e4e{_spin_trylock+30} 8010fc81{oops_begin+17}
   8035702a{do_page_fault+1722} 8013452e{vprintk+830}
   8013452e{vprintk+830} 80152296{kallsyms_lookup+246}
   8010f431{error_exit+0} 8011004b{show_trace+571}
   80110047{show_trace+567} 80110168{show_stack+216}
   80110207{show_registers+135} 8011050e{__die+142}
   80357098{do_page_fault+1832} 
80355fa4{_spin_unlock_irq+20}
   80354b6f{thread_return+187} 8010f431{error_exit+0}
   8024373b{_raw_spin_lock+27} 
802439f9{add_preempt_count+105}
   8035612e{_spin_lock+30} 80353b2f{klist_remove+31}
   802ad11d{__device_release_driver+93} 
802ad254{device_release_driver+52}
   802ac994{bus_remove_device+180} 
802ab7f8{device_del+56}
   802d657f{usb_new_device+495} 
802d7419{hub_thread+1961}
   80354b6f{thread_return+187} 
8014a710{autoremove_wake_function+0}
   8014a710{autoremove_wake_function+0} 
802d6c70{hub_thread+0}
   8014a583{kthread+211} 8010f5e6{child_rip+8}
   8014a4b0{kthread+0} 8010f5de{child_rip+0}

---
| preempt count: 0003 ]
| 3 level deep critical section nesting:

.. [80356126]

Re: [patch] suspend: update warnings

2005-08-23 Thread Nigel Cunningham

Hi.

On Tue, 2005-08-23 at 22:50, Pavel Machek wrote:
 Hi!
 
   + * If you have unsupported (*) devices using DMA, you may have some
   + * problems. If your disk driver does not support suspend... (IDE does),
   + * it may cause some problems, too. If you change kernel command line 
   + * between suspend and resume, it may do something wrong. If you change 
   + * your hardware while system is suspended... well, it was not good idea;
   + * but it wil probably only crash.
  
  The most common driver issues I see involve:
  - USB being built in or as modules that are still loaded while
  suspending (getting better, but not there yet)
  - DRI being used in X where the drivers don't properly support
  suspend/resume (NVidia esp)
  - Firewire
  - CPU Freq  (improving too)
  
  It might be good to mention these areas too.
 
 Well, right; but those 'only' cause system to crash during suspend. I
 was talking about really dangerous stuff.
 
 Both usb and cpufreq seems to work okay here.

It depends on what you're using. I believe one of the usb root hub
drivers is okay, the others aren't. Similar for cpufreq. USB certainly
accounts for a high percentage of the failures I see.

 I've added FAQ entry at the end:
 
 Q: What information is useful for debugging suspend-to-disk problems?
 
 A: Well, last messages on the screen are always useful. If something
 is broken, it is usually some kernel driver, therefore trying with as
 little as possible modules loaded helps a lot. I also prefer people to
 suspend from console, preferably without X running. Booting with
 init=/bin/bash, then swapon and starting suspend sequence manually
 usually does the trick. Then it is good idea to try with latest
 vanilla kernel.
 
 Known problematic modules are; be sure to unload them before
 suspend:
 - DRI being used in X where the drivers don't properly support
 suspend/resume (NVidia esp)
 - Firewire
 - SCSI
 
 
  Perhaps the 'changing your hardware' could mention that replacing faulty
  hardware may be safe.
 
 I do not want to encourage people to do that. Yep, its probably safe,
 no, I do not want them to know.

:

Thanks

Nigel
-- 
Evolution.
Enumerate the requirements.
Consider the interdependencies.
Calculate the probabilities.


Re: 2.6.13-rc6-rt9


* Steven Rostedt [EMAIL PROTECTED] wrote:

 On Tue, 2005-08-23 at 14:36 +0200, Ingo Molnar wrote:
  * Steven Rostedt [EMAIL PROTECTED] wrote:
  
   Ingo, can't you get rt.c to be more confusing. I mean it is too 
   simple. We need to add a few more underscores here and there :-) 
   Seriously, that rt.c is mind boggling. It was nice before, now it is 
   just screaming for a cleanup (come now, do we really need the four 
   underscores?). Same with latency.c.
  
  i agree that it's ugly, but some of that ugliness is to achieve the 
  7-instructions fail-through codepath for the common acquire (and 
  release) codepath:
  
   c03a5320 <__down_mutex>:
   c03a5320:   89 c1                mov    %eax,%ecx
   c03a5322:   8b 15 08 76 3a c0    mov    0xc03a7608,%edx
   c03a5328:   31 c0                xor    %eax,%eax
   c03a532a:   0f b1 51 14          cmpxchg %edx,0x14(%ecx)
   c03a532e:   85 c0                test   %eax,%eax
   c03a5330:   75 01                jne    c03a5333 <__down_mutex+0x13>
   c03a5332:   c3                   ret
  
 
 Impressive!
 
  that's how much it takes to acquire an RT lock, and i worked hard to get 
  there. As long as the fastpath is kept this tight, feel free to do 
  cleanups. But i really want to avoid having to write mutex_down/up in 
  assembly for 24 architectures ...
 
 Warning! I'm hacking hard to get rid of the global pi_lock, and I'm not
 worrying now about efficiency.  I figure that if I can get it to work,
 then we can speed it up afterwards.  Since it's complex enough keeping
 all the locks straight, I just want it to work without deadlocking. 
 
 Once I get it to work, I'll let you figure out how get it back down to 
 7-instructions :-)

yeah. It can always be done after the fact - the basics won't change.  
(Note that the above disassembly is for UP, on SMP the fastpath is 
longer and around 10-15 instructions.)

Ingo

Re: [patch] suspend: update warnings

Hi!

+ * If you have unsupported (*) devices using DMA, you may have some
+ * problems. If your disk driver does not support suspend... (IDE 
does),
+ * it may cause some problems, too. If you change kernel command line 
+ * between suspend and resume, it may do something wrong. If you 
change 
+ * your hardware while system is suspended... well, it was not good 
idea;
+ * but it wil probably only crash.
   
   The most common driver issues I see involve:
   - USB being built in or as modules that are still loaded while
   suspending (getting better, but not there yet)
   - DRI being used in X where the drivers don't properly support
   suspend/resume (NVidia esp)
   - Firewire
   - CPU Freq  (improving too)
   
   It might be good to mention these areas too.
  
  Well, right; but those 'only' cause system to crash during suspend. I
  was talking about really dangerous stuff.
  
  Both usb and cpufreq seems to work okay here.
 
 It depends on what you're using. I believe one of the usb root hub
 drivers is okay, the others aren't. Similar for cpufreq. USB certainly
 accounts for a high percentage of the failures I see.

Do you remember which one it is? I have UHCI here, and it seems to
work okay. powernow-k8 and cpufreq-centrino also seem to behave ok.

Pavel
-- 
if you have sharp zaurus hardware you don't need... you know my address

Re: [patch] suspend: update warnings

On Tue, Aug 23, 2005 at 02:50:17PM +0200, Pavel Machek wrote:
 - DRI being used in X where the drivers don't properly support
 suspend/resume (NVidia esp)

NVidia's driver is not supported and is a violation of the
copyrights of many of us.  It's never supported, so please don't
mention it.


Re: [patch] suspend: update warnings

Hi!

  - DRI being used in X where the drivers don't properly support
  suspend/resume (NVidia esp)
 
 NVidia's driver is not supported and is a violation of the
 copyrights of many of us.  It's never supported, so please don't
 mention it.

Unfortunately, it is quite common out there. I need to somehow keep
those bug reports off my mailbox.

Okay, this should be enough:

Q: What information is useful for debugging suspend-to-disk problems?

A: Well, the last messages on the screen are always useful. If something
is broken, it is usually some kernel driver, so trying with as few
modules loaded as possible helps a lot. I also prefer people to
suspend from the console, preferably without X running. Booting with
init=/bin/bash, then running swapon and starting the suspend sequence
manually usually does the trick. Then it is a good idea to try with the
latest vanilla kernel.

Known problematic modules (be sure to unload them before suspend):
- DRI being used (3D acceleration)
- Firewire
- SCSI



-- 
if you have sharp zaurus hardware you don't need... you know my address

Re: [patch] suspend: update warnings

On Tue, Aug 23, 2005 at 03:00:50PM +0200, Pavel Machek wrote:
 Hi!
 
   - DRI being used in X where the drivers don't properly support
   suspend/resume (NVidia esp)
  
  NVidia's driver is not supported and is a violation of the
  copyrights of many of us.  It's never supported, so please don't
  mention it.
 
 Unfortunately, it is quite common out there. I need to somehow keep
 those bug reports off my mailbox.

I think we made it pretty clear that people with binary modules should
sod off.  Feel free to use a banner for a big "sod off as usual" warning
for all binary module user idiots.


Ext3 Errors on Dell RAID

2005-08-23 Thread Jess Balint

Problem:
I get massive ext3 errors once every few days. See errors on console
section below. Almost all commands return I/O error. I have to power
cycle the machine to get it running again. Upon reboot, there are
usually 3 orphan inodes deleted and everything is fine. See messages
on reboot below.

Configuration:
System: Dell PowerEdge 6300/500, 4 CPU SMP w/2GB memory
Discs: 3 SCSI discs in a controller-managed striped configuration
Controller: Dell PERC-2
Kernel messages: see the kernel boot messages section below

Other:
I had this problem before. I upgraded the card firmware to 2.8/build
6089, but the issue remains. I tried the 2.4.29 kernel
(aacraid driver v 1.1-3) from the Slackware (10?) distribution and
then upgraded to 2.4.31. It has the same driver version and the same
problem. Running fsck always shows everything is fine (rc=0).

Does anybody have experience with this machine working well? If so,
what combination of kernel and firmware version?

Or does anybody know the root cause of the occasional massive ext3
errors or what I can do to test and/or fix it?

Please cc me jbalint-at-gmail as I am not on the list.
Thanks.
Jess

----------------------------------------
errors on console
----------------------------------------
EXT3-fs error (device sd(8,2)) in ext3_reserve_inode_write: IO failure
EXT3-fs error (device sd(8,2)) in ext3_orphan_add: IO failure
EXT3-fs error (device sd(8,2)): ext3_get_inode_loc: unable to read
inode block - inode=1015869, block=1015811
EXT3-fs error (device sd(8,2)) in ext3_reserve_inode_write: IO failure
EXT3-fs error (device sd(8,2)): ext3_get_inode_loc: unable to read
inode block - inode=1015869, block=1015811
EXT3-fs error (device sd(8,2)) in ext3_reserve_inode_write: IO failure
EXT3-fs error (device sd(8,2)) in ext3_orphan_add: IO failure
EXT3-fs error (device sd(8,2)): ext3_get_inode_loc: unable to read
inode block - inode=1213811, block=1212461
EXT3-fs error (device sd(8,2)) in ext3_reserve_inode_write: IO failure
EXT3-fs error (device sd(8,2)) in ext3_new_inode: IO failure

----------------------------------------
messages on reboot
----------------------------------------
EXT3-fs: INFO: recovery required on readonly filesystem.
EXT3-fs: write access will be enabled during recovery.
kjournald starting.  Commit interval 5 seconds
EXT3-fs: sd(8,2): orphan cleanup on readonly fs
EXT3-fs: sd(8,2): 3 orphan inodes deleted
EXT3-fs: recovery complete.
EXT3-fs: mounted filesystem with ordered data mode.
VFS: Mounted root (ext3 filesystem) readonly.

----------------------------------------
kernel boot messages
----------------------------------------
SCSI subsystem driver Revision: 1.00
Red Hat/Adaptec aacraid driver (1.1-3 Aug 16 2005 17:25:05)
AAC0: kernel 2.8.4 build 6089
AAC0: monitor 2.8.4 build 6089
AAC0: bios 2.8.0 build 6089
AAC0: serial 4c72e2fafaf001
scsi0 : percraid
  Vendor: DELL      Model: rootvg            Rev: V1.0
  Type:   Direct-Access  ANSI SCSI revision: 02
scsi1 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.36
Adaptec aic7890/91 Ultra2 SCSI adapter
aic7890/91: Ultra2 Wide Channel A, SCSI Id=7, 32/253 SCBs

scsi2 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.36
Adaptec aic7890/91 Ultra2 SCSI adapter
aic7890/91: Ultra2 Wide Channel A, SCSI Id=7, 32/253 SCBs

scsi3 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.36
Adaptec aic7860 Ultra SCSI adapter
aic7860: Ultra Single Channel A, SCSI Id=7, 3/253 SCBs

blk: queue f7aaca18, I/O limit 4095Mb (mask 0x)
(scsi3:A:5): 20.000MB/s transfers (20.000MHz, offset 15)
  Vendor: NEC   Model: CD-ROM DRIVE:465  Rev: 1.03
  Type:   CD-ROM ANSI SCSI revision: 02
blk: queue f7aac818, I/O limit 4095Mb (mask 0x)
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
SCSI device sda: 213274368 512-byte hdwr sectors (109196 MB)
Partition check:
 sda: sda1 sda2
Attached scsi CD-ROM sr0 at scsi3, channel 0, id 5, lun 0
sr0: scsi3-mmc drive: 14x/32x cd/rw xa/form2 cdda tray

Re: 2.6.13-rc6-mm2 (hangs on non-SMP x86-64 and oopses)

2005-08-23 Thread Ralf Baechle

Andrew,

On Tue, Aug 23, 2005 at 02:51:51PM +0200, Rafael J. Wysocki wrote:

  
  ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.13-rc6/2.6.13-rc6-mm2/
  
  - Various updates.  Nothing terribly noteworthy.
 
 It hangs solid during boot (after starting kjournald) on Asus L5D (non-SMP
 x86-64), which is caused by this patch:
 
 8250-serial-console-locking-bug-spelling-fix.patch
 
 (from binary search).
 
 If this patch is reverted, it oopses like in the following trace.

I thought this one was already pulled?

  Ralf

[2.4.31] - USB device numbering in /proc/bus/usb

2005-08-23 Thread Paul Rolland

Hello,

I've just rebooted a machine, and the eagle ADSL modem I was using,
previously presented as /proc/bus/usb/002/005, is now presented as
/proc/bus/usb/002/003 (same bus, but device ID changed from 5 to 3).

Is this expected behavior when running a 2.4.31 kernel?
I would have expected more stability in the numbering across
reboots, the same way IDE disk numbers are stable.

Paul


[PATCH] asus_acpi M6000A model support

2005-08-23 Thread Lukas Hejtmanek

Hello,

Here is a patch adding Asus M6A laptop support. It works fine for me.

-- 
Lukáš Hejtmánek
--- asus_acpi.c.old 2005-04-21 02:03:13.0 +0200
+++ asus_acpi.c 2005-05-08 18:22:49.0 +0200
@@ -128,6 +128,7 @@
L8L,  //L8400L
M1A,  //M1300A
M2E,  //M2400E, L4400L
+   M6A,  //M6000A
M6N,  //M6800N
M6R,  //M6700R
P30,  //Samsung P30
@@ -304,7 +305,20 @@
 		.display_set   = "SDSP",
 		.display_get   = "\\INFB"
 	},
-
+	{
+		.name          = "M6A",
+		/* M6A does not have MLED */
+		.mt_wled       = "WLED",
+		.mt_lcd_switch = xxN_PREFIX "_Q10",
+		.lcd_status    = "\\RGPL",
+		.brightness_set= "SPLV",
+		.brightness_get= "GPLV",
+		.display_set   = "SDSP",
+		/* FIXME: this is not correct display_get.
+		 * It always returns 1
+		 * */
+		.display_get   = "\\ADVG"
+	},
 	{
 		.name          = "M6N",
 		.mt_mled       = "MLED",
@@ -622,7 +636,7 @@
 {
 	int lcd = 0;
 
-	if (hotk->model != L3H) {
+	if (hotk->model != L3H && hotk->model != M6A) {
 		/* We don't have to check anything if we are here */
 		if (!read_acpi_int(NULL, hotk->methods->lcd_status, &lcd))
 			printk(KERN_WARNING "Asus ACPI: Error reading LCD status\n");
@@ -638,22 +652,33 @@

 		input.count = 2;
 		input.pointer = mt_params;
-		/* Note: the following values are partly guessed up, but
-		   otherwise they seem to work */
 		mt_params[0].type = ACPI_TYPE_INTEGER;
-		mt_params[0].integer.value = 0x02;
 		mt_params[1].type = ACPI_TYPE_INTEGER;
-		mt_params[1].integer.value = 0x02;
+		if(hotk->model == L3H) {
+			/* Note: the following values are partly guessed up,
+			 * but otherwise they seem to work */
+			mt_params[0].integer.value = 0x02;
+			mt_params[1].integer.value = 0x02;
+		} else if(hotk->model == M6A) {
+			mt_params[0].integer.value = 0x15;
+			mt_params[1].integer.value = 0x01;
+		}
 
 		output.length = sizeof(out_obj);
 		output.pointer = &out_obj;

-		status = acpi_evaluate_object(NULL, hotk->methods->lcd_status, &input, &output);
+		status = acpi_evaluate_object(NULL, hotk->methods->lcd_status,
+				&input, &output);
 		if (status != AE_OK)
 			return -1;
-		if (out_obj.type == ACPI_TYPE_INTEGER)
-			/* That's what the AML code does */
-			lcd = out_obj.integer.value >> 8;
+		if (out_obj.type == ACPI_TYPE_INTEGER) {
+			if(hotk->model== L3H) {
+				/* That's what the AML code does */
+				lcd = out_obj.integer.value >> 8;
+			} else if(hotk->model == M6A) {
+				lcd = out_obj.integer.value;
+			}
+		}
 	}

 	return (lcd & 1);
@@ -1029,6 +1054,8 @@
 		hotk->model = M6N;
 	else if (strncmp(model->string.pointer, "M6R", 3) == 0)
 		hotk->model = M6R;
+	else if (strncmp(model->string.pointer, "M6A", 3) == 0)
+		hotk->model = M6A;
 	else if (strncmp(model->string.pointer, "M2N", 3) == 0 ||
 		 strncmp(model->string.pointer, "M3N", 3) == 0 ||
 		 strncmp(model->string.pointer, "M5N", 3) == 0 ||
@@ -1058,8 +1085,9 @@
 		hotk->model = L5x;
 
 	if (hotk->model == END_MODEL) {
-		printk("unsupported, trying default values, supply the "
-		       "developers with your DSDT\n");
+		printk("unsupported model %s, trying default values, supply "
+		       "the developers with your DSDT\n",
+		       model->string.pointer);
 		hotk->model = M2E;
 	} else {
 		printk("supported\n");

Re: Linux AIO status todo

2005-08-23 Thread Laurent Vivier

On Tue 23/08/2005 at 11:56, Jakub Jelinek wrote:
On Tue, Aug 23, 2005 at 01:14:38PM +0530, Suparna Bhattacharya wrote:

2. No support for propagating IO completion events to user space
threads using RT signals. User threads need to poll the completion
queue using io_getevents. POSIX specifies that when an AIO
request completes, a signal can be delivered to the application
to indicate the completion of the IO.

POSIX AIO needs to handle SIGEV_NONE, SIGEV_SIGNAL and SIGEV_THREAD
notification. Obviously kernel shouldn't create threads for SIGEV_THREAD
itself, as kernel shouldn't hardcode all the implementation details how a
thread can be created. But it would be good if AIO signalling e.g. handled
both SIGEV_SIGNAL and SIGEV_SIGNAL | SIGEV_THREAD_ID, with the same usage as
e.g. timer_* syscalls. If kernel makes sure SI_ASYNCIO si_code is set in
the notification signal siginfos, glibc could even use just one helper
thread for timer_*/[al]io_* and maybe in the future other SIGEV_THREAD
notification.

See chapter 2.2. AIO completion event.

The libposix-aio written by Sébastien and I manages all these cases:

http://www.bullopensource.org/posix/

There is a patch allowing kernel to send signal to a given process on
aio event completion:

http://cvs.sourceforge.net/viewcvs.py/paiol/kernel-patches/2.6.12/aioevent.patch?rev=1.1.1.1&view=auto

With the help of a helper thread in user space, libposix-aio is
able to manage SIGEV_THREAD and create new threads using user-space
code (and thus implementation-dependent calls):

http://cvs.sourceforge.net/viewcvs.py/paiol/libposix-aio/src/aio_read.c?view=markup
http://cvs.sourceforge.net/viewcvs.py/paiol/libposix-aio/src/aio_thread_create.c?view=markup

Sébastien wrote this part of libposix-aio (so I'm not an expert on this
part :-P ), but I think his helper thread is built the same way as the
glibc timer helper thread. Thus, if we want to merge libposix-aio into
glibc, we should use the existing mechanism, and it should be easy to put
the POSIX AIO helper thread portions inside the timer helper thread.

But only the glibc maintainer can answer this question:

should we mix timer and AIO code?

Laurent
--
-- Laurent Vivier ---
mailto:[EMAIL PROTECTED] BULL/FREC:B1-226
phone: (+33) 476 29 7213 Bullcom: 229-7213
--[ DT/OSwRD/AIX ]--
http://www.bullopensource.org/ext4


dnotify/inotify and vfs questions

2005-08-23 Thread Asser Femø

Hi,

I'm currently implementing change notification support for the linux
cifs client as part of Google's Summer of Code program.

In cifs, change notification works pretty much the same as dnotify does
in the kernel, and you cancel the notification by sending a NT_CANCEL
request. 

According to the fcntl manual you can cancel a notification by doing
fcntl(fd, F_NOTIFY, 0) (ie. sending 0 as the notification mask), but
looking in the kernel code fcntl_dirnotify() immediately calls
dnotify_flush() without telling the vfs module about it. Is there a
reason for this?  Otherwise I'd propose calling
filp->f_op->dir_notify(filp, 0) at some point in this scenario.

Regarding inotify, inotify_add_watch doesn't seem to pass on the request
either, which works fine for local filesystem operations as they call
fsnotify_* functions every time, but that isn't really feasible for
filesystems like cifs because we'd have to request change notification
on everything. Are there plans for implementing a mechanism to let vfs
modules get watch requests too?

cheers,
Asser




[PATCH] FUTEX_WAKE_OP (pthread_cond_signal speedup)

2005-08-23 Thread Jakub Jelinek

Hi!

ATM pthread_cond_signal is unnecessarily slow, because it wakes one
waiter (which at least on UP usually means an immediate context switch
to one of the waiter threads).  This waiter wakes up and after a few
instructions it attempts to acquire the cv internal lock, but that lock
is still held by the thread calling pthread_cond_signal.  So it goes
to sleep and eventually the signalling thread is scheduled in, unlocks
the internal lock and wakes the waiter again.

Now, before 2003-09-21 NPTL was using FUTEX_REQUEUE in pthread_cond_signal
to avoid this performance issue, but it was removed when locks were
redesigned to the 3 state scheme (unlocked, locked uncontended, locked
contended).

The following scenario shows why simply using FUTEX_REQUEUE in
pthread_cond_signal together with using lll_mutex_unlock_force
in place of lll_mutex_unlock is not enough, and probably why it
was disabled at that time:

The number is the value in cv->__data.__lock.

     thr1                thr2                thr3
0    pthread_cond_wait
1    lll_mutex_lock (cv->__data.__lock)
0    lll_mutex_unlock (cv->__data.__lock)
0    lll_futex_wait (&cv->__data.__futex, futexval)
0                        pthread_cond_signal
1                        lll_mutex_lock (cv->__data.__lock)
1                                            pthread_cond_signal
2                                            lll_mutex_lock (cv->__data.__lock)
2                                              lll_futex_wait (&cv->__data.__lock, 2)
2                        lll_futex_requeue (&cv->__data.__futex, 0, 1, &cv->__data.__lock)
                           # FUTEX_REQUEUE, not FUTEX_CMP_REQUEUE
2                        lll_mutex_unlock_force (cv->__data.__lock)
0                          cv->__data.__lock = 0
0                          lll_futex_wake (&cv->__data.__lock, 1)
1    lll_mutex_lock (cv->__data.__lock)
0    lll_mutex_unlock (cv->__data.__lock)
       # Here, lll_mutex_unlock doesn't know there are threads waiting
       # on the internal cv's lock
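
The tail of the scenario above can be replayed with a small single-threaded model. This is only a sketch, not glibc code: the names model_lock, model_trylock and so on are made up for illustration, plain ints stand in for the atomic futex word, and a counter stands in for threads sleeping in the kernel. It assumes the three-state scheme described earlier (0 unlocked, 1 locked uncontended, 2 locked contended) and that a plain unlock only issues a wake when the old value was 2.

```c
#include <stdbool.h>

/* The three lock states named in the message; identifiers are hypothetical. */
enum { UNLOCKED = 0, LOCKED = 1, LOCKED_CONTENDED = 2 };

/* Single-threaded model: plain ints stand in for the atomic futex word. */
struct model_lock {
	int value;    /* what cv->__data.__lock would hold */
	int sleepers; /* threads parked in the kernel on this word */
};

/* Uncontended acquire, 0 -> 1.  Returns false if the lock is taken. */
static bool model_trylock(struct model_lock *l)
{
	if (l->value != UNLOCKED)
		return false;
	l->value = LOCKED;
	return true;
}

/* Contended acquire: mark the word as 2 and go to sleep on it. */
static void model_lock_wait(struct model_lock *l)
{
	l->value = LOCKED_CONTENDED;
	l->sleepers++;
}

/* A requeue parks one more sleeper on the word without changing its value. */
static void model_requeue_onto(struct model_lock *l)
{
	l->sleepers++;
}

/* Plain unlock: wakes a sleeper only if the old value said "contended". */
static bool model_unlock(struct model_lock *l)
{
	int old = l->value;

	l->value = UNLOCKED;
	if (old == LOCKED_CONTENDED && l->sleepers > 0) {
		l->sleepers--;
		return true;
	}
	return false;
}

/* Forced unlock: always tries to wake one sleeper, whatever the old value. */
static bool model_unlock_force(struct model_lock *l)
{
	l->value = UNLOCKED;
	if (l->sleepers > 0) {
		l->sleepers--;
		return true;
	}
	return false;
}
```

Replaying the steps: thr2 takes the lock (model_trylock), thr3 blocks on it (model_lock_wait), thr2 requeues thr1 (model_requeue_onto) and does model_unlock_force, which wakes one sleeper. The woken thread then takes the fast path 0 -> 1, and its model_unlock returns false while one sleeper is still parked, which is the lost wakeup the scenario describes.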

Now, I believe it is possible to use FUTEX_REQUEUE in pthread_cond_signal,
but it will cost us not one, but 2 extra syscalls and, what's worse, one
of these extra syscalls will be done for every single waiting loop in
pthread_cond_*wait.
We would need to use lll_mutex_unlock_force in pthread_cond_signal
after requeue and lll_mutex_cond_lock in pthread_cond_*wait after
lll_futex_wait.

Another alternative is to do the unlocking pthread_cond_signal needs
to do (the lock can't be unlocked before lll_futex_wake, as that is racy)
in the kernel.

I have implemented both variants, futex-requeue-glibc.patch is the
first one and futex-wake_op{,-glibc}.patch is the unlocking
inside of the kernel.  The kernel interface allows userland to specify
exactly what an unlocking operation should look like (some atomic
arithmetic operation with optional constant argument and comparison
of the previous futex value with another constant).
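
To make the "operation plus comparison" idea concrete, here is a userspace model of the semantics only. It is a sketch under assumptions: the enum names and model_wake_op are hypothetical, the particular set of operations and comparisons is illustrative rather than taken from the patch, and unlike the kernel this model performs the read-modify-write non-atomically.

```c
#include <stdbool.h>

/* Hypothetical selectors for the arithmetic operation and the comparison. */
enum model_op  { OP_SET, OP_ADD, OP_OR, OP_ANDN, OP_XOR };
enum model_cmp { CMP_EQ, CMP_NE, CMP_LT, CMP_LE, CMP_GT, CMP_GE };

/*
 * Apply `op` with constant `oparg` to *word, then report whether the
 * PREVIOUS value satisfies `cmp` against `cmparg`.  In the kernel the
 * update would be atomic; in this model it is not.  A true result
 * stands for "also wake a waiter on this second futex".
 */
static bool model_wake_op(int *word, enum model_op op, int oparg,
			  enum model_cmp cmp, int cmparg)
{
	int old = *word;

	switch (op) {
	case OP_SET:  *word = oparg;        break;
	case OP_ADD:  *word = old + oparg;  break;
	case OP_OR:   *word = old | oparg;  break;
	case OP_ANDN: *word = old & ~oparg; break;
	case OP_XOR:  *word = old ^ oparg;  break;
	}

	switch (cmp) {
	case CMP_EQ: return old == cmparg;
	case CMP_NE: return old != cmparg;
	case CMP_LT: return old <  cmparg;
	case CMP_LE: return old <= cmparg;
	case CMP_GT: return old >  cmparg;
	case CMP_GE: return old >= cmparg;
	}
	return false;
}
```

For the condvar case, one plausible request is "set the internal lock to 0 and wake a lock waiter only if the old value was greater than 1", i.e. model_wake_op(&lock, OP_SET, 0, CMP_GT, 1). With the three-state scheme this releases the lock and wakes a waiter only when the lock was marked contended.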

It has been implemented just for ppc*, x86_64 and i?86, for other
architectures I'm including just a stub header which can be used as
a starting point by maintainers to write support for their arches
and ATM will just return -ENOSYS for FUTEX_WAKE_OP.  The requeue
patch has been (lightly) tested just on x86_64, the wake_op patch
on ppc64 kernel running 32-bit and 64-bit NPTL and x86_64 kernel running
32-bit and 64-bit NPTL.

With the following benchmark on UP x86-64 I get:

for i in nptl-orig nptl-requeue nptl-wake_op; do echo time elf/ld.so --library-path .:$i /tmp/bench; \
for j in 1 2; do echo ( time elf/ld.so --library-path .:$i /tmp/bench ) 2>&1; done; done
time elf/ld.so --library-path .:nptl-orig /tmp/bench
real 0m0.655s user 0m0.253s sys 0m0.403s
real 0m0.657s user 0m0.269s sys 0m0.388s
time elf/ld.so --library-path .:nptl-requeue /tmp/bench
real 0m0.496s user 0m0.225s sys 0m0.271s
real 0m0.531s user 0m0.242s sys 0m0.288s
time elf/ld.so --library-path .:nptl-wake_op /tmp/bench
real 0m0.380s user 0m0.176s sys 0m0.204s
real 0m0.382s user 0m0.175s sys 0m0.207s

The benchmark is at:
http://sourceware.org/ml/libc-alpha/2005-03/txt1.txt
Older futex-requeue-glibc.patch version is at:
http://sourceware.org/ml/libc-alpha/2005-03/txt2.txt
Older futex-wake_op-glibc.patch version is at:
http://sourceware.org/ml/libc-alpha/2005-03/txt3.txt
Will post a new version (just x86-64 fixes so that the patch
applies against pthread_cond_signal.S) to libc-hacker ml soon.

Attached is the kernel FUTEX_WAKE_OP patch as well as a simple-minded
testcase that will not test the atomicity of the operation, but at least
check if the threads that should have been woken up are woken up and
whether the arithmetic operation in the kernel gave the expected results.

Jakub
--- linux-2.6.12/include/linux/futex.h.jj   2005-06-17 21:48:29.0 
+0200
+++ linux-2.6.12/include/linux/futex.h  2005-08-23 11:11:41.0 +0200
@@ -4,14 +4,40 @@
 /* Second argument to futex syscall */
 
 
-#define FUTEX_WAIT (0)
-#define

irq 11: nobody cared

2005-08-23 Thread Nigel Rantor



Hail,

I posted a report a while back, no answer.

Who should I be talking to about the "irq 11: nobody cared" issue?

I'm happy to provide as much info as possible but need to know what info 
is required.


I'm happily running 2.6.7, tried the latest and greatest (2.6.12) and 
found the problem, then started by looking at 2.6.8 and found the 
problem there too.


It happens on boot, is a showstopper and I'm wondering what, if anything 
useful I can provide you guys.


Throw me a bone...

  Nige


Re: debug a high load average

2005-08-23 Thread Erik Mouw

On Tue, Aug 23, 2005 at 04:38:36PM +0530, Rajesh wrote:
 I have a case occasionally when I copy data from a usb storage (ipod) to 
 my hard drive the load average goes up from 0.4 to about 15.0, and the 
 system becomes very unusable till I kill the cp command. I have checked 
 the CPU usage, bytes read from usb device, byte written to hard drive 
 etc, and all these values are low like CPU usage is at a maximum of 30%, 
 disk read bytes is at an average of 1.5 MiB/s, disk write bytes is at 
 1.5 MiB/s, number of processes is at 110, etc, during this high load.

1.5 MB/s suggests you're using an IDE drive in PIO mode. Switch to DMA
mode (hdparm -d 1 /dev/hda) and see if it gets any better.


Erik

-- 
+-- Erik Mouw -- www.harddisk-recovery.com -- +31 70 370 12 90 --
| Lab address: Delftechpark 26, 2628 XH, Delft, The Netherlands

Re: IRQ problem with PCMCIA

2005-08-23 Thread Erik Mouw

On Tue, Aug 23, 2005 at 11:31:58AM +0100, Alan Cox wrote:
 On Tue, 2005-08-23 at 09:49 +0200, Erik Mouw wrote:
  Is there any place where we can get your current patches?
 
 Which ones - the PATA IDE ones are in 2.6.11-ac, a subset in Fedora
 (other changes in the core IDE code make forward porting stuff for
 hotplug really tricky past 2.6.11).

I know about those and have been using them on my laptop.

 The SATA ones I can certainly put up if there is interest. I don't want
 to put them somewhere too available yet because this right now is stuff
 you only want to use under controlled circumstances for development
 until both they and the core SATA layer have some improvements.

That's the one I'm interested in. Yes, I do understand it can erase all
my partitions, etc.


Erik

-- 
+-- Erik Mouw -- www.harddisk-recovery.com -- +31 70 370 12 90 --
| Lab address: Delftechpark 26, 2628 XH, Delft, The Netherlands

Re: Ext3 Errors on Dell RAID

2005-08-23 Thread Matt Domsch

On Tue, Aug 23, 2005 at 09:05:27AM -0400, Jess Balint wrote:
 Problem:
 I get massive ext3 errors once every few days. See errors on console
 section below. Almost all commands return I/O error. I have to power
 cycle the machine to get it running again. Upon reboot, there are
 usually 3 orphan inodes deleted and everything is fine. See messages
 on reboot below.
 
 Configuration:
 System: Dell PowerEdge 6300/500, 4 CPU SMP w/2GB memory
 Discs: 3 SCSI discs in a controller-managed striped configuration
 Controller: Dell PERC-2
 kernel messages in kernel boot messages below

This looks very familiar, and given the firmware versions you mention,
is probably a known issue.  The controller firmware goes to do a cache
flush, but that doesn't complete in a sane amount of time, and
eventually the SCSI midlayer starts aborting commands and taking the
file system offline.

I don't believe a firmware update was released for your add-in PERC2
quad-channel card.  Firmware 6091 was released for the PERC3/Di ROMBs
which addresses this exact case, though other failures have been
reported on [EMAIL PROTECTED] (subscribe and read archives at
http://lists.us.dell.com) even with newer firmware.

The workarounds include:
1) disable the read and write cache using afacli.
2) mount file systems using 'noatime'.
3) backup your data, replace the controller with something newer
(disks on the onboard aic7xxx controller combined with Linux Software
RAID works quite well), recreate your RAID array on the new
controller, and restore your data from backups.

Thanks,
Matt

-- 
Matt Domsch
Software Architect
Dell Linux Solutions linux.dell.com  www.dell.com/linux
Linux on Dell mailing lists @ http://lists.us.dell.com

Re: what does scsi sense means?

2005-08-23 Thread Erik Mouw

On Tue, Aug 23, 2005 at 05:07:12PM +0800, jeff shia wrote:
 In the file aic7.c, what is the function of the scsi_sense
 structure? What does "sense" mean here? Is it just like probe?

It is the information returned for a failed command. Normally commands
just succeed, but if one fails, you can get sense information which tells
you more about why that particular command failed.
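
As an illustration of what that sense information looks like, here is a sketch that decodes the standard fixed-format sense buffer (response code 0x70 or 0x71): the sense key is the low nibble of byte 2, and the additional sense code and qualifier are bytes 12 and 13. The struct and function names are made up for this example and are not the aic7xxx driver's own definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical holder for the three fields most people look at first. */
struct sense_info {
	uint8_t key;  /* sense key: broad error class */
	uint8_t asc;  /* additional sense code */
	uint8_t ascq; /* additional sense code qualifier */
};

/*
 * Decode fixed-format sense data.  Returns 0 on success, -1 if the
 * buffer is too short or is not in fixed format.
 */
static int decode_fixed_sense(const uint8_t *buf, size_t len,
			      struct sense_info *out)
{
	uint8_t code;

	if (len < 14)
		return -1;
	code = buf[0] & 0x7f;          /* bit 7 is the VALID flag */
	if (code != 0x70 && code != 0x71)
		return -1;
	out->key  = buf[2] & 0x0f;
	out->asc  = buf[12];
	out->ascq = buf[13];
	return 0;
}

/* Names for the most common sense keys; the rest are lumped together. */
static const char *sense_key_name(uint8_t key)
{
	switch (key) {
	case 0x0: return "NO SENSE";
	case 0x1: return "RECOVERED ERROR";
	case 0x2: return "NOT READY";
	case 0x3: return "MEDIUM ERROR";
	case 0x4: return "HARDWARE ERROR";
	case 0x5: return "ILLEGAL REQUEST";
	case 0x6: return "UNIT ATTENTION";
	default:  return "OTHER";
	}
}
```

A buffer starting with 0x70 whose byte 2 is 0x03 and byte 12 is 0x11 would decode as a MEDIUM ERROR with additional sense code 0x11, which is the kind of detail a plain "command failed" status does not carry.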


Erik

-- 
+-- Erik Mouw -- www.harddisk-recovery.com -- +31 70 370 12 90 --
| Lab address: Delftechpark 26, 2628 XH, Delft, The Netherlands

[PATCH] fix whitespace handling on sysfs attributes

2005-08-23 Thread Jon Smirl

The first version of this patch didn't allow for the request firmware
case which does multiple parsing passes on the parameter. This was
discussed in the thread '2.6.13-rc6-mm1'

gregkh-driver-sysfs-strip_leading_trailing_whitespace-3.patch
  should replace in 2.6.13-rc6-mm1
gregkh-driver-sysfs-strip_leading_trailing_whitespace.patch

Signed-off-by: Jon Smirl [EMAIL PROTECTED]

-- 
Jon Smirl
[EMAIL PROTECTED]
diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c
--- a/fs/sysfs/file.c
+++ b/fs/sysfs/file.c
@@ -6,6 +6,7 @@
 #include <linux/fsnotify.h>
 #include <linux/kobject.h>
 #include <linux/namei.h>
+#include <linux/ctype.h>
 #include <asm/uaccess.h>
 #include <asm/semaphore.h>
 
@@ -207,8 +208,41 @@ flush_write_buffer(struct dentry * dentr
 	struct attribute * attr = to_attr(dentry);
 	struct kobject * kobj = to_kobj(dentry->d_parent);
 	struct sysfs_ops * ops = buffer->ops;
+	size_t ws_count = count, leading = 0;
+	int ret = 0;
+	char *x;
 
-	return ops->store(kobj,attr,buffer->page,count);
+	/* locate trailing white space */
+	while ((ws_count > 0) && isspace(buffer->page[ws_count - 1]))
+		ws_count--;
+	if (ws_count == 0)
+		return count;
+
+	/* locate leading white space */
+	x = buffer->page;
+	while (isspace(*x))
+		x++;
+	leading = x - buffer->page;
+	ws_count -= leading;
+
+	/* interface is still ambiguous about this */
+	/* string is both passed by length and terminated */
+	if (ws_count != PAGE_SIZE)
+		x[ws_count] = '\0';
+
+	ret = ops->store(kobj, attr, x, ws_count);
+
+	/* is it an error? */
+	if (ret < 0)
+		return ret;
+
+	/* the whole string was consumed */
+	if (ret == ws_count)
+		return count;
+
+	/* only part of the string was consumed */
+	/* return count can not include trailing space */
+	return leading + ret;
 }
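
The trimming and the return-value mapping in the patch can be tried out in user space. The following is a sketch that mirrors the logic, not the kernel code itself: struct trimmed, trim_span and report_count are hypothetical names, and the store() result is passed in as a plain number.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical result: how much leading space was skipped, and what is left. */
struct trimmed {
	size_t leading; /* bytes of leading white space */
	size_t len;     /* bytes remaining after trimming both ends */
};

/* Find the payload inside buf[0..count) with white space stripped. */
static struct trimmed trim_span(const char *buf, size_t count)
{
	struct trimmed t = { 0, 0 };
	size_t end = count;

	/* locate trailing white space */
	while (end > 0 && isspace((unsigned char)buf[end - 1]))
		end--;
	if (end == 0)
		return t; /* nothing but white space */

	/* locate leading white space */
	while (t.leading < end && isspace((unsigned char)buf[t.leading]))
		t.leading++;
	t.len = end - t.leading;
	return t;
}

/*
 * Map what a store() callback returned for the trimmed payload back to
 * the byte count reported to the writer of the original buffer.
 */
static long report_count(long ret, struct trimmed t, size_t count)
{
	if (ret < 0)
		return ret;          /* error: pass it through */
	if ((size_t)ret == t.len)
		return (long)count;  /* everything consumed, spaces included */
	return (long)t.leading + ret; /* partial: exclude trailing space */
}
```

For a six-byte write of "  on \n", trim_span yields leading = 2 and len = 2. If the callback consumes both payload bytes the writer is told 6; if it consumes only one, the writer is told 3, so a retry resumes inside the payload rather than after the trailing white space.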

Re: [PATCH] mm: return ENOBUFS instead of ENOMEM in generic_file_buffered_write

2005-08-23 Thread Dmitry Torokhov

On 8/23/05, Christoph Hellwig [EMAIL PROTECTED] wrote:
 On Tue, Aug 23, 2005 at 11:46:33AM +0300, Pekka J Enberg wrote:
  As noticed by Dmitry Torokhov, write() can not return ENOMEM:
 
  http://www.opengroup.org/onlinepubs/95399/functions/write.html
 
  Therefore fixup generic_file_buffered_write() in mm/filemap.c (pointed out
  by Nathan Scott).
 
 We had this discussion before, for EACCESS then.  We've always been returning
 more errnos than SuS mentioned and Linus declared it's fine.
 

So does that mean that any error code is allowed? I would love to be
able to return ENODEV from a sysfs attribute if its device happens to
be removed in the process. Is there a list of valid errnos for Linux that
supersedes SuS?

-- 
Dmitry

RE: CONFIG_PRINTK_TIME woes

2005-08-23 Thread Luck, Tony

I'd hate to have to test for something for CONFIG_PRINTK_TIME
every time sched_clock() is being called.

Me too.

The quick fix would seem to be to only allow CONFIG_PRINTK_TIME
from kernel cmdline to make it happen a bit later. So basically
make int printk_time = 0 until command line is evaluated.

Good thought, but this won't work for ia64 in the hot-plug cpu case.
There are a couple of printk() calls by new cpus as they boot before
they have set-up their per-cpu areas.  So there is no global state
that can be checked to decide whether it is safe for printk() to
call sched_clock().

-Tony

Re: [PATCH] Posix file attribute support on VFAT (take #2)

2005-08-23 Thread Lennart Sorensen

On Mon, Aug 22, 2005 at 01:46:29PM +0200, Pavel Machek wrote:
 Unfortunately, it makes sense. If you have compact flash card, you
 really want to have VFAT there, so that it is a) compatible with
 windows and b) so that you don't kill the hardware.

VFAT is plenty good at killing hardware.  It's a terrible filesystem for
flash cards (if they don't do their own wear leveling properly).  Most
of the linux filesystems may not be any better but they are also no
worse.  Windows compatibility is completely irrelevant if the card is
being used as your root filesystem since any extensions you make to vfat
wouldn't be understood by windows anyhow, so at best it makes a mess of
it.

 I guess being able to use CF card for root filesystem is useful,
 too

I run ext3 on CF and so far, no problems.  I run with noatime and try to
avoid writing in general as much as possible.  VFAT would be crap since,
well, I run linux on the system.

Len Sorensen

another Followup on 2.6.13-rc3 ACPI processor C-state regression

2005-08-23 Thread Daniel Nofftz

(It looks like my first try to send this message as a reply to the Followup
... didn't work. If it worked: sorry for double-post)

I use 2.6.13-rc6-mm1, which includes the patch as far as I can see, but
the C2 idle state (which my processor definitely supports) isn't
detected. It also isn't detected with 2.6.13-rc6 or 2.6.12.5, but it
definitely worked with some older 2.6.x kernel.

Is there any way to force the use of C2, so that the ACPI system uses
C2 even if it is unable to detect that it is supported?

daniel
(please CC me, cause i am not on the list at the moment)

-- 
# Daniel Nofftz .. #




Re: PROBLEM: Incorrect RAM Detected at kernel init

2005-08-23 Thread Lennart Sorensen

On Sun, Aug 21, 2005 at 11:27:51PM -0400, Terry wrote:
 Not sure if I have provided enough info, or to much info, but here it goes:
 
 [1.] One line summary of the problem:
 Not Detecting all the memory installed in the system.
 
 [2.] Full description of the problem/report:
 I have Linux Kernel 2.4.31 running on a Compaq 5000R server with 2 PPro 200
 processors, 768M RAM, RealTeck 8139 Network Card, and Compaq Smart 2 Raid
 controller with 5 9.1G drives in Raid 5 configuration.
 The kernel appears to compile perfectly, installs fine, but after reboot it
 is only reporting 16M of RAM. I have tried with and without the mem=768M
 boot up option in the lilo.conf script. All other modules and boot up
 includes appear to run perfectly fine. I had a 2.4.18 kernel running on this
 box just fine, detected all 768M of RAM and ran perfectly. The 2.4.31 Kernel
 runs almost perfectly, the only hold back is the false detection of memory.

Compaq machines of that era are known to use non-standard BIOS methods
for identifying RAM.  Do a Google search for how to pass memory maps to
2.6 kernels on a Compaq.

I.e. something like:

mem=exactmap [EMAIL PROTECTED] [EMAIL PROTECTED]

Add that to the kernel command line when booting and see what happens.

Len Sorensen

Re: [PATCH] FUTEX_WAKE_OP (pthread_cond_signal speedup)