Re: ESP DMA and sbus (was: HyperSPARC patches (?))

2006-05-04 Thread Martin Habets
On Thu, Apr 27, 2006 at 06:39:57PM +0100, Martin Habets wrote:
> I plan to try the eh locking patch posted next, and maybe something like
> 2.6.8 after that if it is still not working. Unless someone has a better
> proposal.

An update on this: the eh locking patch did not help. 2.6.8 caused an
oops on boot. I tried 2.6.17-rc3, but loading any module fails with:
  module crc32: Unknown relocation: 17
so that is not usable. I sure hope someone will fix this before 2.6.17
proper...

Some other pieces of random info:

- After reading another thread here I switched from gcc 3.3.5 to gcc 3.4.4.

- The problem only affects one controller at a time, so for people seeing
this on hard disks I would recommend moving one to another controller
(and report if that helped).

- Another important piece of information to gather is the DMA
version. This is displayed when the kernel boots, in my case:
 dma2: ESC Revision 1

I have attached my changes to esp.c, which dump more data.
The changes to ESPDATA and ESPSTAT show how you can restrict the debug
output to just one controller.
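
For readers who want to reuse the idea outside esp.c, here is a minimal
user-space sketch of filtering debug output to a single controller. It is
not the attached patch: struct esp_model, ESP_DBG_ONE and dump_filtered are
hypothetical names, and printf stands in for printk. The one deliberate
difference is the do/while(0) wrapper, which keeps the macro safe as the
body of an unbraced if/else; a bare "if (...) printk foo" macro can capture
a following else.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the driver's struct esp; only the field
 * the filter needs is modeled here. */
struct esp_model {
	int esp_id;
};

/* Number of messages actually emitted, so the filter can be checked. */
static int dbg_emitted;

/* Print only for the controller with the given id. */
#define ESP_DBG_ONE(esp, id, args)		\
	do {					\
		if ((esp)->esp_id == (id)) {	\
			printf args;		\
			dbg_emitted++;		\
		}				\
	} while (0)

/* Emit one debug line per controller in the array, filtered to want_id.
 * Returns how many lines were printed. */
static int dump_filtered(const struct esp_model *ctrl, int n, int want_id)
{
	int i;

	assert(ctrl != NULL && n >= 0);
	dbg_emitted = 0;
	for (i = 0; i < n; i++)
		ESP_DBG_ONE(&ctrl[i], want_id,
			    ("esp%d: debug output\n", ctrl[i].esp_id));
	return dbg_emitted;
}
```

Called on three controllers with ids 0, 1 and 2 and want_id set to 1,
dump_filtered() prints a single line, for esp1.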

My latest error output with those changes:
palantir9:~# tar tvf /dev/nst0
drwxr-xr-x root/root 0 2006-04-27 03:48:05 boot/
... more good output ...
-rw-r--r-- root/root   2320481 2004-05-20 17:53:25 boot/vmlinux-2.4.26-sparc32
esp1: DMA error 4004430e
esp1: dumping state
esp1: dma -- cond_reg4004430e addrf040 count1fde
dma: running0 allocated1 addr0 nbytes real
esp1: SW [sreg01 sstep04 ireg10]
STEPCMD_SENT_OK INTREG BSERV 
esp1: HW reread [sreg01 sstepc4 ireg00]
STEPCMD_SENT_OK INTREG 
esp1: current command [tgt00 lun00 pphaseCLUELESS cphaseDATAIN]
esp1: disconnected 
esp1: Resetting scsi bus
esp1: SCSI bus reset interrupt
st0: Error 8 (sugg. bt 0x0, driver bt 0x0, host bt 0x8).
tar: /dev/nst0: Cannot read: Input/output error
esp1: Warning, live target 0 not responding to selection.
st0: Error 6 (sugg. bt 0x0, driver bt 0x0, host bt 0x6).
tar: /dev/nst0: Cannot read: Input/output error

The DMA address inside sbus_dma being zero does not look right to me.
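
The value printed after "DMA error" can also be compared with the DMA
version reported at boot. The sketch below splits such a value into a
revision field and an error flag. The mask values are assumptions written
from memory of the sparc DMA header, not taken from this thread, so check
them against the kernel tree in use; the macro and function names are made
up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit layout of the DMA condition register. */
#define CSR_HNDL_INTR	0x00000001u	/* interrupt pending (assumed) */
#define CSR_HNDL_ERROR	0x00000002u	/* error pending (assumed) */
#define CSR_DEVICE_ID	0xf0000000u	/* DMA revision field (assumed) */

/* Return the top nibble of the register, taken here as the revision id. */
static unsigned int csr_device_id(uint32_t csr)
{
	return (csr & CSR_DEVICE_ID) >> 28;
}

/* Return 1 if the assumed error-pending bit is set, 0 otherwise. */
static int csr_error_pending(uint32_t csr)
{
	return (csr & CSR_HNDL_ERROR) != 0;
}
```

With these masks, 4004430e gives revision id 0x4 and the a440030e values
reported for esp0 elsewhere in this thread give 0xa; both have the assumed
error bit set. If the field assumption holds, the affected machines carry
different DMA revisions.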

Any hints/tips welcome...
-- 
Martin
--- esp.c.orig  2006-04-28 13:49:41.0 +0100
+++ esp.c   2006-05-04 17:00:33.0 +0100
@@ -56,11 +56,11 @@
 /*#define DEBUG_ESP_DATA*/
 /* #define DEBUG_ESP_QUEUE */
 /*#define DEBUG_ESP_DISCONNECT*/
-/*#define DEBUG_ESP_STATUS*/
+/* #define DEBUG_ESP_STATUS */
 /* #define DEBUG_ESP_PHASES */
 /* #define DEBUG_ESP_WORKBUS */
 /* #define DEBUG_STATE_MACHINE */
-/* #define DEBUG_ESP_CMDS */
+/*#define DEBUG_ESP_CMDS*/
 /* #define DEBUG_ESP_IRQS */
 /* #define DEBUG_SDTR */
 /* #define DEBUG_ESP_SG */
@@ -84,7 +84,7 @@
 #endif
 
 #if defined(DEBUG_ESP_DATA)
-#define ESPDATA(foo)  printk foo
+#define ESPDATA(foo)  if (esp->esp_id == 1) printk foo
 #else
 #define ESPDATA(foo)
 #endif
@@ -102,7 +102,7 @@
 #endif
 
 #if defined(DEBUG_ESP_STATUS)
-#define ESPSTAT(foo)  printk foo
+#define ESPSTAT(foo)  if (esp->esp_id == 1) printk foo
 #else
 #define ESPSTAT(foo)
 #endif
@@ -386,8 +386,10 @@
 #ifdef DEBUG_ESP_CMDS
 static inline void esp_cmd(struct esp *esp, u8 cmd)
 {
+	if (esp->esp_id == 1) {
 	esp->espcmdlog[esp->espcmdent] = cmd;
 	esp->espcmdent = (esp->espcmdent + 1) & 31;
+	}
 	sbus_writeb(cmd, esp->eregs + ESP_CMD);
 }
 #else
@@ -490,6 +492,7 @@
 	if (esp->dma->revision != dvmahme) {
 		tmp = sbus_readl(esp->dregs + DMA_CSR);
 		sbus_writel(tmp | DMA_RST_SCSI, esp->dregs + DMA_CSR);
+		__delay(400);	/* let the bits set ;) */
 		sbus_writel(tmp & ~DMA_RST_SCSI, esp->dregs + DMA_CSR);
 	}
 	switch (esp->dma->revision) {
@@ -1888,22 +1891,42 @@
 static void esp_dump_state(struct esp *esp)
 {
 	struct scsi_cmnd *SCptr = esp->current_SC;
+	struct sbus_dma *dma_ptr = esp->dma;
+	int sstep, ireg;
+	__u32	addr;
 #ifdef DEBUG_ESP_CMDS
 	int i;
 #endif
 
 	ESPLOG(("esp%d: dumping state\n", esp->esp_id));
-	ESPLOG(("esp%d: dma -- cond_reg%08x addr%08x\n",
+	addr = sbus_readl(esp->dregs + DMA_ADDR);
+	ESPLOG(("esp%d: dma -- cond_reg%08x addr%08x count%08x\n",
 		esp->esp_id,
 		sbus_readl(esp->dregs + DMA_CSR),
-		sbus_readl(esp->dregs + DMA_ADDR)));
+		addr,
+		sbus_readl(esp->dregs + DMA_COUNT)));
+	ESPLOG(("dma: running%d allocated%d addr%lx nbytes%08x real%08x\n",
+		dma_ptr->running, dma_ptr->allocated, dma_ptr->addr, dma_ptr->nbytes, dma_ptr->realbytes));
 	ESPLOG(("esp%d: SW [sreg%02x sstep%02x ireg%02x]\n",
 		esp->esp_id, esp->sreg, esp->seqreg, esp->ireg));
+#ifdef DEBUG_ESP
+	esp_print_seqreg(esp->seqreg);
+	ESPLOG((" "));
+	esp_print_ireg(esp->ireg);
+	ESPLOG(("\n"));
+#endif
+	sstep = sbus_readb(esp->eregs + ESP_SSTEP);
+	ireg = sbus_readb(esp->eregs + ESP_INTRPT);
 	ESPLOG(("esp%d: HW reread [sreg%02x sstep%02x ireg%02x]\n",

Re: HyperSPARC patches (?)

2006-04-27 Thread BERTRAND Joël

Jurij Smakov wrote:

On Mon, 17 Apr 2006, BERTRAND Joël wrote:

Thanks. But I cannot try on my SS20 (HyperSPARC) due to a bug in 
initramfs-tools (HyperSPARC cannot use a ramfs to boot). Can you send 
me a patch file with your fixes?



I'm booting my SS20 (with HyperSPARC CPU) quite happily with initramfs 
generated by yaird, after adding esp explicitly to 
/etc/yaird/Default.cfg. I tried initramfs-tools and it does look broken, 
could you please file a bug so that we can keep track of this problem?


Anyway, the patch in question is attached. Note that it's the actual 
patch downloaded from git, so you need to apply it with -R (reversed) to 
undo its effect.


Hello,

	I have booted an SS20 with dual SM71 (and internal RAID1). The kernel is an 
official 2.6.16.10 with the esp patch, and this morning it hung with the 
same error when I launched apt-get update.


Regards,

JKB


--
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]



Re: HyperSPARC patches (?)

2006-04-25 Thread BERTRAND Joël

Jurij Smakov wrote:

On Mon, 24 Apr 2006, BERTRAND Joël wrote:

I have tested your patch with a SuperSPARC-II/smp workstation and 
I cannot trigger any error. I have seen that the latest release candidate 
(2.6.17-rc2) ships with this patch. Thus, I have tried to boot in 
smp configuration, but the kernel panics very early.



Ok, just to be absolutely clear: when you say with your patch, do you 
mean with the locking change reverted? Does it work fine with UP 
kernel on an SMP workstation? At this point I don't care too much about 
SMP kernels :-), just want to positively establish that reverting the 
locking change does have positive effects on UP kernels.


	I use a patched kernel tree (2.6.17-rc1 with smp patch and locking 
change reverted). I have built this kernel without smp and tested it with 
two SuperSPARC-II, 384 Mbytes and a CG6 framebuffer. With smp support 
(up to 4 processors), the same kernel boots fine. It panics when I 
remove the CG6 framebuffer to use a CG14.


	With one or two HyperSPARC/200, the same kernel boots but is very 
unstable: tar xvfj linux-2.6.16.10.tar.bz2, for example, returns a 
segmentation fault. Some ESP DMA errors are now returned by the kernel... 
If I only use 128 MBytes without HIGHMEM, this kernel is more stable, 
but still not very stable ;-)


	I have tried to boot the following configuration: two HyperSPARC/200, 
the kernel that works on UP-SuperSPARC with the lock change reverted, and 
cachesize=0 on the kernel command line. Same result. Thus, I'm pretty sure 
that the bug I see comes from the HyperSPARC-specific MMU and not from the 
cache. I assume that the cachesize option works on sparc kernels...


	I don't know where we can use a Sparc(/STATION/SERVER/) sun4m with 
HyperSPARC. Thus, I will install SuperSPARC on one of my SS20 to test 
the latest stable kernel with the lock patch. Have you seen that 
rc1/rc2 cannot be used with the CG14 (or the PROM console...)?


Regards,

JKB





Re: HyperSPARC patches (?)

2006-04-24 Thread BERTRAND Joël

Jurij Smakov wrote:

On Thu, 20 Apr 2006, BERTRAND Joël wrote:

No, I'm not able to get a backtrace or any other log because my 
workstation does not respond (all physical disks are disconnected). It 
may produce an Oops, but it prints so many SCSI errors on the console 
that I do not have time to read the backtrace.



Sorry for the mixup, I was responding to Ludovic who reported that he 
sees a kernel panic, but didn't make it explicitly clear :-P.


I can try. Can you give me a tarball of the same kernel you test ? 
Thus, we are sure that we test the same kernel.



I'm using the Debian's kernels from unstable, just apt-get source 
linux-2.6 to get the tree. If we can positively confirm that reverting 
the locking change fixes the problems at least on UP machines, I'll push 
this information upstream so that this change can be reviewed.


	I have tested your patch with a SuperSPARC-II/smp workstation and I 
cannot trigger any error. I have seen that the latest release candidate 
(2.6.17-rc2) ships with this patch. Thus, I have tried to boot in smp 
configuration, but the kernel panics very early.


Regards,

JKB





Re: HyperSPARC patches (?)

2006-04-24 Thread Jurij Smakov

On Mon, 24 Apr 2006, BERTRAND Joël wrote:

	I have tested your patch with a SuperSPARC-II/smp workstation and I 
cannot trigger any error. I have seen that the latest release candidate 
(2.6.17-rc2) ships with this patch. Thus, I have tried to boot in smp 
configuration, but the kernel panics very early.


Ok, just to be absolutely clear: when you say with your patch, do you 
mean with the locking change reverted? Does it work fine with UP kernel 
on an SMP workstation? At this point I don't care too much about SMP 
kernels :-), just want to positively establish that reverting the locking 
change does have positive effects on UP kernels.


Thanks a lot for testing!

Jurij Smakov [EMAIL PROTECTED]
Key: http://www.wooyd.org/pgpkey/   KeyID: C99E03CC

Re: HyperSPARC patches (?)

2006-04-24 Thread Jurij Smakov

On Sun, 23 Apr 2006, Ludovic Courtès wrote:


I tried running 2.6.16 UP (from the package currently in the archive)
but ran into various problems which I'll describe in another mail...


Please do, if it's the problem with esp not included in initrd, it's a 
known issue, you just need to add the module manually at this point.



Sorry for not being more responsive and helpful.  I'll need to go ahead
and cross-compile my sparc32 kernels from my U5 at some point.  ;-)


The official kernels for sparc32 are built on sparc64, so if you'll get 
the official source package, you can apply your patches and rebuild it 
just for the sparc32 flavour (see kernel-handbook.alioth.debian.org for 
details).



Or, can you make 2.6.14 UP kernel packages available somewhere?


Sorry, I don't think I've ever built 2.6.14. You can try putting the 
debian directory from 2.6.16 into the 2.6.14 tree and see what happens, if 
you feel like an adventure :-).


Best regards,

Jurij Smakov [EMAIL PROTECTED]
Key: http://www.wooyd.org/pgpkey/   KeyID: C99E03CC

Re: HyperSPARC patches (?)

2006-04-23 Thread Ludovic Courtès
Hi Jurij,

3 days, 15 hours, 21 minutes, 17 seconds ago, 
Jurij Smakov wrote:
> On Tue, 18 Apr 2006, Ludovic Courtès wrote:
>
> > In fact, I was using 2.6.14 (with Bob Breuer's SMP patch) which does
> > _not_ have this lock sequence change.
> >
> > Now, I tried to apply this lock sequence change to my 2.6.14 tree to see
> > what happens and, well, the kernel does not hang as early as before but
> > I'm still getting DMA error messages and it panics soon enough as
> > well.
>
> Where does it panic? Were you able to get a backtrace?

No, sorry, I don't have a backtrace.  I need to recompile it with the
doubtful locking change and we'll see (I'm compiling natively...).

> So far the reports on kernels with locking change reverted look like this:
>
> Me, HyperSPARC processors, UP kernel  - DMA problems gone.
> Bertrand, HyperSPARC CPUs, SMP kernel - DMA problems still present.
> Bertrand, SuperSPARC CPUs, SMP kernel - DMA problems gone.
> You, HyperSparc CPUs, SMP kernel  - DMA problems still present.

[With You == Ludovic and Me == Jurij]

I have SuperSPARCs (390Z50), not HyperSPARCs.

> From this data set it appears that the DMA problems are caused by some
> combination of the locking patch and SMP patch on HyperSPARCs only. It
> would be really useful if you could try running HyperSPARC with UP kernel
> and locking change reverted, that would at least make it clear that the
> locking change is responsible for at least a part of the problem.

I tried running 2.6.16 UP (from the package currently in the archive)
but ran into various problems which I'll describe in another mail...

Sorry for not being more responsive and helpful.  I'll need to go ahead
and cross-compile my sparc32 kernels from my U5 at some point.  ;-)

Or, can you make 2.6.14 UP kernel packages available somewhere?

Thanks,
Ludovic.





Re: HyperSPARC patches (?)

2006-04-20 Thread BERTRAND Joël

Jurij Smakov wrote:

On Tue, 18 Apr 2006, Ludovic Courtès wrote:


In fact, I was using 2.6.14 (with Bob Breuer's SMP patch) which does
_not_ have this lock sequence change.

Now, I tried to apply this lock sequence change to my 2.6.14 tree to see
what happens and, well, the kernel does not hang as early as before but
I'm still getting DMA error messages and it panics soon enough as
well.



Where does it panic? Were you able to get a backtrace?


	No, I'm not able to get a backtrace or any other log because my 
workstation does not respond (all physical disks are disconnected). It 
may produce an Oops, but it prints so many SCSI errors on the console 
that I do not have time to read the backtrace.



So far the reports on kernels with locking change reverted look like this:

Me, HyperSPARC processors, UP kernel  - DMA problems gone.
Bertrand, HyperSPARC CPUs, SMP kernel - DMA problems still present.
Bertrand, SuperSPARC CPUs, SMP kernel - DMA problems gone.
You, HyperSparc CPUs, SMP kernel  - DMA problems still present.

From this data set it appears that the DMA problems are caused by some 
combination of the locking patch and SMP patch on HyperSPARCs only. It 
would be really useful if you could try running HyperSPARC with UP kernel 
and locking change reverted, that would at least make it clear that the 
locking change is responsible for at least a part of the problem.


	I can try. Can you give me a tarball of the same kernel you test ? 
Thus, we are sure that we test the same kernel.


Regards,

JKB





Re: HyperSPARC patches (?)

2006-04-20 Thread Jurij Smakov

On Thu, 20 Apr 2006, BERTRAND Joël wrote:

	No, I'm not able to get a backtrace or any other log because my 
workstation does not respond (all physical disks are disconnected). It 
may produce an Oops, but it prints so many SCSI errors on the console 
that I do not have time to read the backtrace.


Sorry for the mixup, I was responding to Ludovic who reported that he sees 
a kernel panic, but didn't make it explicitly clear :-P.


	I can try. Can you give me a tarball of the same kernel you test ? 
Thus, we are sure that we test the same kernel.


I'm using the Debian's kernels from unstable, just apt-get source 
linux-2.6 to get the tree. If we can positively confirm that reverting the 
locking change fixes the problems at least on UP machines, I'll push this 
information upstream so that this change can be reviewed.


Thanks for testing,

Jurij Smakov [EMAIL PROTECTED]
Key: http://www.wooyd.org/pgpkey/   KeyID: C99E03CC

Re: HyperSPARC patches (?)

2006-04-19 Thread Jurij Smakov

On Tue, 18 Apr 2006, Ludovic Courtès wrote:


In fact, I was using 2.6.14 (with Bob Breuer's SMP patch) which does
_not_ have this lock sequence change.

Now, I tried to apply this lock sequence change to my 2.6.14 tree to see
what happens and, well, the kernel does not hang as early as before but
I'm still getting DMA error messages and it panics soon enough as
well.


Where does it panic? Were you able to get a backtrace?

So far the reports on kernels with locking change reverted look like this:

Me, HyperSPARC processors, UP kernel  - DMA problems gone.
Bertrand, HyperSPARC CPUs, SMP kernel - DMA problems still present.
Bertrand, SuperSPARC CPUs, SMP kernel - DMA problems gone.
You, HyperSparc CPUs, SMP kernel  - DMA problems still present.

From this data set it appears that the DMA problems are caused by some 
combination of the locking patch and SMP patch on HyperSPARCs only. It 
would be really useful if you could try running HyperSPARC with UP kernel 
and locking change reverted, that would at least make it clear that the 
locking change is responsible for at least a part of the problem.


Best regards,

Jurij Smakov [EMAIL PROTECTED]
Key: http://www.wooyd.org/pgpkey/   KeyID: C99E03CC

Re: HyperSPARC patches (?)

2006-04-18 Thread Ludovic Courtès
Hi,

One day, 23 hours, 10 minutes, 12 seconds ago, 
Jurij Smakov wrote:
> Ok, I did some digging. Since people also see this bug on Ultra1, it must
> be esp-specific. Most recent change to esp.c dates to February 2006, all
> other changes are over a year old, so it's pretty unlikely that they are
> the cause of this problem. The recent change [0], on the other hand, looks
> pretty suspicious to me, since it changes the locking order. I've built a
> kernel with this change reverted, and was not able to crash the machine by
> running two concurrent loops of unpacking/removing the kernel tree for an
> hour. I'd appreciate if you could test this kernel, it is available at
> [1]. By the way, this problem has also been reported to the kernel's
> bugzilla as bug 6344 [2].
>
> [0]
> http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=a6ceda7457b2303dcb07d3c472b25d52bbdb5a29
> [1]
> http://www.wooyd.org/debian/kernels/linux-image-2.6.16-1-sparc32_2.6.16-6_sparc.deb
> [2] http://bugzilla.kernel.org/show_bug.cgi?id=6344

In fact, I was using 2.6.14 (with Bob Breuer's SMP patch) which does
_not_ have this lock sequence change.

Now, I tried to apply this lock sequence change to my 2.6.14 tree to see
what happens and, well, the kernel does not hang as early as before but
I'm still getting DMA error messages and it panics soon enough as
well.

I also tried to remove one of the two SCSI disks, without any success.

Thanks,
Ludovic.





Re: HyperSPARC patches (?)

2006-04-17 Thread BERTRAND Joël

Jurij Smakov wrote:

On Tue, 11 Apr 2006, Ludovic Courtès wrote:


Just in case it is of interest to you, I got the same sort of problem
with an SS20 equipped with 2 SuperSPARCs, 2 hard disks and a 2.6 kernel:

 http://article.gmane.org/gmane.linux.ports.sparc/5979

I haven't had time to dig into it any further so far.



Ok, I did some digging. Since people also see this bug on Ultra1, it 
must be esp-specific. Most recent change to esp.c dates to February 
2006, all other changes are over a year old, so it's pretty unlikely 
that they are the cause of this problem. The recent change [0], on the 
other hand, looks pretty suspicious to me, since it changes the locking 
order. I've built a kernel with this change reverted, and was not able 
to crash the machine by running two concurrent loops of 
unpacking/removing the kernel tree for an hour. I'd appreciate if you 
could test this kernel, it is available at [1]. By the way, this problem 
has also been reported to the kernel's bugzilla as bug 6344 [2].


	Thanks. But I cannot try on my SS20 (HyperSPARC) due to a bug in 
initramfs-tools (HyperSPARC cannot use a ramfs to boot). Can you send 
me a patch file with your fixes?


Regards,

JKB





Re: HyperSPARC patches (?)

2006-04-17 Thread Jurij Smakov

On Mon, 17 Apr 2006, BERTRAND Joël wrote:

	Thanks. But I cannot try on my SS20 (HyperSPARC) due to a bug in 
initramfs-tools (HyperSPARC cannot use a ramfs to boot). Can you send me a 
patch file with your fixes?


I'm booting my SS20 (with HyperSPARC CPU) quite happily with initramfs 
generated by yaird, after adding esp explicitly to /etc/yaird/Default.cfg. 
I tried initramfs-tools and it does look broken, could you please file a 
bug so that we can keep track of this problem?


Anyway, the patch in question is attached. Note that it's the actual patch 
downloaded from git, so you need to apply it with -R (reversed) to undo 
its effect.


Best regards,

Jurij Smakov [EMAIL PROTECTED]
Key: http://www.wooyd.org/pgpkey/   KeyID: C99E03CC

From: Christoph Hellwig [EMAIL PROTECTED]
Date: Wed, 22 Feb 2006 22:35:52 + (-0800)
Subject: [SCSI] esp: fix eh locking
X-Git-Tag: v2.6.16-rc5
X-Git-Url: 
http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=a6ceda7457b2303dcb07d3c472b25d52bbdb5a29

[SCSI] esp: fix eh locking

esp_reset didn't get fixed when the EH locking changed.
->eh_bus_reset_handler is now called without the host lock held.

Signed-off-by: Christoph Hellwig [EMAIL PROTECTED]
Signed-off-by: David S. Miller [EMAIL PROTECTED]
---

--- a/drivers/scsi/esp.c
+++ b/drivers/scsi/esp.c
@@ -2068,14 +2068,12 @@ static int esp_reset(struct scsi_cmnd *S
 {
 	struct esp *esp = (struct esp *) SCptr->device->host->hostdata;
 
+	spin_lock_irq(esp->ehost->host_lock);
 	(void) esp_do_resetbus(esp);
-
 	spin_unlock_irq(esp->ehost->host_lock);
 
 	wait_event(esp->reset_queue, (esp->resetting_bus == 0));
 
-	spin_lock_irq(esp->ehost->host_lock);
-
return SUCCESS;
 }
 

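The commit message says the reset handler is now entered without the host
lock held. The toy model below, which is only a sketch and not kernel code,
replays both orderings from that starting state. All names (host_model,
model_lock, reset_old, reset_new) are hypothetical, the lock is a plain
flag, and the reset completes immediately. It illustrates the rationale
stated in the commit and says nothing about whether the change causes the
DMA errors discussed in this thread.

```c
#include <assert.h>

/* Toy model of the host lock: records whether it is held and whether
 * an unlock was ever attempted while it was free. */
struct host_model {
	int lock_held;
	int bad_unlock;
	int resetting_bus;
};

static void model_lock(struct host_model *h)
{
	assert(!h->lock_held);	/* non-recursive lock */
	h->lock_held = 1;
}

static void model_unlock(struct host_model *h)
{
	if (!h->lock_held)
		h->bad_unlock = 1;
	h->lock_held = 0;
}

/* Stand-in for esp_do_resetbus(): marks a bus reset as in progress. */
static void model_do_resetbus(struct host_model *h)
{
	h->resetting_bus = 1;
}

/* Stand-in for wait_event(): must sleep without the lock held. */
static void model_wait_reset(struct host_model *h)
{
	assert(!h->lock_held);
	h->resetting_bus = 0;
}

/* Ordering before the patch: assumes the caller already holds the lock. */
static void reset_old(struct host_model *h)
{
	model_do_resetbus(h);
	model_unlock(h);
	model_wait_reset(h);
	model_lock(h);
}

/* Ordering after the patch: takes and releases the lock itself. */
static void reset_new(struct host_model *h)
{
	model_lock(h);
	model_do_resetbus(h);
	model_unlock(h);
	model_wait_reset(h);
}
```

Starting from an unlocked host, reset_old() unlocks a lock it does not hold
and returns with the lock taken, while reset_new() leaves the lock balanced.
Applying the patch with -R restores the first ordering.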

Re: HyperSPARC patches (?)

2006-04-16 Thread Jurij Smakov

On Tue, 11 Apr 2006, Ludovic Courtès wrote:


Just in case it is of interest to you, I got the same sort of problem
with an SS20 equipped with 2 SuperSPARCs, 2 hard disks and a 2.6 kernel:

 http://article.gmane.org/gmane.linux.ports.sparc/5979

I haven't had time to dig into it any further so far.


Ok, I did some digging. Since people also see this bug on Ultra1, it must 
be esp-specific. Most recent change to esp.c dates to February 2006, all 
other changes are over a year old, so it's pretty unlikely that they are 
the cause of this problem. The recent change [0], on the other hand, looks 
pretty suspicious to me, since it changes the locking order. I've built a 
kernel with this change reverted, and was not able to crash the machine by 
running two concurrent loops of unpacking/removing the kernel tree for an 
hour. I'd appreciate if you could test this kernel, it is available at 
[1]. By the way, this problem has also been reported to the kernel's bugzilla
as bug 6344 [2].

[0] 
http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=a6ceda7457b2303dcb07d3c472b25d52bbdb5a29
[1] 
http://www.wooyd.org/debian/kernels/linux-image-2.6.16-1-sparc32_2.6.16-6_sparc.deb
[2] http://bugzilla.kernel.org/show_bug.cgi?id=6344

Best regards,

Jurij Smakov [EMAIL PROTECTED]
Key: http://www.wooyd.org/pgpkey/   KeyID: C99E03CC

Re: HyperSPARC patches (?)

2006-04-11 Thread Ludovic Courtès
Hi,

Today, 16 hours, 59 minutes, 24 seconds ago, Jurij Smakov wrote:
> Right, I was able to reproduce the problem on my SS20. While apt-get
> updating/upgrading the following messages started scrolling through
> console:
>
> esp0: DMA error a440030f
> esp0: Resetting scsi bus
> esp0: DMA error a440030f
> esp0: Resetting scsi bus
> esp0: DMA error a440030f
> esp0: Resetting scsi bus

Just in case it is of interest to you, I got the same sort of problem
with an SS20 equipped with 2 SuperSPARCs, 2 hard disks and a 2.6 kernel:

  http://article.gmane.org/gmane.linux.ports.sparc/5979

I haven't had time to dig into it any further so far.

Thanks,
Ludovic.





Re: HyperSPARC patches (?)

2006-04-10 Thread Jurij Smakov

On Fri, 7 Apr 2006, BERTRAND Joël wrote:


I have tested with one, two (raid1), three (raid5) or seven
(raid1+raid5) SCSI disks. The observed fault seems to come from the
esp adapter, not from the HyperSPARC (but it appears more quickly with
HyperSPARC than with Super- or MicroSPARC). In /var/log/messages, I can see:

Apr  6 20:36:01 lebegue kernel: esp0: DMA error a440030e
Apr  6 20:36:01 lebegue kernel: esp0: Resetting scsi bus
Apr  6 20:36:01 lebegue kernel: esp0: SCSI bus reset interrupt
Apr  6 20:36:01 lebegue kernel: esp0: SCSI bus reset interrupt
Apr  6 20:36:01 lebegue kernel: esp0: Warning, live target 3 not
responding to selection.
Apr  6 20:36:01 lebegue kernel: esp0: Warning, live target 1 not
responding to selection.
Apr  6 20:36:02 lebegue kernel: esp0: Resetting scsi bus
Apr  6 20:36:02 lebegue kernel: esp0: SCSI bus reset interrupt
Apr  6 20:36:02 lebegue kernel: esp0: SCSI bus reset interrupt
Apr  6 20:36:02 lebegue kernel: esp0: Warning, live target 3 not
responding to selection.
Apr  6 20:36:02 lebegue kernel: esp0: Warning, live target 1 not
responding to selection.
Apr  6 20:36:03 lebegue kernel: esp0: Resetting scsi bus
Apr  6 20:36:03 lebegue kernel: esp0: SCSI bus reset interrupt
Apr  6 20:36:03 lebegue kernel: esp0: SCSI bus reset interrupt
Apr  6 20:36:03 lebegue kernel: esp0: Warning, live target 3 not
responding to selection.
Apr  6 20:36:03 lebegue kernel: esp0: Resetting scsi bus
Apr  6 20:36:03 lebegue kernel: esp0: SCSI bus reset interrupt
Apr  6 20:36:03 lebegue kernel: esp0: SCSI bus reset interrupt
Apr  6 20:36:13 lebegue kernel: sd 0:0:3:0: scsi: Device offlined - not
ready after error recovery


Right, I was able to reproduce the problem on my SS20. While apt-get
updating/upgrading the following messages started scrolling through 
console:


esp0: DMA error a440030f
esp0: Resetting scsi bus
esp0: DMA error a440030f
esp0: Resetting scsi bus
esp0: DMA error a440030f
esp0: Resetting scsi bus

I'll see what I can do about it.

Best regards,

Jurij Smakov [EMAIL PROTECTED]
Key: http://www.wooyd.org/pgpkey/   KeyID: C99E03CC

Re: HyperSPARC patches (?)

2006-04-07 Thread BERTRAND Joël

Hendrik Sattler wrote:

On Friday, 7 April 2006 04:42, Jurij Smakov wrote:


Hm, I believe that no patches should be necessary. I've finally managed to
install 2.6.16 on my ss20 with hypersparc processor, it worked ok so far,
but I wasn't really stressing it. From experience, it is really picky
about memory layout, and a total of 448MB strikes me as slightly odd. Try
leaving something like 128MB and see whether the situation will improve.




448MB (7x 64MB) is the best you get when the VSIMM is installed to be able to 
use the CG14 SX frame buffer.
There's only one memory slot you can use for the VSIMM, so there is not much 
you can do wrong with memory layout.


Thanks for your answer. I have tested some configurations with a lot of
sun4m workstations (SS5 with MicroSPARC-II/85, SS20 with one or two
SuperSPARC-II/75, SS20 with one, two or four HyperSPARC/200). Memory
configurations were:
- 256 MBytes
- 448 MBytes + VSIMM
- 512 MBytes

I have tested with one, two (raid1), three (raid5) or seven
(raid1+raid5) SCSI disks. The observed fault seems to come from the
esp adapter, not from the HyperSPARC (but it appears more quickly with
HyperSPARC than with Super- or MicroSPARC). In /var/log/messages, I can see:

Apr  6 20:36:01 lebegue kernel: esp0: DMA error a440030e
Apr  6 20:36:01 lebegue kernel: esp0: Resetting scsi bus
Apr  6 20:36:01 lebegue kernel: esp0: SCSI bus reset interrupt
Apr  6 20:36:01 lebegue kernel: esp0: SCSI bus reset interrupt
Apr  6 20:36:01 lebegue kernel: esp0: Warning, live target 3 not
responding to selection.
Apr  6 20:36:01 lebegue kernel: esp0: Warning, live target 1 not
responding to selection.
Apr  6 20:36:02 lebegue kernel: esp0: Resetting scsi bus
Apr  6 20:36:02 lebegue kernel: esp0: SCSI bus reset interrupt
Apr  6 20:36:02 lebegue kernel: esp0: SCSI bus reset interrupt
Apr  6 20:36:02 lebegue kernel: esp0: Warning, live target 3 not
responding to selection.
Apr  6 20:36:02 lebegue kernel: esp0: Warning, live target 1 not
responding to selection.
Apr  6 20:36:03 lebegue kernel: esp0: Resetting scsi bus
Apr  6 20:36:03 lebegue kernel: esp0: SCSI bus reset interrupt
Apr  6 20:36:03 lebegue kernel: esp0: SCSI bus reset interrupt
Apr  6 20:36:03 lebegue kernel: esp0: Warning, live target 3 not
responding to selection.
Apr  6 20:36:03 lebegue kernel: esp0: Resetting scsi bus
Apr  6 20:36:03 lebegue kernel: esp0: SCSI bus reset interrupt
Apr  6 20:36:03 lebegue kernel: esp0: SCSI bus reset interrupt
Apr  6 20:36:13 lebegue kernel: sd 0:0:3:0: scsi: Device offlined - not
ready after error recovery

I have seen these messages on a sparc64 U1 (not a U1E) with ESP
(FAS100A) and on the SS5/SS20 I use with 2.6.16.1 or 2.6.17-rc1 kernel.
No problem on a very loaded U1E (with HME ESP) and 8 disks (raid1+raid6).

I have tried to find a workaround without any result. I think that the
trouble comes from the SBus support, but I do not understand why all SBus
modules except the esp FAS100A work fine.

Regards,

JKB





Re: HyperSPARC patches (?)

2006-04-06 Thread Jurij Smakov

On Mon, 3 Apr 2006, BERTRAND Joël wrote:


Hello,

	I have a SparcSTATION 20 with dual SuperSPARC-II/75, 448 MB of RAM, 
and a 4MB-VSIMM for the internal CG14. It runs fine with a 2.6.16.1 linux 
kernel (official / no-SMP). If I replace both processors by one (or two) 
HyperSPARC/200 (from ROSS), the workstation boots but... dpkg returns 
segfaults, esp driver seems to be broken (SCSI bus reset...)... Is there 
a patch anywhere to fix this? I have seen some patches for the 2.6.11 kernel 
(sparc32-hypersparc-srmmu.patch), but none for recent kernels.


Any idea ?


Hm, I believe that no patches should be necessary. I've finally managed to 
install 2.6.16 on my ss20 with hypersparc processor, it worked ok so far, 
but I wasn't really stressing it. From experience, it is really picky 
about memory layout, and a total of 448MB strikes me as slightly odd. Try 
leaving something like 128MB and see whether the situation will improve.


Best regards,

Jurij Smakov [EMAIL PROTECTED]
Key: http://www.wooyd.org/pgpkey/   KeyID: C99E03CC


Re: HyperSPARC patches (?)

2006-04-06 Thread Hendrik Sattler
On Friday, 7 April 2006 04:42, Jurij Smakov wrote:
> Hm, I believe that no patches should be necessary. I've finally managed to
> install 2.6.16 on my ss20 with hypersparc processor, it worked ok so far,
> but I wasn't really stressing it. From experience, it is really picky
> about memory layout, and a total of 448MB strikes me as slightly odd. Try
> leaving something like 128MB and see whether the situation will improve.


448MB (7x 64MB) is the best you get when the VSIMM is installed to be able to 
use the CG14 SX frame buffer.
There's only one memory slot you can use for the VSIMM, so there is not much 
you can do wrong with memory layout.

HS

-- 
My GPG key is available on my homepage: http://www.hendrik-sattler.de
or via pgp.net

PingoS - Linux users help schools: http://www.pingos.org




HyperSPARC patches (?)

2006-04-03 Thread BERTRAND Joël

Hello,

	I have a SparcSTATION 20 with dual SuperSPARC-II/75, 448 MB of RAM, and 
a 4MB-VSIMM for the internal CG14. It runs fine with a 2.6.16.1 linux 
kernel (official / no-SMP). If I replace both processors by one (or two) 
HyperSPARC/200 (from ROSS), the workstation boots but... dpkg returns 
segfaults, esp driver seems to be broken (SCSI bus reset...)... Is there 
a patch anywhere to fix this? I have seen some patches for the 2.6.11 
kernel (sparc32-hypersparc-srmmu.patch), but none for recent kernels.


Any idea ?

Regards,

JKB

