Re: Amd64 Unstable Areca

2007-04-13 Thread Scott Long

Phillip N. wrote:
> I'm getting the following on RELENG_6 from Apr 4.
>
> Is this related to the areca driver?
>
> Thanks.



It's not directly coming from the areca driver.  It could be that there
is some memory or disk corruption that is triggering these panics, but
that's just a wild guess.

Scott

___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: Amd64 Unstable Areca

2007-04-13 Thread Phillip N.
I'm getting the following on RELENG_6 from Apr 4.

Is this related to the areca driver?

Thanks.

[EMAIL PROTECTED] /usr/obj/usr/src/sys/WORM]# kgdb kernel.debug /var/crash/vmcore.0
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd".

Unread portion of the kernel message buffer:


Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x38
fault code  = supervisor write data, page not present
instruction pointer = 0x8:0x8046c33f
stack pointer   = 0x10:0xb776c790
frame pointer   = 0x10:0xa0cf1358
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 1793 (bmon)
trap number = 12
panic: page fault
cpuid = 0
Uptime: 1d19h58m47s
Dumping 2047 MB (2 chunks)
  chunk 0: 1MB (156 pages) ... ok
  chunk 1: 2047MB (524016 pages) 2031 2015 1999 1983 1967 1951 1935 1919 1903 
1887 1871 1855 1839 1823 1807 1791 1775 1759 1743 1727 1711 1695 1679 1663 1647 
1631 1615 1599 1583 1567

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address   = 0xa2
fault code  = supervisor write data, page not present
instruction pointer = 0x8:0x805c67a0
stack pointer   = 0x10:0xb499d970
frame pointer   = 0x10:0xff0005e301f0
code segment= base 0x0, limit 0xf, type 0x1b 1551
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 40 (syncer)
trap number = 12
 1535 1519 1503 1487 1471 1455 1439 1423 1407 1391 1375 1359 1343 1327 1311 
1295 1279 1263 1247 1231 1215 1199 1183 1167 1151 1135 1119 1103 1087 1071 1055 
1039 1023 1007 991 975 959 943 927 911 895 879 863 847 831 815 799 783 767 751 
735 719 703 687 671 655 639 623 607 591 575 559 543 527 511 495 479 463 447 431 
415 399 383 367 351 335 319 303 287 271 255 239 223 207 191 175 159 143 127 111 
95 79 63 47 31 15

#0  doadump () at pcpu.h:172
172 __asm __volatile("movq %%gs:0,%0" : "=r" (td));
(kgdb) bt
#0  doadump () at pcpu.h:172
#1  0x0004 in ?? ()
#2  0x804156d7 in boot (howto=260)
at /usr/src/sys/kern/kern_shutdown.c:409
#3  0x80415d71 in panic (fmt=0xff0072a78980 "X�Kl")
at /usr/src/sys/kern/kern_shutdown.c:565
#4  0x8063a0cf in trap_fatal (frame=0xff0072a78980, 
eva=18446742976014820184) at /usr/src/sys/amd64/amd64/trap.c:668
#5  0x8063a44c in trap_pfault (frame=0xb776c6e0, usermode=0)
at /usr/src/sys/amd64/amd64/trap.c:580
#6  0x8063a703 in trap (frame=
  {tf_rdi = -1597041832, tf_rsi = -1097588045440, tf_rdx = -1097405784824, 
tf_rcx = 4, tf_r8 = -1098265798384, tf_r9 = -1098676372624, tf_rax = 4, tf_rbx 
= 4, tf_rbp = -1597041832, tf_r10 = -1097492439552, tf_r11 = -1097588045440, 
tf_r12 = 0, tf_r13 = -1597041832, tf_r14 = -1098676373072, tf_r15 = 
-1597041832, tf_trapno = 12, tf_addr = 56, tf_flags = -2142769062, tf_err = 2, 
tf_rip = -2142846145, tf_cs = 8, tf_rflags = 66050, tf_rsp = -1216952416, tf_ss 
= 16})
at /usr/src/sys/amd64/amd64/trap.c:353
#7  0x80622dab in calltrap ()
at /usr/src/sys/amd64/amd64/exception.S:168
#8  0x8046c33f in vfs_setdirty (bp=0xa0cf1358) at atomic.h:139
#9  0x80470773 in bdwrite (bp=0xa0cf1358)
at /usr/src/sys/kern/vfs_bio.c:963
---Type <return> to continue, or q <return> to quit---
#10 0x805a3514 in ffs_write (ap=0xb776ca30)
at /usr/src/sys/ufs/ffs/ffs_vnops.c:772
#11 0x80691c4b in VOP_WRITE_APV (vop=0x808e5900, 
a=0xb776ca30) at vnode_if.c:698
#12 0x8049137a in vn_write (fp=0xff004b7a7708, 
uio=0xb776cb50, active_cred=0xff007d849d08, flags=0, 
td=0xff0072a78980) at vnode_if.h:372
#13 0x80440d67 in dofilewrite (td=0xff0072a78980, fd=3, 
fp=0xff004b7a7708, auio=0xb776cb50, offset=-1098265798384, 
flags=0) at file.h:253
#14 0x804410d0 in kern_writev (td=0xff0072a78980, fd=3, 
auio=0xb776cb50) at /usr/src/sys/kern/sys_generic.c:402
#15 0x804411c8 in write (td=0xa0cf1358, uap=0xff0072a78980)
at /usr/src/sys/kern/sys_generic.c:326
#16 0x8063af81 in syscall (frame=
  {tf_rdi = 3, tf_rsi = 5382144, tf_rdx = 4096, tf_rcx 
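
A note for readers decoding the trap frame above: kgdb prints the saved registers as signed decimals, while the panic message prints hex. The following is a minimal Python sketch (not part of the original post; the helper name `to_hex64` is made up for illustration) that converts a few values from frame #6 to unsigned 64-bit hex. The low 32 bits of tf_rip and tf_rbp line up with the instruction pointer and frame pointer printed in the panic message, where the upper digits appear to have been lost in the archive.

```python
# Convert the signed decimal register values that kgdb prints in a trap
# frame into unsigned 64-bit hex, so they can be compared with the
# addresses in the panic message.

MASK64 = (1 << 64) - 1


def to_hex64(value: int) -> str:
    """Return value as a 16-digit unsigned 64-bit hex string."""
    return f"0x{value & MASK64:016x}"


# Values copied from frame #6 of the backtrace in this message.
frame = {
    "tf_addr": 56,
    "tf_rip": -2142846145,
    "tf_rbp": -1597041832,
}

for name, value in frame.items():
    print(f"{name:8} = {to_hex64(value)}")
```

tf_addr converts to 0x38, the fault virtual address. A small address like this is often what a write through a NULL structure pointer at offset 0x38 looks like, though the trace alone does not prove that.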

Re: Amd64 Unstable Areca

2007-03-31 Thread KillFill
On Fri, 30-03-2007 at 17:09 -0500, Nikolas Britton wrote:
> Have you tried making it crash?
> 

Well, I have the same process running in cron as I did before, and it's
not crashing.

raidtest seems to run just fine [1].

Maybe you could recommend a better stress case?


[1] raid5 device:
Read 5 requests from raidtest.data.
Number of READ requests: 24900.
Number of WRITE requests: 25100.
Number of bytes to transmit: 3287726080.
Number of processes: 10.
Bytes per second: 9979088
Requests per second: 151

thanks!
-- 
KillFill <[EMAIL PROTECTED]>
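
For readers who want to put the raidtest summary above into perspective, here is a small Python sketch of the arithmetic. It is not part of the original message; the variable names are made up, and the figures are copied from the raidtest output in footnote [1].

```python
# Derive elapsed time and average request size from the raidtest summary.
read_requests = 24900
write_requests = 25100
total_bytes = 3287726080
bytes_per_second = 9979088

total_requests = read_requests + write_requests
elapsed_seconds = total_bytes / bytes_per_second
avg_request_bytes = total_bytes / total_requests

print(f"total requests      : {total_requests}")
print(f"elapsed time        : {elapsed_seconds:.0f} s")
print(f"requests per second : {total_requests / elapsed_seconds:.1f}")
print(f"average request size: {avg_request_bytes / 1024:.1f} KiB")
print(f"throughput          : {bytes_per_second / 2**20:.1f} MiB/s")
```

The run works out to a little under five and a half minutes of I/O with requests averaging roughly 64 KiB, which is much shorter than the uptimes at which the panics in this thread occurred.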



Re: Amd64 Unstable Areca

2007-03-30 Thread Scott Long

Nikolas Britton wrote:

On 3/30/07, Scott Long <[EMAIL PROTECTED]> wrote:

Nikolas Britton wrote:
> On 3/30/07, Phillip Neumann <[EMAIL PROTECTED]> wrote:
>> On Sun, 25-03-2007 at 10:11 -0600, Scott Long wrote:
>> > Please try the following patch against the latest 6-STABLE driver
>> > sources:  http://people.freebsd.org/~scottl/arcmsr.simq.diff.
>> >
>> > Scott
>>
>> Just so you know, the problem does not seem to be present with the
>> patch.
>>
>> Haven't had a crash in days.
>>
>> (When one occurs, I'll notify.)
>>
>> Thanks!!
>>
>
> Have you tried making it crash?

Erich Chen pointed out a problem with the patch I generated, but I think
it's mostly harmless.  Good to know that it seems to be helping with the
problem.

Scott



And??? don't leave us hanging, what's the problem?


My patch added a line with "bus_dmamap_unload(...)" that isn't needed.
I don't believe that it'll cause problems, though.  The whole line will
be removed when I commit the patch to CVS.

Scott



Re: Amd64 Unstable Areca

2007-03-30 Thread Nikolas Britton

On 3/30/07, Scott Long <[EMAIL PROTECTED]> wrote:

Nikolas Britton wrote:
> On 3/30/07, Phillip Neumann <[EMAIL PROTECTED]> wrote:
>> On Sun, 25-03-2007 at 10:11 -0600, Scott Long wrote:
>> > Please try the following patch against the latest 6-STABLE driver
>> > sources:  http://people.freebsd.org/~scottl/arcmsr.simq.diff.
>> >
>> > Scott
>>
>> Just so you know, the problem does not seem to be present with the
>> patch.
>>
>> Haven't had a crash in days.
>>
>> (When one occurs, I'll notify.)
>>
>> Thanks!!
>>
>
> Have you tried making it crash?

Erich Chen pointed out a problem with the patch I generated, but I think
it's mostly harmless.  Good to know that it seems to be helping with the
problem.

Scott



And??? don't leave us hanging, what's the problem?


Re: Amd64 Unstable Areca

2007-03-30 Thread Scott Long

Nikolas Britton wrote:

On 3/30/07, Phillip Neumann <[EMAIL PROTECTED]> wrote:

On Sun, 25-03-2007 at 10:11 -0600, Scott Long wrote:
> Please try the following patch against the latest 6-STABLE driver
> sources:  http://people.freebsd.org/~scottl/arcmsr.simq.diff.
>
> Scott

Just so you know, the problem does not seem to be present with the
patch.

Haven't had a crash in days.

(When one occurs, I'll notify.)

Thanks!!



Have you tried making it crash?


Erich Chen pointed out a problem with the patch I generated, but I think
it's mostly harmless.  Good to know that it seems to be helping with the
problem.

Scott



Re: Amd64 Unstable Areca

2007-03-30 Thread Nikolas Britton

On 3/30/07, Phillip Neumann <[EMAIL PROTECTED]> wrote:

On Sun, 25-03-2007 at 10:11 -0600, Scott Long wrote:
> Please try the following patch against the latest 6-STABLE driver
> sources:  http://people.freebsd.org/~scottl/arcmsr.simq.diff.
>
> Scott

Just so you know, the problem does not seem to be present with the
patch.

Haven't had a crash in days.

(When one occurs, I'll notify.)

Thanks!!



Have you tried making it crash?


Re: Amd64 Unstable Areca

2007-03-30 Thread Phillip Neumann
On Sun, 25-03-2007 at 10:11 -0600, Scott Long wrote:
> Please try the following patch against the latest 6-STABLE driver
> sources:  http://people.freebsd.org/~scottl/arcmsr.simq.diff.
>
> Scott

Just so you know, the problem does not seem to be present with the
patch.

Haven't had a crash in days.

(When one occurs, I'll notify.)

Thanks!!

-- 
Phillip Neumann <[EMAIL PROTECTED]>



Re: Amd64 Unstable Areca

2007-03-26 Thread Phillip Neumann
On Sun, 25-03-2007 at 10:11 -0600, Scott Long wrote:
> Nikolas Britton wrote:
> > Yeah our hardware is nearly identical. I don't remember what I did to
> > my custom driver, I know I fixed some syntax errors and merged in
> > changes that were made on top of the 1.20.00.02 code base. I'm not
> > sure but I think most of those changes were thrown out with the import
> > of 1.20.00.13 and 14.
> > 
> > Yes ..0.13 is the driver from 6.2-RELEASE-p2 and no I don't see any
> > g_vfs errors, I do see a crap load of httpd errors that someone needs
> > to investigate, lucky me. :-/   It's not likely I'd see them anyhow
> > because the business slows way down during this time of year:
> 
> Please try the following patch against the latest 6-STABLE driver 
> sources:  http://people.freebsd.org/~scottl/arcmsr.simq.diff.
> 
> Scott


Scott, unfortunately the box crashed again yesterday with your patch applied.
I don't know if the problem is FS related, though.

I still haven't done what Jan suggested, i.e. reformat the partitions. I may
do this tonight.


thanks!



Unread portion of the kernel message buffer:
<6>pid 94219 (sh), uid 0: exited on signal 11
<6>pid 94241 (sh), uid 0: exited on signal 11


Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x38
fault code  = supervisor write data, page not present
instruction pointer = 0x8:0x805bda49
stack pointer   = 0x10:0xb7aab9a0
frame pointer   = 0x10:0x4
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 94245 (sh)
trap number = 12
panic: page fault
cpuid = 0
Uptime: 5h19m40s
Dumping 2047 MB (2 chunks)
  chunk 0: 1MB (156 pages) ... ok
  chunk 1: 2047MB (524016 pages) 2031 2015 1999 1983 1967 1951 1935 1919 1903 
1887 1871 1855 1839 1823 1807 1791 1775 1759 1743 1727 1711 1695 1679 1663 1647 
1631 1615 1599 1583 1567 1551 1535 1519 1503 1487 1471 1455 1439 1423 1407 1391 
1375 1359 1343 1327 1311 1295 1279 1263 1247 1231 1215 1199 1183 1167 1151 1135 
1119 1103 1087 1071 1055 1039 1023 1007 991 975 959 943 927 911 895 879 863 847 
831 815 799 783 767 751 735 719 703 687 671 655 639 623 607 591 575 559 543 527 
511 495 479 463 447 431 415 399 383 367 351 335 319 303 287 271 255 239 223 207 
191 175 159 143 127 111 95 79 63 47 31 15

#0  doadump () at pcpu.h:172
172 __asm __volatile("movq %%gs:0,%0" : "=r" (td));
(kgdb) bt
#0  doadump () at pcpu.h:172
#1  0x0004 in ?? ()
#2  0x80409447 in boot (howto=260) at 
/usr/src/sys/kern/kern_shutdown.c:409
#3  0x80409ae1 in panic (fmt=0xff003b32e260 "�\226�A")
at /usr/src/sys/kern/kern_shutdown.c:565
#4  0x8062b17f in trap_fatal (frame=0xff003b32e260, 
eva=18446742975299032752)
at /usr/src/sys/amd64/amd64/trap.c:668
#5  0x8062b4fc in trap_pfault (frame=0xb7aab8f0, usermode=0)
at /usr/src/sys/amd64/amd64/trap.c:580
#6  0x8062b7b3 in trap (frame=
  {tf_rdi = -1097399663144, tf_rsi = 130, tf_rdx = -2140101835, tf_rcx = 
-1099185419872, tf_r8 = -2137937120, tf_r9 = -1098518437280, tf_rax = 
-1098518437280, tf_rbx = 0, tf_rbp = 4, tf_r10 = -2137799976, tf_r11 = 
-1098518437280, tf_r12 = -1097399663144, tf_r13 = -2140101835, tf_r14 = 
-1213547824, tf_r15 = -1097492989984, tf_trapno = 12, tf_addr = 56, tf_flags = 
-1098518437280, tf_err = 2, tf_rip = -2141463991, tf_cs = 8, tf_rflags = 66050, 
tf_rsp = -1213548112, tf_ss = 0})
at /usr/src/sys/amd64/amd64/trap.c:353
#7  0x806167eb in calltrap () at 
/usr/src/sys/amd64/amd64/exception.S:168
#8  0x805bda49 in vm_page_sleep_if_busy (m=0xff007de205d8, 
also_m_busy=130, 
msg=0x8070a335 "vmpfw") at atomic.h:139
#9  0x805b09e7 in vm_fault (map=0xff0013532440, 
vaddr=140737488338944, 
fault_type=2 '\002', fault_flags=8) at /usr/src/sys/vm/vm_fault.c:397
#10 0x8062b3b7 in trap_pfault (frame=0xb7aabc40, usermode=1)
at /usr/src/sys/amd64/amd64/trap.c:557
#11 0x8062b943 in trap (frame=
  {tf_rdi = 5358048, tf_rsi = 0, tf_rdx = 1, tf_rcx = 34369249980, tf_r8 = 
-1098728259232, tf_r9 = 140737488343064, tf_rax = 0, tf_rbx = 140737488343408, 
tf_rbp = 5423104, tf_r10 = -2035732464, tf_r11 = 0, tf_r12 = 0, tf_r13 = 2, 
tf_r14 = 1, tf_r15 = 5399624, tf_trapno = 12, tf_addr = 140737488343016, 
tf_flags = 12, tf_err = 7, tf_rip = 4254692, tf_cs = 43, tf_rflags = 66118, 
tf_rsp = 140737488343024, tf_ss = 35}) at /usr/src/sys/amd64/amd64/trap.c:283
---Type <return> to continue, or q <return> to quit---
#12 0x806167eb in calltrap () at 
/usr/src/sys/amd64/amd64/exception.S:168
#13 0x0040ebe4 in ?? ()
Previous frame inner to this frame (corrupt stack?)
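
The two trap frames in the backtrace above carry different error codes (tf_err = 2 in frame #6, tf_err = 7 in frame #11). On x86, the low three bits of the page-fault error code say whether the page was present, whether the access was a write, and whether it came from user mode, and the low two bits of a code segment selector give the privilege level. Below is a small Python sketch for decoding them. It is not part of the original post, and the helper names are made up for illustration.

```python
# Decode the low three bits of an x86 page-fault error code (tf_err).
# Bit 0 (P): 0 = page not present, 1 = protection violation
# Bit 1 (W): 0 = read access,      1 = write access
# Bit 2 (U): 0 = supervisor mode,  1 = user mode


def decode_pf_error(err: int) -> str:
    """Describe a page-fault error code in words."""
    mode = "user" if err & 0x4 else "supervisor"
    access = "write" if err & 0x2 else "read"
    cause = "protection violation" if err & 0x1 else "page not present"
    return f"{mode} {access}, {cause}"


def ring_from_cs(cs: int) -> int:
    """Return the privilege level held in the low two bits of a selector."""
    return cs & 0x3


# (tf_err, tf_cs) pairs copied from frames #6 and #11 of this backtrace.
for err, cs in ((2, 8), (7, 43)):
    print(f"tf_err={err} tf_cs={cs}: {decode_pf_error(err)}, ring {ring_from_cs(cs)}")
```

Read this way, frame #11 is a user-mode write fault from sh (ring 3), and frame #6 is the kernel-mode fault at 0x38 that happened while the kernel was servicing it in vm_fault.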


Re: Amd64 Unstable Areca

2007-03-26 Thread Nikolas Britton

On 3/25/07, Scott Long <[EMAIL PROTECTED]> wrote:

Nikolas Britton wrote:
> Yeah our hardware is nearly identical. I don't remember what I did to
> my custom driver, I know I fixed some syntax errors and merged in
> changes that were made on top of the 1.20.00.02 code base. I'm not
> sure but I think most of those changes were thrown out with the import
> of 1.20.00.13 and 14.
>
> Yes ..0.13 is the driver from 6.2-RELEASE-p2 and no I don't see any
> g_vfs errors, I do see a crap load of httpd errors that someone needs
> to investigate, lucky me. :-/   It's not likely I'd see them anyhow
> because the business slows way down during this time of year:

Please try the following patch against the latest 6-STABLE driver
sources:  http://people.freebsd.org/~scottl/arcmsr.simq.diff.



Thanks Scott... Unfortunately, or fortunately depending on your
outlook, spring break ends today and I've got a crap load of studying
to do, so no time to play guinea pig...  But because I'm such a nice
guy I've compiled i386 and amd64 binaries so the others can play
along.  I compiled everything against 6.2-RELEASE-p2...

Download them here: http://www.nbritton.org/uploads/areca/fb62/

Do you have any testing scripts so the group can do repeatable
comparison tests using the old and new kernel modules? Thanks. Oh and
here's a file list but gmail is probably going to mutilate it:

arcmsr.c.1824:                     ASCII C program text
arcmsr.h.1142:                     ISO-8859 C program text
arcmsr.kld.fb62.i386.032607:       ELF 32-bit LSB relocatable, Intel 80386, version 1 (FreeBSD), not stripped
arcmsr.ko.debug.fb62.amd64.032607: ELF 64-bit LSB relocatable, AMD x86-64, version 1 (FreeBSD), not stripped
arcmsr.ko.fb62.amd64.032607:       ELF 64-bit LSB relocatable, AMD x86-64, version 1 (FreeBSD), not stripped
arcmsr.ko.fb62.i386.032607:        ELF 32-bit LSB shared object, Intel 80386, version 1 (FreeBSD), not stripped
arcmsr.o.fb62.amd64.032607:        ELF 64-bit LSB relocatable, AMD x86-64, version 1 (FreeBSD), not stripped
arcmsr.o.fb62.i386.032607:         ELF 32-bit LSB relocatable, Intel 80386, version 1 (FreeBSD), not stripped
i386_build_log.txt:                ASCII text, with very long lines


Re: Amd64 Unstable Areca

2007-03-25 Thread Scott Long

Nikolas Britton wrote:

Yeah our hardware is nearly identical. I don't remember what I did to
my custom driver, I know I fixed some syntax errors and merged in
changes that were made on top of the 1.20.00.02 code base. I'm not
sure but I think most of those changes were thrown out with the import
of 1.20.00.13 and 14.

Yes ..0.13 is the driver from 6.2-RELEASE-p2 and no I don't see any
g_vfs errors, I do see a crap load of httpd errors that someone needs
to investigate, lucky me. :-/   It's not likely I'd see them anyhow
because the business slows way down during this time of year:


Please try the following patch against the latest 6-STABLE driver 
sources:  http://people.freebsd.org/~scottl/arcmsr.simq.diff.


Scott



Re: Amd64 Unstable Areca

2007-03-24 Thread Nikolas Britton

On 3/24/07, Jan Mikkelsen <[EMAIL PROTECTED]> wrote:

Nikolas Britton wrote:
> On 3/24/07, Jan Mikkelsen <[EMAIL PROTECTED]> wrote:
> > Hi,
> >
> > Nikolas Britton wrote:
> > > If that doesn't work move back down to 1.20.00.12:
> > > http://www.nbritton.org/uploads/areca/
> >
> > I could consistently make 1.20.00.12 corrupt data.  If you are going to go
> > back, that's probably a bad choice. 1.20.00.02 didn't seem to have
> > corruption problems.
> >
>
> The 1.20.00.12 driver I pointed to was a custom hack I did for my
> servers, It worked fine for the 7 months I was using it... I'm
> assuming we're talking about I/O load, the servers rarely see high cpu
> loads... the hardware:
>
> http://www.supermicro.com/products/motherboard/Xeon1333/5000P/X7DBE.cfm
>
> arcmsr0:  ARECA RAID ADAPTER0: Driver Version 1.20.00.13 2006-8-18
> ARECA RAID ADAPTER0: FIRMWARE VERSION V1.41 2006-5-24
> pass1 at arcmsr0 bus 0 target 16 lun 0
> pass1:  Fixed Processor SCSI-0 device
> da0 at arcmsr0 bus 0 target 0 lun 0
> da0:  Fixed Direct Access SCSI-3 device
> da0: 166.666MB/s transfers (83.333MHz, offset 32, 16bit), Tagged
> Queueing Enabled
> da0: 1430511MB (2929687040 512 byte sectors: 255H 63S/T 182364C)
>
> I just upgraded them for the DST change.  :-/

Interesting.  How did you modify the 1.20.00.12 driver from Areca?

Is the 1.20.00.13 driver mentioned above the one from 6.2-RELEASE?  If so,
do you ever see g_vfs_done errors on this machine when it is under heavy I/O
load?

From the machine I used to test the corruption issue (currently running
6-STABLE):

arcmsr0:  mem 0xc850-0xc8500fff,0xc8c0-0xc8ff irq 16 at device 14.0 on
pci10
ARECA RAID ADAPTER0: Driver Version 1.20.00.14 2007-2-05
ARECA RAID ADAPTER0: FIRMWARE VERSION V1.42 2006-10-13
pass4 at arcmsr0 bus 0 target 16 lun 0
pass4:  Fixed Processor SCSI-0 device

This is an ARC-1220 in a Supermicro X7DB8 based machine.



Yeah our hardware is nearly identical. I don't remember what I did to
my custom driver, I know I fixed some syntax errors and merged in
changes that were made on top of the 1.20.00.02 code base. I'm not
sure but I think most of those changes were thrown out with the import
of 1.20.00.13 and 14.

Yes ..0.13 is the driver from 6.2-RELEASE-p2 and no I don't see any
g_vfs errors, I do see a crap load of httpd errors that someone needs
to investigate, lucky me. :-/   It's not likely I'd see them anyhow
because the business slows way down during this time of year:


uptime:

11:53PM  up 3 days,  8:56, 1 user, load averages: 0.00, 0.00, 0.00

I remember seeing that g_vfs error one time when one of the SATA
cables came loose. All hell broke loose and FreeBSD had a major brain
fart. I've had several other SATA cable 'incidents', but that's the
only time FreeBSD croaked (with an Areca card). The cable incidents
happen with all our SATA RAID gear and are not specific to Areca
products. I've come to the conclusion that the SATA cables simply
vibrate loose. I'm debating at the moment whether we should move to SAS
multi-lane backplanes. The biggest problem I foresee is one of the
multi-lane cables coming loose; if that happened, the array would be toast.
Anyhow, I simply rebooted the server, rebuilt the array, verified
the array, and ran fsck. The file system corruption was typical of a
power failure during a disk write.


RE: Amd64 Unstable Areca

2007-03-24 Thread Jan Mikkelsen
Nikolas Britton wrote:
> On 3/24/07, Jan Mikkelsen <[EMAIL PROTECTED]> wrote:
> > Hi,
> >
> > Nikolas Britton wrote:
> > > If that doesn't work move back down to 1.20.00.12:
> > > http://www.nbritton.org/uploads/areca/
> >
> > I could consistently make 1.20.00.12 corrupt data.  If you are going to go
> > back, that's probably a bad choice. 1.20.00.02 didn't seem to have
> > corruption problems.
> >
> 
> The 1.20.00.12 driver I pointed to was a custom hack I did for my
> servers, It worked fine for the 7 months I was using it... I'm
> assuming we're talking about I/O load, the servers rarely see high cpu
> loads... the hardware:
> 
> http://www.supermicro.com/products/motherboard/Xeon1333/5000P/X7DBE.cfm
>
> arcmsr0:  ARECA RAID ADAPTER0: Driver Version 1.20.00.13 2006-8-18
> ARECA RAID ADAPTER0: FIRMWARE VERSION V1.41 2006-5-24
> pass1 at arcmsr0 bus 0 target 16 lun 0
> pass1:  Fixed Processor SCSI-0 device
> da0 at arcmsr0 bus 0 target 0 lun 0
> da0:  Fixed Direct Access SCSI-3 device
> da0: 166.666MB/s transfers (83.333MHz, offset 32, 16bit), Tagged
> Queueing Enabled
> da0: 1430511MB (2929687040 512 byte sectors: 255H 63S/T 182364C)
> 
> I just upgraded them for the DST change.  :-/

Interesting.  How did you modify the 1.20.00.12 driver from Areca?

Is the 1.20.00.13 driver mentioned above the one from 6.2-RELEASE?  If so,
do you ever see g_vfs_done errors on this machine when it is under heavy I/O
load?

From the machine I used to test the corruption issue (currently running
6-STABLE):

arcmsr0:  mem 0xc850-0xc8500fff,0xc8c0-0xc8ff irq 16 at device 14.0 on
pci10
ARECA RAID ADAPTER0: Driver Version 1.20.00.14 2007-2-05
ARECA RAID ADAPTER0: FIRMWARE VERSION V1.42 2006-10-13
pass4 at arcmsr0 bus 0 target 16 lun 0
pass4:  Fixed Processor SCSI-0 device

This is an ARC-1220 in a Supermicro X7DB8 based machine.

Jan.



Re: Amd64 Unstable Areca

2007-03-24 Thread Nikolas Britton

On 3/24/07, Jan Mikkelsen <[EMAIL PROTECTED]> wrote:

Hi,

Nikolas Britton wrote:
> If that doesn't work move back down to 1.20.00.12:
> http://www.nbritton.org/uploads/areca/

I could consistently make 1.20.00.12 corrupt data.  If you are going to go
back, that's probably a bad choice. 1.20.00.02 didn't seem to have
corruption problems.



The 1.20.00.12 driver I pointed to was a custom hack I did for my
servers, It worked fine for the 7 months I was using it... I'm
assuming we're talking about I/O load, the servers rarely see high cpu
loads... the hardware:

http://www.supermicro.com/products/motherboard/Xeon1333/5000P/X7DBE.cfm

arcmsr0:  Fixed Processor SCSI-0 device
da0 at arcmsr0 bus 0 target 0 lun 0
da0:  Fixed Direct Access SCSI-3 device
da0: 166.666MB/s transfers (83.333MHz, offset 32, 16bit), Tagged
Queueing Enabled
da0: 1430511MB (2929687040 512 byte sectors: 255H 63S/T 182364C)

I just upgraded them for the DST change.  :-/


RE: Amd64 Unstable Areca

2007-03-24 Thread Jan Mikkelsen
KillFill wrote:
> Are you suggesting that I newfs the filesystems?

Yes, using the 1.20.00.14 driver (ie: from 6-STABLE).

I don't know whether the corruption is coming from the current driver, or
from blocks the previous driver wrote.  Doing a newfs will help if the
corrupt blocks came from the previous driver.
 
Regards,

Jan.



RE: Amd64 Unstable Areca

2007-03-24 Thread Jan Mikkelsen
Hi,

Nikolas Britton wrote:
> If that doesn't work move back down to 1.20.00.12:
> http://www.nbritton.org/uploads/areca/

I could consistently make 1.20.00.12 corrupt data.  If you are going to go
back, that's probably a bad choice. 1.20.00.02 didn't seem to have
corruption problems.

Regards,

Jan.
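
For anyone wanting to check an array for the kind of data corruption described above, a write-then-verify pass is one simple approach. The Python sketch below is not the test used in this thread (that test is not described here); it is a minimal illustration with made-up function names. It writes files of random data, records their SHA-256 digests, and reports any file whose contents differ when read back.

```python
import hashlib
import os
import tempfile


def write_files(directory: str, count: int, size: int) -> dict:
    """Write `count` files of `size` random bytes; return {path: sha256}."""
    digests = {}
    for i in range(count):
        path = os.path.join(directory, f"blob{i:04d}.bin")
        data = os.urandom(size)
        with open(path, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the data to the device
        digests[path] = hashlib.sha256(data).hexdigest()
    return digests


def verify_files(digests: dict) -> list:
    """Re-read every file and return the paths whose contents changed."""
    bad = []
    for path, expected in digests.items():
        with open(path, "rb") as f:
            actual = hashlib.sha256(f.read()).hexdigest()
        if actual != expected:
            bad.append(path)
    return bad


if __name__ == "__main__":
    # Hypothetical target: a temporary directory. Point this at a directory
    # on the array under test to exercise the controller instead.
    with tempfile.TemporaryDirectory() as target:
        digests = write_files(target, count=8, size=1 << 20)
        print("corrupt files:", verify_files(digests))
```

One caveat: a read-back shortly after the write may be served from the buffer cache rather than the disks, so a meaningful run needs a data set larger than RAM or an unmount and remount between the write and verify passes.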



Re: Amd64 Unstable Areca

2007-03-24 Thread Scott Long

KillFill wrote:

On Fri, 23-03-2007 at 21:01 -0500, Nikolas Britton wrote:

A newer version of the driver has been released to fix this problem (I
think):
http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/dev/arcmsr/

If that doesn't work move back down to 1.20.00.12:
http://www.nbritton.org/uploads/areca/

Added erich and scott to the cc list.



Well, you may notice I'm using RELENG_6.

I'm using the latest versions for RELENG_6:

* $FreeBSD: src/sys/dev/arcmsr/arcmsr.c,v 1.8.2.3
* $FreeBSD: src/sys/dev/arcmsr/arcmsr.h,v 1.1.4.2

Unfortunately I cannot move the box to -CURRENT to test arcmsr.c 1.20 and
arcmsr.h 1.4.

thanks.


It's possible that I might have botched the merge from -CURRENT.  Your 
panics definitely point to data corruption, which implicates the driver.

I'll take a look.

Scott



Re: Amd64 Unstable Areca

2007-03-24 Thread KillFill
On Fri, 23-03-2007 at 21:01 -0500, Nikolas Britton wrote:
> A newer version of the driver has been released to fix this problem (I
> think):
> http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/dev/arcmsr/
> 
> If that doesn't work move back down to 1.20.00.12:
> http://www.nbritton.org/uploads/areca/
> 
> Added erich and scott to the cc list.
> 

Well, you may notice I'm using RELENG_6.

I'm using the latest versions for RELENG_6:

* $FreeBSD: src/sys/dev/arcmsr/arcmsr.c,v 1.8.2.3
* $FreeBSD: src/sys/dev/arcmsr/arcmsr.h,v 1.1.4.2

Unfortunately I cannot move the box to -CURRENT to test arcmsr.c 1.20 and
arcmsr.h 1.4.

thanks.
-- 
KillFill <[EMAIL PROTECTED]>



Re: Amd64 Unstable Areca

2007-03-23 Thread Nikolas Britton

A newer version of the driver has been released to fix this problem (I think):
http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/dev/arcmsr/

If that doesn't work move back down to 1.20.00.12:
http://www.nbritton.org/uploads/areca/

Added erich and scott to the cc list.


On 3/22/07, Phillip Neumann <[EMAIL PROTECTED]> wrote:

Dear FreeBSD-stable...

My amd64 box is not very stable.
As you can see in its hardware list, it has an Areca 1210 card, which
suffers from the 6.2-RELEASE erratum (crash under high load).

Last week or so, I saw a commit where the Areca bugs were fixed, so I
updated the system.

I can still see the machine crashing under load.

Attached are dmesg -a and a simple 'bt' from kgdb.

Sometimes (under load) I see this message:
Interrupt storm detected on "swi2:"; throttling interrupt source

I get the same behaviour with sched_bsd or sched_ule.


Is this info useful to determine where the problem is?
If so, where is it?  :-)
Does this have something to do with the fs? Or the scheduler?


thanks!


killfill.



[EMAIL PROTECTED] /usr/obj/usr/src/sys/WORM]# kgdb kernel.debug /var/crash/vmcore.1
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd".

Unread portion of the kernel message buffer:
panic: handle_workitem_remove: lost inodedep
cpuid = 0
Uptime: 2h19m9s
Dumping 2047 MB (2 chunks)
  chunk 0: 1MB (156 pages) ... ok
  chunk 1: 2047MB (524016 pages) 2031 2015 1999 1983 1967 1951 1935 1919 1903 
1887 1871 1855 1839 1823 1807 1791 1775 1759 1743 1727 1711 1695 1679 1663 1647 
1631 1615 1599 1583 1567 1551 1535 1519 1503 1487 1471 1455 1439 1423 1407 1391 
1375 1359 1343 1327 1311 1295 1279 1263 1247 1231 1215 1199 1183 1167 1151 1135 
1119 1103 1087 1071 1055 1039 1023 1007 991 975 959 943 927 911 895 879 863 847 
831 815 799 783 767 751 735 719 703 687 671 655 639 623 607 591 575 559 543 527 
511 495 479 463 447 431 415 399 383 367 351 335 319 303 287 271 255 239 223 207 
191 175 159 143 127 111 95 79 63 47 31 15

#0  doadump () at pcpu.h:172
172 __asm __volatile("movq %%gs:0,%0" : "=r" (td));
(kgdb) bt
#0  doadump () at pcpu.h:172
#1  0x0004 in ?? ()
#2  0x804085e7 in boot (howto=260) at 
/usr/src/sys/kern/kern_shutdown.c:409
#3  0x80408c81 in panic (fmt=0xff007a9cd980 "�VTx")
at /usr/src/sys/kern/kern_shutdown.c:565
#4  0x80589fc6 in handle_workitem_remove (dirrem=0xff001a6c7b40, 
xp=0x0)
at /usr/src/sys/ufs/ffs/ffs_softdep.c:3599
#5  0x8058a3dc in process_worklist_item (mp=0xff0031482630, flags=0)
at /usr/src/sys/ufs/ffs/ffs_softdep.c:962
#6  0x8058fd3d in softdep_process_worklist (mp=0xff0031482630, 
full=0)
at /usr/src/sys/ufs/ffs/ffs_softdep.c:851
#7  0x80590031 in softdep_flush () at 
/usr/src/sys/ufs/ffs/ffs_softdep.c:762
#8  0x803ed647 in fork_exit (callout=0x8058fee0 
, arg=0x0,
frame=0xb49abc50) at /usr/src/sys/kern/kern_fork.c:821
#9  0x80615b0e in fork_trampoline () at 
/usr/src/sys/amd64/amd64/exception.S:394
#10 0x in ?? ()
#11 0x in ?? ()
#12 0x0001 in ?? ()
#13 0x in ?? ()
#14 0x in ?? ()
#15 0x in ?? ()
#16 0x in ?? ()
#17 0x in ?? ()
#18 0x in ?? ()
#19 0x in ?? ()
#20 0x in ?? ()
#21 0x in ?? ()
#22 0x in ?? ()
#23 0x in ?? ()
#24 0x in ?? ()
#25 0x in ?? ()
#26 0x in ?? ()
#27 0x in ?? ()
#28 0x in ?? ()
#29 0x in ?? ()
#30 0x in ?? ()
#31 0x in ?? ()
#32 0x in ?? ()
---Type <return> to continue, or q <return> to quit---
#33 0x in ?? ()
#34 0x in ?? ()
#35 0x in ?? ()
#36 0x in ?? ()
#37 0x in ?? ()
#38 0x in ?? ()
#39 0x in ?? ()
#40 0x in ?? ()
#41 0x in ?? ()
#42 0x00b7d000 in ?? ()
#43 0x in ?? ()
#44 0xff002d080600 in ?? ()
#45 0x in ?? ()
#46 0xff00785456b0 in ?? ()
#47 0xff007a9f2000 in ?? ()
#48 0xb49ab878 in ?? ()
#49 0xff007a9cd980 in ?? ()
#50 0x8041ed56 in sched_switch (td=0x0, newtd=0x0, flags=1)
at /usr/src/sys/kern/sched_4bsd.c:973
#51 0x in ?? ()
#52 0x in ?? ()
#53 0x in ?? ()
#54 0x00

RE: Amd64 Unstable Areca

2007-03-23 Thread KillFill
Hello...

On Fri, 23-03-2007 at 11:12 +1100, Jan Mikkelsen wrote:
> Hi,
> 
> Phillip Neumann wrote:
> > My amd64 box is not very stable.
> > In its hardware list, you can see there is an Areca 1210 card, which
> > suffers from the 6.2-RELEASE erratum (crash under high load).
> > 
> > Last week or so, I saw a commit where the Areca bugs were fixed, so I
> > updated the system.
> > 
> > I can still see the machine crashing under load.
> 
> How heavy is the load?  I can't make 6-STABLE crash, but I could make
> 6.2-RELEASE crash.  My guess is that you have filesystem corruption
> introduced with the earlier driver which is now causing problems even
> though the driver now works.
> 

I'm attaching a new backtrace. When the system crashed, iostat was
actually reporting a fairly light load:

   00 21.80 171  3.64   0.00   0  0.00  16.00   2  0.03   6  0  3  0 91
   00 25.00  92  2.24   0.00   0  0.00   0.00   0  0.00   7  0  6  0 87
   00 13.99 143  1.95   8.00   5  0.04   0.00   0  0.00   3  0  7  0 90
   00 18.25 331  5.89   0.00   0  0.00  16.00   1  0.02   0  0  1  1 97
   00  9.84 766  7.36   9.00   8  0.07  23.32  74  1.68   1  0  3  1 95
   00  5.05 551  2.72   9.00   8  0.07  11.31 113  1.25   4  0  1  0 94


When it crashed, the jailed Apache was the most active process on the
system.

The real load is caused by tinderbox, which uses the disks and one of
the two CPUs in the system.

Are you suggesting that I newfs the file systems?

I actually used plain 6.2 install disks to do that...

> Have you done an fsck in single user mode, not a background fsck?
>  

Yes, sometimes (after a panic) I need to fsck in single-user mode.


> > Sometimes (under load) I see this message:
> > Interrupt storm detected on "swi2:"; throttling interrupt source
> 
> I see this too.  It seems to be benign.
>  
Okay.

> Regards,
> 
> Jan Mikkelsen

How would I get more info?

Any tips are welcome, Thanks!

-- 
KillFill <[EMAIL PROTECTED]>
[EMAIL PROTECTED] /usr/obj/usr/src/sys/WORM]# kgdb kernel.debug /var/crash/vmcore.0
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd".

Unread portion of the kernel message buffer:
/usr/local: bad dir ino 363554 at offset 512: mangled entry
panic: ufs_dirbad: bad dir
cpuid = 0
Uptime: 19h46m33s
Dumping 2047 MB (2 chunks)
  chunk 0: 1MB (156 pages) ... ok
  chunk 1: 2047MB (524016 pages) 2031 2015 1999 1983 1967 1951 1935 1919 1903 
1887 1871 1855 1839 1823 1807 1791 1775 1759 1743 1727 1711 1695 1679 1663 1647 
1631 1615 1599 1583 1567 1551 1535 1519 1503 1487 1471 1455 1439 1423 1407 1391 
1375 1359 1343 1327 1311 1295 1279 1263 1247 1231 1215 1199 1183 1167 1151 1135 
1119 1103 1087 1071 1055 1039 1023 1007 991 975 959 943 927 911 895 879 863 847 
831 815 799 783 767 751 735 719 703 687 671 655 639 623 607 591 575 559 543 527 
511 495 479 463 447 431 415 399 383 367 351 335 319 303 287 271 255 239 223 207 
191 175 159 143 127 111 95 79 63 47 31 15

#0  doadump () at pcpu.h:172
172 __asm __volatile("movq %%gs:0,%0" : "=r" (td));
(kgdb) bt
#0  doadump () at pcpu.h:172
#1  0x0004 in ?? ()
#2  0x804085e7 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
#3  0x80408c81 in panic (fmt=0xff005e027000 "\b�L^") at /usr/src/sys/kern/kern_shutdown.c:565
#4  0x8059b230 in ufs_dirbad (ip=0x0, offset=0, how=0x0) at /usr/src/sys/ufs/ufs/ufs_lookup.c:599
#5  0x8059b657 in ufs_lookup (ap=0xb76867a0) at /usr/src/sys/ufs/ufs/ufs_lookup.c:287
#6  0x806807fa in VOP_CACHEDLOOKUP_APV (vop=0x0, a=0x0) at vnode_if.c:150
#7  0x80465bd5 in vfs_cache_lookup (ap=0x0) at vnode_if.h:82
#8  0x8068153d in VOP_LOOKUP_APV (vop=0x808d4540, a=0xb7686890) at vnode_if.c:99
#9  0x8046a555 in lookup (ndp=0xb7686990) at vnode_if.h:56
#10 0x8046b285 in namei (ndp=0xb7686990) at /usr/src/sys/kern/vfs_lookup.c:216
#11 0x8047c4b4 in kern_lstat (td=0xff005e027000, path=0x0, pathseg=UIO_USERSPACE, sbp=0xb7686af0) at /usr/src/sys/kern/vfs_syscalls.c:2141
#12 0x8047c9a7 in lstat (td=0x0, uap=0xb7686bc0) at /usr/src/sys/kern/vfs_syscalls.c:2124
#13 0x8062ae91 in syscall (frame=
  {tf_rdi = 5275880, tf_rsi = 5275760, tf_rdx = 0, tf_rcx = 0, tf_r8 = 
-140737483074239, tf_r9 = 128, tf_rax = 190, tf_rbx = 5275648, tf_rbp = 
5275760, tf_r10 = 0, tf_r11 = 0, tf_r12 = 5259264, tf_r13 = 0, tf_r14 = 
5271552, tf_r15 = 0, tf_trapno = 12, tf_addr = 5275744, tf_flags = 0, tf_

RE: Amd64 Unstable Areca

2007-03-22 Thread Jan Mikkelsen
Hi,

Phillip Neumann wrote:
> My amd64 box is not very stable.
> In its hardware list, you can see there is an Areca 1210 card, which 
> suffers from the 6.2-RELEASE erratum (crash under high load).
> 
> Last week or so, I saw a commit where the Areca bugs were fixed, so I 
> updated the system.
> 
> I can still see the machine crashing under load.

How heavy is the load?  I can't make 6-STABLE crash, but I could make
6.2-RELEASE crash.  My guess is that you have filesystem corruption
introduced with the earlier driver which is now causing problems even
though the driver now works.

Have you done an fsck in single user mode, not a background fsck?
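
A minimal sketch of what a forced foreground check could look like. It
assumes the affected file system is /usr/local (the mount point named by
the ufs_dirbad panic message reported in this thread) and that the box
has been booted to single-user mode, so the file system is not mounted
read-write. The helper names dirbad_fs and fsck_plan are hypothetical,
and the script only prints the fsck command instead of running it:

```shell
#!/bin/sh
# Sketch only: dirbad_fs and fsck_plan are made-up helper names,
# not part of FreeBSD.

# Pull the mount point out of a ufs_dirbad console line such as
#   /usr/local: bad dir ino 363554 at offset 512: mangled entry
# Lines that do not match this shape produce no output.
dirbad_fs() {
    sed -n 's|^\(/[^:]*\): bad dir ino [0-9]* at offset [0-9]*: .*$|\1|p'
}

# Print (rather than run) a forced foreground check for a file system.
# -f checks the file system even if it is marked clean;
# -y answers "yes" to every repair prompt.
fsck_plan() {
    echo "fsck -f -y $1"
}

# The console line below is the one quoted earlier in this thread.
fs=$(echo '/usr/local: bad dir ino 363554 at offset 512: mangled entry' | dirbad_fs)
fsck_plan "$fs"
```

The printed command is meant to be run by hand from the single-user
shell. Setting background_fsck="NO" in /etc/rc.conf should also make the
boot-time check run in the foreground, though that is a suggestion and
not something confirmed in this thread.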
 
> Sometimes (under load) I see this message:
> Interrupt storm detected on "swi2:"; throttling interrupt source

I see this too.  It seems to be benign.
 
Regards,

Jan Mikkelsen
