[GEOM] Disk IO error when resyncing gmirror - massive hang in D state

2015-04-13 Thread Dmitry Morozovsky
Dear colleagues,

unfortunately, the machine in question is in production, so I have no clear 
reproduction case. I do have console logs, however.

prerequisites:
- rather fresh stable/10, amd64, SuperMicro MicroCloud 1150, X10SLD-F/HF
- su+j ufs2 on top of gmirror of two SATA Toshiba drives
- one disk died some time ago, so gmirror works in degraded state

trouble:
- inserted new drive, labelled, started gmirror resync
- apparently remaining drive also has read issues:
(ada0:ahcich1:0:0:0): READ_FPDMA_QUEUED. ACB: 60 00 10 b2 c3 40 01 00 00 01 00 00
(ada0:ahcich1:0:0:0): CAM status: ATA Status Error
(ada0:ahcich1:0:0:0): ATA status: 41 (DRDY ERR), error: 40 (UNC )
(ada0:ahcich1:0:0:0): RES: 41 40 04 b3 c3 40 01 00 00 00 01
(ada0:ahcich1:0:0:0): Error 5, Retries exhausted
GEOM_MIRROR: Request failed (error=5). ada0a[READ(offset=6566445056, length=131072)]
GEOM_MIRROR: Synchronization request failed (error=5). mirror/m0a[READ(offset=6566445056, length=131072)]

at this point, all disk I/O requests stall, and with them all cron jobs, syslogd, 
dhcpd, etc.
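
For reference, the numbers in the log above can be cross-checked by decoding
the register dumps. The following is a minimal sketch in C, not taken from the
CAM sources. It assumes the ACB and RES bytes are printed in the usual FreeBSD
order (command or status first, then features or error, the three low LBA
bytes, the device byte, the three high LBA bytes, and so on) and it assumes
512-byte sectors. The names dump_lba48() and ncq_bytes() are made up for
illustration.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE 512u	/* assumed logical sector size */

/*
 * 48-bit LBA from bytes 2..4 (low 24 bits) and bytes 6..8 (high 24 bits).
 * The same byte positions are assumed for both ACB and RES dumps.
 */
static uint64_t
dump_lba48(const uint8_t *b, int n)
{
	assert(n >= 9);
	return ((uint64_t)b[8] << 40) | ((uint64_t)b[7] << 32) |
	    ((uint64_t)b[6] << 24) | ((uint64_t)b[4] << 16) |
	    ((uint64_t)b[3] << 8) | (uint64_t)b[2];
}

/*
 * NCQ commands carry the sector count in the FEATURES registers:
 * byte 1 is assumed to be the low half, byte 9 the high half.
 */
static uint32_t
ncq_bytes(const uint8_t *acb, int n)
{
	assert(n >= 10);
	uint32_t sectors = ((uint32_t)acb[9] << 8) | acb[1];
	return sectors * SECTOR_SIZE;
}
```

Under these assumptions the ACB line decodes to a 131072-byte read, which
matches the length GEOM_MIRROR reports, and the RES line points at LBA
29602564, which is 244 sectors into that request.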

The situation reproduced itself at least twice; then, as an emergency measure, 
the new drive was labelled independently and the data rsynced over.

Any thoughts?

Thanks in advance!


-- 
Sincerely,
D.Marck [DM5020, MCK-RIPE, DM3-RIPN]
[ FreeBSD committer: ma...@freebsd.org ]

*** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- ma...@rinet.ru ***

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


msk msk0 watchdog timeout freeze hang lock stop problem

2015-04-13 Thread Alnis Morics
Hm... I patched if_msk.c with if_msk.c.rev262524.dma.diff 
(attachment-001.bin) and if_mskreg.h with if_mskreg.h.rev264442.dma.diff 
(attachment-002.bin), and nothing changed: scp'ing 50 MB soon got 
stalled and ended up with broken pipe, as it was before.


I have 10.1-RELEASE-p9 amd64

pciconf -lv:
[..]
mskc0@pci0:9:0:0: class=0x02 card=0xc072144d chip=0x435411ab rev=0x00 hdr=0x00

vendor = 'Marvell Technology Group Ltd.'
device = '88E8040 PCI-E Fast Ethernet Controller'
class  = network
subclass   = ethernet

Alnis


Re: msk msk0 watchdog timeout freeze hang lock stop problem

2015-04-13 Thread Yonghyeon PYUN
On Sun, Apr 12, 2015 at 05:57:34PM +, Gareth Wyn Roberts wrote:
 I've run into problems using the msk device where initially it works well 
 enough to set DHCP etc. but stops/freezes as soon as any appreciable network 
 traffic occurs. There are several threads describing similar symptoms over 
 the past two years or more.  I've been following several false leads but have 
 finally found a solution (at least it solves my problem).
 
 I'm running a standard FreeBSD 10.1-RELEASE and the NIC is detected as:
 
 mskc0: Marvell Yukon 88E8057 Gigabit Ethernet mem 0xfa00-0xfa003fff irq 
 19 at device 0.0 on pci6
 msk0: Marvell Technology Group Ltd. Yukon Ultra 2 Id 0xba Rev 0x00 on mskc0
 msk0: Ethernet address: 00:13:77:e9:df:eb
 miibus0: MII bus on msk0
 e1000phy0: Marvell 88E1149 Gigabit PHY PHY 0 on miibus0
 e1000phy0:  none, 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
 
 The network worked when using the i386 release, but failed for the amd64 
 release (as reported previously) which prompted me to disable 64-bit DMA (the 
 patch for this is attached below).  This worked for the first kernel built 
 but mysteriously failed when another unrelated part of the kernel was changed 
 (a usb driver) and the kernel recompiled.  So identical msk driver code 
 worked in one kernel but not the second! This suggested that alignment 
 differences between the two kernels were causing the msk driver to fail. 
 Others have reported varying behaviour depending on different circumstances.
 
 It transpires that changing just one value in the if_mskreg.h file solved all 
 my problems.  Subsequently I have not been able to make it fail under heavy 
 network traffic in either 32-bit or 64-bit mode.
 I'm working on 10.1-RELEASE source, i.e. if_msk.c revision 262524 and 
 if_mskreg.h revision 264442.

Thanks for letting me know your findings.  I really appreciate
that.
I recall that the alignment requirement of status LEs (List Elements
in Marvell terms) is 2048 and the maximum size of the status LEs is
4096 bytes.  (The actual alignment seems to be a much lower value, like
32 or 64 bytes, but an alignment of 2048 was chosen to avoid silicon
bugs.)  Later experiments showed that some variants of Yukon II require
4096-byte alignment, and I changed the alignment to 4096 in the past.
Your finding seems to indicate that msk(4) needs 8192-byte alignment
for status LEs.

However, this does not explain how and why the same code works well
in 8.x/9.x.  In addition, it is uncommon to require an alignment
greater than PAGE_SIZE on x86, given that the maximum size of the DMA
buffer is 4096 bytes.  I have to check whether bus_dma(9) changed
between 8.x/9.x and 10.x, but that will take a while because I have
little spare time.  In the meantime, you could probably verify that the
DMA address of the status LEs meets the following requirements on both
i386 and amd64:
  - Alignment is 4096.
  - Number of DMA segments is 1.
  - DMA segment base address plus DMA segment size does not cross
    a PAGE_SIZE boundary.
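
The three requirements above can be written down as a single predicate. This
is only a sketch: stat_dma_ok() and SKETCH_PAGE_SIZE are made-up names, not
part of msk(4) or bus_dma(9). The base address, size, and segment count would
have to come from the driver itself, for example by printing them from the
callback passed to bus_dmamap_load(). The alignment is a parameter so that the
same check can be repeated with 8192.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* stands in for PAGE_SIZE on x86 */

/*
 * Returns true when a status LE DMA mapping satisfies all three
 * requirements: aligned base, a single segment, and no page crossing.
 */
static bool
stat_dma_ok(uint64_t base, size_t size, int nsegs, uint64_t align)
{
	/* The mask test below only works for power-of-two alignments. */
	assert(align != 0 && (align & (align - 1)) == 0);

	if (size == 0 || nsegs != 1)
		return false;
	if ((base & (align - 1)) != 0)
		return false;
	/* First and last byte must lie in the same page. */
	return (base / SKETCH_PAGE_SIZE) ==
	    ((base + size - 1) / SKETCH_PAGE_SIZE);
}
```

Note that an address such as 0x1000 passes with an alignment of 4096 but
fails with 8192, which is the difference the patch would expose.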

 
 Here's the patch to if_mskreg.h
 --- if_mskreg.h-orig2014-11-11 20:02:58.0 +
 +++ if_mskreg.h 2015-04-12 18:47:20.0 +0100
 @@ -2179,9 +2179,11 @@
   * At first I guessed 8 bytes, the size of a single descriptor, would be
   * required alignment constraints. But, it seems that Yukon II have 4096
   * bytes boundary alignment constraints.
 + * And it seems that the DMA status region for the Yukon Ultra 2 (88E8057)
 + * requires 8192 byte alignment to prevent locking.
   */
  #define MSK_RING_ALIGN 4096
 -#define MSK_STAT_ALIGN  4096
 +#define MSK_STAT_ALIGN  8192
 
 
 The patches to both files which also implement a MSK_64BIT_DMA_DISABLE flag 
 are attached.  Perhaps the developers would consider committing these as it 
 may be useful for future debugging.
 

If you have more than 4GB of memory installed and disable 64-bit DMA
addressing, msk(4) will use bounce buffers.  Passing packets
through bounce buffers involves a copy operation, which is expensive.
You can check the hw.busdma sysctl node to see whether any
drivers use bounce buffers.  And if you want to disable 64-bit
DMA on 64-bit architectures, add '#undef MSK_64BIT_DMA' just below
the BUS_SPACE_MAXADDR check in if_mskreg.h.
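
To illustrate why more than 4GB of memory matters here, the sketch below is a
deliberately simplified model of the decision, not the actual bus_dma(9)
implementation. A buffer has to be bounced when any part of it lies above the
highest address the device is allowed to reach. needs_bounce() and
LOWADDR_32BIT are hypothetical names; the real code also takes alignment and
boundary constraints into account.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Highest address reachable with 32-bit DMA addressing. */
#define LOWADDR_32BIT 0xFFFFFFFFULL

/*
 * Simplified model: bounce when the last byte of the buffer lies
 * above the highest address the device can reach directly.
 */
static bool
needs_bounce(uint64_t paddr, uint64_t len, uint64_t lowaddr)
{
	assert(len > 0);
	return (paddr + len - 1) > lowaddr;
}
```

With 64-bit DMA disabled, a packet buffer that happens to sit at or above the
4GB mark would be copied through a bounce buffer in this model, while the same
buffer below 4GB would be handed to the device directly.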


Re: Problem building world (amd64) this morning

2015-04-13 Thread Richard Kuhns
On 04/10/15 16:42, Dimitry Andric wrote:
 On 10 Apr 2015, at 21:32, Michael Grimm trash...@odo.in-berlin.de wrote:

 Dimitry Andric d...@freebsd.org wrote:

 On 10 Apr 2015, at 20:55, Michael Grimm trash...@odo.in-berlin.de wrote:

 I can confirm that. r281288 compiles without failing, r281289 fails.

 I've tried all possible ways of reproducing this problem, but it always
 works for me.  Can somebody who experiences the problem please do a
 clean build using script(1), and post the full build log somewhere?
 Preferably a make buildworld without -j, so commands are not
 interspersed.

 Compilation at r281289 is on its way. I'll send you a link after completion.
 
 Thanks, but you can stop that compilation now. :)  I finally managed to
 reproduce the problem, and it turns out I also had to MFC r272814 and
 r272815, which I have done in r281382.  That should really fix it
 properly... Sorry for the breakage.
 
 -Dimitry
 

Checking back in after being offline for a couple of days, I find it's
fixed :-)

Many thanks!
-- 
Richard Kuhns r...@wintek.com Main Number:  765-742-8428
Wintek Corporation Direct:   765-269-8541
427 N 6th Street   Internet Support: 765-269-8503
Lafayette, IN 47901-2211   Consulting:   765-269-8504


Jenkins build is back to normal : FreeBSD_stable_9 #737

2015-04-13 Thread jenkins-admin
See https://jenkins.freebsd.org/job/FreeBSD_stable_9/737/
