Re: CURRENT: re(4) crashing system

2016-10-23 Thread YongHyeon PYUN
On Sun, Oct 23, 2016 at 01:25:38PM +0200, Hartmann, O. wrote:
> I tried to report earlier here that CURRENT has some serious
> problems right now, and one of those problems seems to be triggered by
> the recent re(4) driver. The problem is also present in recent 11-STABLE!
> 
> Below, you'll find pciconf output for the device on a Lenovo E540
> laptop, which I can test on and use to trigger the problem.
> 
> The phenomenon is that this NIC does not negotiate 1000baseTX; it
> always falls back to 100baseTX, although the device claims to be
> 1 Gbit capable.
> 
> When I try to put the device manually into 1000baseTX mode via
> 
> ifconfig re0 media 1000baseTX mediaopt full-duplex (with re(4) driver)
> 
> it is possible to crash the system. The system also crashes when
> plugging/unplugging the LAN cable; I guess the renegotiation
> triggers the crash immediately.
> 
> I tried several switches and routers capable of 1 Gbit, and the
> problem seems to be independent of the network hardware in use.
> 
> I tried to capture a backtrace when the kernel crashes, but I do not
> know how to save the kernel debugger output. Although I configured
> debugging according to the Handbook, there is no core dump at all.
> 
> Advice is appreciated, if anybody is interested in solving this.
> 

There have been several instability reports on re(4).  My vague guess
is that they are related to missing initialization for certain
controllers.  Unfortunately, there is no publicly available
datasheet for those controllers, and I'm not likely to get access
to one in the near future.  It seems the vendor's FreeBSD driver
accesses lots of magic registers and also loads DSP fixups.  I have no
idea what it is trying to do, and re(4) has relied heavily on power-on
default register values.  The engineering samples I have do not show
these instabilities, so it won't be easy to identify the issue.

Probably the first step to address the issue would be identifying
those chips and narrowing down the scope of guessing.  Would you
show me the dmesg output (re(4) and rgephy(4) lines only)?  pciconf(8)
output is useless here since RealTek uses the same PCI ID for its
PCIe variants.
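A minimal Python sketch of pulling out just those lines, assuming the boot messages are in /var/run/dmesg.boot and that each relevant line starts with the driver name and unit number followed by a colon (as in "re0: ..."); the helper name filter_re_lines is hypothetical:

```python
import re
import sys

DMESG_BOOT = "/var/run/dmesg.boot"  # saved kernel messages from the last boot

# Matches lines that begin with "re<unit>:" or "rgephy<unit>:".
_DEVICE_LINE = re.compile(r"^(?:re|rgephy)\d+:")


def filter_re_lines(lines):
    """Hypothetical helper: keep only re(4)/rgephy(4) lines, without newlines."""
    return [line.rstrip("\n") for line in lines if _DEVICE_LINE.match(line)]


if __name__ == "__main__":
    try:
        with open(DMESG_BOOT, encoding="utf-8", errors="replace") as f:
            for line in filter_re_lines(f):
                print(line)
    except OSError as exc:  # e.g. the file is missing when not run on FreeBSD
        print(f"cannot read {DMESG_BOOT}: {exc}", file=sys.stderr)
```

Only the line prefix is matched, so lines from other devices that merely mention re0 (for example the miibus(4) attach line) are skipped.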

BTW, I was told that the vendor's FreeBSD driver seems to work fine
for normal usage patterns.  In the past, the vendor's driver triggered
an instant panic and lacked H/W offloading features, though that
might have changed.
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: was: CURRENT [r307305]: r307823 still crashing

2016-10-23 Thread Benjamin Kaduk
On Sun, 23 Oct 2016, O. Hartmann wrote:

> How can I track a memory leak?

I may not have read enough of the context, but vmstat(8) and top(1) can
track memory usage in general.
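As one rough way of using vmstat(8) for this, a kernel memory leak often shows up as a UMA zone whose USED count keeps growing between two `vmstat -z` snapshots taken some time apart. The Python sketch below assumes the `NAME: SIZE, LIMIT, USED, FREE, ...` line layout of `vmstat -z` and only compares two snapshots; the helper names are hypothetical, and a growing zone is a hint rather than proof of a leak:

```python
import subprocess


def parse_vmstat_z(text):
    """Hypothetical helper: map UMA zone name -> USED count from `vmstat -z`.

    Assumes zone lines look like "NAME:  SIZE, LIMIT, USED, FREE, REQ, ...";
    the header and blank lines are skipped because they lack that shape.
    """
    used = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = [field.strip() for field in rest.split(",")]
        if not sep or len(fields) < 3 or not fields[2].isdigit():
            continue
        used[name.strip()] = int(fields[2])
    return used


def growing_zones(before, after, min_growth=1):
    """Hypothetical helper: (zone, growth) pairs for zones that grew, largest first."""
    growth = [(zone, count - before.get(zone, 0)) for zone, count in after.items()]
    return sorted(
        [(zone, delta) for zone, delta in growth if delta >= min_growth],
        key=lambda item: item[1],
        reverse=True,
    )


def snapshot():
    """Take one `vmstat -z` snapshot (FreeBSD only; not called automatically)."""
    result = subprocess.run(["vmstat", "-z"], capture_output=True, text=True, check=True)
    return parse_vmstat_z(result.stdout)
```

For example, take `before = snapshot()`, let the poudriere job or scrub run for a while, then `growing_zones(before, snapshot())[:10]` lists the ten zones that grew most. The same diffing idea applies to `vmstat -m` (malloc(9) types), whose column layout differs and would need its own parser.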

> How can I write the backtrace given by the debugger to disk when
> crashing? The box I can test freely uses the nVidia blob and vt(4), so
> I cannot see the backtrace. I got a very bad screenshot on one of my
> laptops, but it's so ugly/unreadable that I think it is unusable for
> this list at a reasonable size (the 200 kB maximum is too small).

The backtrace should be part of the crash dump that is written to the
(directly connected, non-encrypted, non-USB) swap device.  Typing "call
doadump" at the debugger prompt (even blind) is supposed to ensure
that a dump is taken.
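After "call doadump" and a reboot, savecore(8) normally copies the dump from the swap device into /var/crash (provided dumpdev is set in rc.conf(5)), and crashinfo(8) writes a text summary named core.txt.N alongside it, which typically includes the backtrace. That file can be read or posted without ever seeing the console. A minimal Python sketch for locating the newest summary, assuming the default /var/crash directory and that naming scheme (newest_core_txt is a hypothetical helper):

```python
import re
from pathlib import Path

# Assumed crashinfo(8) naming: core.txt.0, core.txt.1, ...
_CORE_TXT = re.compile(r"^core\.txt\.\d+$")


def newest_core_txt(crash_dir="/var/crash"):
    """Hypothetical helper: most recently modified core.txt.N, or None."""
    directory = Path(crash_dir)
    if not directory.is_dir():
        return None
    summaries = [p for p in directory.iterdir() if _CORE_TXT.match(p.name)]
    if not summaries:
        return None
    # Compare modification times rather than N, in case the counter was reset.
    return max(summaries, key=lambda p: p.stat().st_mtime)
```

When a summary exists, `print(newest_core_txt().read_text(errors="replace"))` shows it; vmcore.N files are deliberately ignored because they are binary dumps.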

With respect to the screenshot, you should at least be able to post the
image on an external site and send the list a link.

-Ben


was: CURRENT [r307305]: r307823 still crashing

2016-10-23 Thread O. Hartmann
On Sat, 15 Oct 2016 12:13:21 +0200
"O. Hartmann"  wrote:

> On Sat, 15 Oct 2016 10:22:42 +0200
> "O. Hartmann"  wrote:
> 
> > On Fri, 14 Oct 2016 10:48:33 +0200
> > "O. Hartmann"  wrote:
> >   
> > > Systems I updated to recent CURRENT start crashing spontaneously.
> > > 
> > > The most recently crashed system is on
> > > 12.0-CURRENT FreeBSD 12.0-CURRENT #11 r307305: Fri Oct 14 08:37:59 CEST 2016
> > > 
> > > Another one (remote, and not accessible until later in the day) was
> > > updated ~12 hours ago and is also rebooting/crashing without any
> > > warning. The crash can be triggered under heavy load.
> > > 
> > > The only system that has r307263 and is stable so far is an older
> > > two-socket Xeon Core2Duo-based machine; all crashing boxes have CPUs
> > > equal to or newer than Ivy Bridge.
> > > 
> > > Does anyone else see these crashes? I tried to compile a debug kernel
> > > on one host (the remote machine I only have access to later), but
> > > compiling the kernel failed because the machine often crashed under
> > > load. After ZFS scrubbing kicked in, it vanished from the net ;-/
> > > 
> > > kind regards,
> > > oh
> > 
> > r307341 still crashes unpredictably (FreeBSD 12.0-CURRENT #5 r307341:
> > Sat Oct 15 09:36:16 CEST 2016).
> > 
> > I'm back to r307157, which seems to be "stable".
> >   
> 
> It seems I'm the only one having these problems at the moment :-(
> 
> I now have a laptop available and am starting to put debugging options
> into the kernel. But so far the laptop doesn't show the crashes
> described above. The laptop is the only system so far without ZFS!
> 
> The most frequently crashing box is a CURRENT server with the largest
> ZFS volume. On the most recent CURRENT (>r307157, see above), starting
> a scrub on a RAIDZ volume of ~12 TB gross size AND running a poudriere
> job triggers the crash every 1 - 18 minutes. Another box, with only
> /home as a ZFS volume on a dedicated HDD, crashes after minutes or
> hours. A laptop, also on CURRENT (now at r307349), without ZFS works
> stably as long as I do not pull the LAN cable (a problem I also
> described on the list; I'm trying to capture the screen when it
> crashes right now).

I have now spent the last three days trying to figure out whether my
custom config is faulty or CURRENT has a serious bug. Even with GENERIC
and in single-user mode (it then takes longer), CURRENT, now at
r307823, is crashing. The crashes seem to be unrelated to X11, but I
can trigger them faster when using Firefox. I can also trigger them
faster by doing an "svn update" on a ZFS pool containing /usr/ports.
Everyone who uses ZFS for /usr/src or /usr/ports and updates via
Subversion knows that, over time, the update process takes 10 - 15
minutes on ZFS volumes, compared to several minutes on UFS. The crash
occurs while svn traverses /usr/ports.

I'm still surprised that nobody else is seeing such periodic crashes.
As I already reported, the crash occurs with CURRENT on several boxes,
with or without ZFS.

How can I track a memory leak?

How can I write the backtrace given by the debugger to disk when
crashing? The box I can test freely uses the nVidia blob and vt(4), so
I cannot see the backtrace. I got a very bad screenshot on one of my
laptops, but it's so ugly/unreadable that I think it is unusable for
this list at a reasonable size (the 200 kB maximum is too small).





CURRENT: re(4) crashing system

2016-10-23 Thread Hartmann, O.
I tried to report earlier here that CURRENT has some serious
problems right now, and one of those problems seems to be triggered by
the recent re(4) driver. The problem is also present in recent 11-STABLE!

Below, you'll find pciconf output for the device on a Lenovo E540
laptop, which I can test on and use to trigger the problem.

The phenomenon is that this NIC does not negotiate 1000baseTX; it
always falls back to 100baseTX, although the device claims to be
1 Gbit capable.

When I try to put the device manually into 1000baseTX mode via

ifconfig re0 media 1000baseTX mediaopt full-duplex (with re(4) driver)

it is possible to crash the system. The system also crashes when
plugging/unplugging the LAN cable; I guess the renegotiation
triggers the crash immediately.
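To confirm what the link actually settled on after forcing the media or replugging the cable, `ifconfig re0` shows the active media in parentheses on its "media:" line. A small Python sketch that extracts it, assuming the line layout shown in the code comments (active_media is a hypothetical helper name):

```python
import re

# Assumed layout of the ifconfig(8) media line, for example:
#   media: Ethernet autoselect (100baseTX <full-duplex>)
#   media: Ethernet 1000baseT <full-duplex> (1000baseT <full-duplex>)
# The part in parentheses is the currently active media.
_MEDIA = re.compile(
    r"^\s*media:\s+\S+\s+.*?\s*"
    r"\((?P<active>[^\s<)]+)(?:\s+<(?P<options>[^>]*)>)?\)"
)


def active_media(ifconfig_output):
    """Hypothetical helper: (active_subtype, options) from ifconfig output, or None."""
    for line in ifconfig_output.splitlines():
        match = _MEDIA.match(line)
        if match:
            options = match.group("options")
            return (match.group("active"), options.split(",") if options else [])
    return None
```

On FreeBSD the input could come from `subprocess.run(["ifconfig", "re0"], capture_output=True, text=True).stdout`; a result of ("100baseTX", ["full-duplex"]) would match the fallback described above.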

I tried several switches and routers capable of 1 Gbit, and the
problem seems to be independent of the network hardware in use.

I tried to capture a backtrace when the kernel crashes, but I do not
know how to save the kernel debugger output. Although I configured
debugging according to the Handbook, there is no core dump at all.

Advice is appreciated, if anybody is interested in solving this.

Thank you very much in advance and kind regards,

Oliver

[...]
re0@pci0:3:0:0: class=0x02 card=0x502817aa chip=0x816810ec rev=0x10 hdr=0x00
    vendor     = 'Realtek Semiconductor Co., Ltd.'
    device     = 'RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller'
    class      = network
    subclass   = ethernet
    bar   [10] = type I/O Port, range 32, base 0x3000, size 256, enabled
    bar   [18] = type Memory, range 64, base 0xf0d04000, size 4096, enabled
    bar   [20] = type Memory, range 64, base 0xf0d0, size 16384, enabled
    cap 01[40] = powerspec 3  supports D0 D1 D2 D3  current D0
    cap 05[50] = MSI supports 1 message, 64 bit
    cap 10[70] = PCI-Express 2 endpoint MSI 1 max data 128(128) RO
                 link x1(x1) speed 2.5(2.5) ASPM disabled(L0s/L1)
    cap 11[b0] = MSI-X supports 4 messages, enabled
                 Table in map 0x20[0x0], PBA in map 0x20[0x800]
    cap 03[d0] = VPD
    ecap 0001[100] = AER 2 0 fatal 0 non-fatal 0 corrected
    ecap 0002[140] = VC 1 max VC0
    ecap 0003[160] = Serial 1 0100684ce000
    ecap 0018[170] = LTR 1
    ecap 001e[178] = unknown 1
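The chip= field in this listing shows why the PCI ID alone does not identify the controller: pciconf(8) packs the device ID into the upper 16 bits and the vendor ID into the lower 16 bits, so chip=0x816810ec is Realtek (vendor 0x10ec) device 0x8168, the ID that, per the reply in this thread, RealTek reuses across its PCIe variants. The card= field decodes the same way into the subsystem IDs (0x17aa is Lenovo). A small Python sketch of the decoding; the helper names are hypothetical:

```python
import re

# pciconf(8) prints chip= as (device << 16) | vendor and
# card= as (subdevice << 16) | subvendor.
_ID_FIELD = re.compile(r"\b(card|chip)=0x([0-9a-fA-F]{8})\b")


def split_pci_id(value):
    """Hypothetical helper: split a packed 32-bit ID into (vendor, device)."""
    return value & 0xFFFF, (value >> 16) & 0xFFFF


def decode_ids(pciconf_line):
    """Hypothetical helper: {"chip": (vendor, device), "card": (subvendor, subdevice)}."""
    return {
        name: split_pci_id(int(hex_digits, 16))
        for name, hex_digits in _ID_FIELD.findall(pciconf_line)
    }


line = "re0@pci0:3:0:0: class=0x02 card=0x502817aa chip=0x816810ec rev=0x10 hdr=0x00"
for name, (vendor, device) in decode_ids(line).items():
    print(f"{name}: vendor 0x{vendor:04x}, device 0x{device:04x}")
# card: vendor 0x17aa, device 0x5028
# chip: vendor 0x10ec, device 0x8168
```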


Re: installworld fails on missing tzsetup when WITHOUT_DIALOG is set

2016-10-23 Thread Baptiste Daroussin
On Sun, Oct 23, 2016 at 11:41:23AM +0300, Guy Yur wrote:
> On Sat, Oct 22, 2016 at 7:23 PM, Baptiste Daroussin  wrote:
> > On Sat, Oct 22, 2016 at 06:51:28PM +0300, Guy Yur wrote:
> >> Hi,
> >> ...
> >
> > My proposal is a bit different: build tzsetup without dialog support :)
> >
> > https://reviews.freebsd.org/D8325
> >
> > Best regards,
> > Bapt
> 
> Thanks.

FYI it is in

Best regards,
Bapt




Re: installworld fails on missing tzsetup when WITHOUT_DIALOG is set

2016-10-23 Thread Guy Yur
On Sat, Oct 22, 2016 at 7:23 PM, Baptiste Daroussin  wrote:
> On Sat, Oct 22, 2016 at 06:51:28PM +0300, Guy Yur wrote:
>> Hi,
>> ...
>
> My proposal is a bit different: build tzsetup without dialog support :)
>
> https://reviews.freebsd.org/D8325
>
> Best regards,
> Bapt

Thanks.

Guy