On Mon, Sep 07, 2015 at 08:03:51PM +0200, carles.fenoll...@gmail.com wrote:
> >Synopsis:    Weekly network disconnect with G4 Mac Mini (gem0)
> >Category:    powerpc
> >Environment:
>       System      : OpenBSD 5.7
>       Details     : OpenBSD 5.7-stable (GENERIC) #2: Wed Aug 12 23:45:47 CEST 2015
>                        root@mini:/usr/src/sys/arch/macppc/compile/GENERIC
> 
>       Architecture: OpenBSD.macppc
>       Machine     : macppc
> >Description:
> 
>       Hello,
> 
>       I'm experiencing a very strange bug with a headless G4 Mac Mini with 
> the gem0 network driver. The network disconnects by itself and the machine 
> loses all internet connectivity. It doesn't respond to pings/ssh even inside 
> the local network. The rest of the machines in my network seem unaffected so 
> it's not an issue regarding my router.
> 
> >How-To-Repeat:
> 
> I've narrowed it down to the following conditions:
> 
> - It usually happens after about a week of regular usage. My G4 has a fairly 
> consistent usage pattern so it makes sense that the bug also appears with a 
> pattern.
> Here are some sample dates where the bug was triggered:
>       - Restart on 12/Aug 04:15, happens again on 19/Aug 15:15
>       - Restart on 22/Aug 23:10, happens again on 31/Aug 12:46
>       - Restart on 31/Aug 15:10, happens again on 5/Sep 16:11
> 
> - It once happened after just a couple of hours of heavy downloading 
> (BitTorrent, so it could be related either to the number of connections or 
> to the absolute amount of tx/rx data)
> 
> - It can be fixed with "ifconfig gem0 down && ifconfig gem0 up", but not by 
> unplugging and replugging the cable. A system restart also solves the issue. 
> 
> 
> There are no error logs. The closest I can get to an error log is the fact 
> that afpd times out, and I used this timestamp to establish the exact time of 
> the issue. 
> 
> I also run an internet-dependent cron job which starts to fail consistently 
> with the afpd error message, so I'm confident that the bug trigger time is 
> correct.
> 
> Here is what I can see on /var/log/messages for the time when the bug is 
> triggered:
> 
> Aug 22 23:09:57 mini afpd[8461]: afp_alarm: child timed out, entering 
> disconnected state
> Aug 22 23:09:57 mini afpd[8461]: dsi_disconnect: entering disconnected state
> Aug 22 23:09:57 mini afpd[8461]: dsi_disconnect: entering disconnected state
> 
> Another one:
> 
> Aug 31 12:46:19 mini afpd[24528]: afp_alarm: child timed out, entering 
> disconnected state
> Aug 31 12:46:19 mini afpd[24528]: dsi_disconnect: entering disconnected state
> Aug 31 12:46:19 mini afpd[24528]: dsi_wrtreply: Bad file descriptor
> Aug 31 12:46:19 mini afpd[24528]: dsi_disconnect: entering disconnected state
> 
> This one is from yesterday:
> 
> Sep  5 16:10:50 mini ntpd[6258]: 2 out of 4 peers valid
> Sep  5 16:10:50 mini ntpd[6258]: bad peer from pool pool.ntp.org 
> (46.17.142.10)
> Sep  5 16:10:50 mini ntpd[6258]: bad peer from pool pool.ntp.org 
> (194.140.131.21)
> 
> 
> I then grepped /var/log for timestamps close to those dates, but there are 
> no other error messages.
> 
> The machine is running headless so I can't see if there are any error 
> messages on screen.
> 
> >Fix:
> 
> ifconfig gem0 down && ifconfig gem0 up
> 
> As to a permanent fix, here are some hypotheses:
> 
> - It is clearly a network issue, since it's solved by an ifconfig down+up
> - It is probably something driver-related, since I googled and looked at the 
> mailing lists, and there is nobody experiencing the same issue. I guess there 
> are few people using OpenBSD on a G4 with the gem0 driver, so this may be an 
> untested corner case of the driver. If it were a system-wide issue, somebody 
> else would probably have noticed it.
> - This may be a data overflow. It could be either in a counter of absolute 
> tx/rx data, or in the number of connections. The weird weekly periodicity 
> probably has something to do with it. Or maybe connections aren't properly 
> cleaned up and eventually they fill up some buffer? This is my best guess.
> - It does not seem to affect the kernel/other processes since there are no 
> dmesg messages and the system doesn't require a restart.
> 
> 
> Can anybody give me more pointers to further narrow down the issue?

I can't help you with the issue itself, but I can confirm that I've
been seeing the exact same issue with gem0 on my G4 Mac Mini here, for
several releases now. At random, gem0 just stops receiving/sending
packets and needs to be brought down and up again.
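
Until the cause is found, the down/up workaround quoted above could be
automated. What follows is only a rough sketch, not a fix: it pings a host
on the local network and, when that fails, saves a few counters before
bouncing the interface, since the report notes that nothing useful reaches
the logs. The names IFACE, TARGET, LOG, PING_CMD, SNAPSHOT_CMD and
RESET_CMD, the router address, the log path and the crontab path are all
assumptions made for illustration. It also assumes that ping exits non-zero
when no reply arrives.

```shell
#!/bin/sh
# Hypothetical watchdog for the gem0 hang described in this thread.
# All variable names, paths and addresses below are illustrative assumptions.

IFACE="${IFACE:-gem0}"
TARGET="${TARGET:-192.168.1.1}"            # assumed address of the local router
LOG="${LOG:-/var/log/gem0-watchdog.log}"   # assumed log location

# Hooks are overridable so the logic can be dry-run without touching gem0.
PING_CMD="${PING_CMD:-ping -c 3}"
SNAPSHOT_CMD="${SNAPSHOT_CMD:-snapshot}"
RESET_CMD="${RESET_CMD:-reset_iface}"

link_alive() {
    # Assumes ping exits non-zero when no echo reply comes back.
    $PING_CMD "$TARGET" >/dev/null 2>&1
}

snapshot() {
    # Save interface, mbuf and interrupt counters before the reset,
    # so there is something to compare between hangs.
    {
        date
        ifconfig "$IFACE"
        netstat -ni
        netstat -m
        vmstat -i
    } >>"$LOG" 2>&1
}

reset_iface() {
    # The workaround from the original report.
    ifconfig "$IFACE" down && ifconfig "$IFACE" up
}

check_once() {
    # Do nothing while the link answers; otherwise snapshot, then reset.
    if link_alive; then
        return 0
    fi
    $SNAPSHOT_CMD
    $RESET_CMD
}

# Only act when invoked with "run", e.g. from root's crontab (path assumed):
#   */5 * * * * /usr/local/sbin/gem0-watchdog run
if [ "${1:-}" = "run" ]; then
    check_once
fi
```

Setting RESET_CMD="echo reset" and SNAPSHOT_CMD=: gives a dry run that only
prints what it would do. The snapshots taken just before each reset may
also help narrow down whether mbufs or interrupts look unusual at the time
of the hang.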

Landry
