Hi again, I think, I have found a fix for this. With the following two patches applied to NetworkManager 1.12.2, the issue seems to be gone (final confirmation pending):
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/commit/1913a4d2590d42413b04c0cf86321286bb2e2dbd <https://gitlab.freedesktop.org/NetworkManager/NetworkManager/commit/1913a4d2590d42413b04c0cf86321286bb2e2dbd> https://gitlab.freedesktop.org/NetworkManager/NetworkManager/commit/3392c439582b4ba5da19423fbaeed5843aeae9a9 <https://gitlab.freedesktop.org/NetworkManager/NetworkManager/commit/3392c439582b4ba5da19423fbaeed5843aeae9a9> I suspect, the latter made the difference because pppd basically hangs when the bus method call doesn’t return. Sven > On 23 Oct 2018, at 11:37, Sven Schwermer <s...@svenschwermer.de> wrote: > > Hi Aleksander, Matthew, > > I looked into this and the issue that Aleksander referred to in the other > thread (https://bugzilla.gnome.org/show_bug.cgi?id=734347 > <https://bugzilla.gnome.org/show_bug.cgi?id=734347>). In the issue, > Aleksander mentions that MM needs to be informed about the broken connection > after pppd has terminated so that the CLOCAL setting is restored when MM > interacts with the TTY again. However, if I interpret my logs correctly, MM > detects itself that the connection is broken based on the modem’s response to > AT+CGACT? (+CGACT: 1,0). This then triggers the notification signal > MM_PORT_CONNECTED (with connected=false) which makes port_connected try to > acquire the serial port lock which then fails. > > Sven > >> On 22 Oct 2018, at 16:52, Matthew Starr <mst...@hedonline.com >> <mailto:mst...@hedonline.com>> wrote: >> >> Aleksander, >> >> This is the exact same issue I reported a couple months ago: >> https://lists.freedesktop.org/archives/modemmanager-devel/2018-August/006593.html >> >> <https://lists.freedesktop.org/archives/modemmanager-devel/2018-August/006593.html> >> >> I am currently working around it by adding a NetworkManager dispatcher >> script that on ppp0 going down, I restart ModemManager. >> >> I never did find a solution to the issue even after trying some >> modifications to the serial force close code (src/mm-port-serial.c) in >> ModemManager. It looks like the force_closed boolean gets set and is never >> reset, which causes the serial port to never retry communications over the >> serial port again. >> >> Regards, >> Matthew Starr >> >>> Date: Mon, 22 Oct 2018 11:39:37 +0200 >>> From: Sven Schwermer <s...@svenschwermer.de <mailto:s...@svenschwermer.de>> >>> To: Aleksander Morgado <aleksan...@aleksander.es >>> <mailto:aleksan...@aleksander.es>> >>> Cc: modemmanager-devel@lists.freedesktop.org >>> <mailto:modemmanager-devel@lists.freedesktop.org> >>> Subject: Re: Infinite connection loop >>> Message-ID: <41689DBE-1755-4F85-BFA7- >>> 97cf6e588...@svenschwermer.de <mailto:97cf6e588...@svenschwermer.de>> >>> Content-Type: text/plain; charset="utf-8" >>> >>> I looked a little closer into this now. >>> >>> Here’s what I think is happening: >>> 1. My cell operator terminates the LCP after some time of inactivity. This >>> is >>> confirmed by my operator. >>> 2. pppd receives the termination request, acknowledges it and the modem >>> hangs up the connection. >>> 3. ModemManager fails to re-acquire serial port lock because of the hangup. >>> 4. ModemManager sets the connection state to ‘registered’. >>> 5. ModemManager force-closes the serial port. >>> 6. NetworkManager registers the lost connection and tries to re-establish >>> it. >>> 7. The ModemManager fails to re-open serial port because it had been force- >>> closed. >>> >>> pppd doesn’t appear to be re-launched. How is this all supposed to work? >>> I’m sure the software stack should be able to handle a terminated >>> connection and re-dial it? >>> >>> Please find attached the debug-level log. The LCP termination happens at Oct >>> 21 14:53:51. >>> >>> Best regards, >>> Sven >>> >>> -------------- next part -------------- >>> A non-text attachment was scrubbed... >>> Name: excerpt.long.gz >>> Type: application/x-gzip >>> Size: 53662 bytes >>> Desc: not available >>> URL: <https://lists.freedesktop.org/archives/modemmanager- >>> <https://lists.freedesktop.org/archives/modemmanager-> >>> devel/attachments/20181022/1e9e348d/attachment.bin> >>> -------------- next part -------------- >>> >>> >>>> On 20 Oct 2018, at 16:26, Sven Schwermer <s...@svenschwermer.de >>>> <mailto:s...@svenschwermer.de>> >>> wrote: >>>> >>>> Hi again, >>>> >>>> I now managed to capture the event in the journal. I limited the auto >>> connect retried to 4 in order to keep the log size manageable. I didn’t have >>> debug-level logging active for NetworkManager at the time. I’ll enable that >>> now and try to reproduce. Please find the log attached. The error seems to >>> be happening at Oct 20 07:44:26. >>>> >>>> Best regards, >>>> Sven >>>> >>>> <ed6e56cf2ac5b6bdb2e3ca97276ebd6d.journal.log> >>>> >>>>> On 20 Oct 2018, at 13:47, Aleksander Morgado >>> <aleksan...@aleksander.es <mailto:aleksan...@aleksander.es>> wrote: >>>>> >>>>> Hey, >>>>> >>>>>> I am running an embedded system with a u-blox SARA-U270 modem and >>> NetworkManager 1.12.2 / ModemManager 1.8.0. For some reason, the >>> modem connection seems to break over time. I am not quite sure why that’s >>> the case, but when I look into the journal, I see NetworkManager and >>> ModemManager spewing out log messages at a very high rate. This actually >>> causes significant system load (mostly by dbus-daemon). Here’s an excerpt >>> of the messages: >>>>>> >>>>>> Oct 17 12:54:29 cf352c1 ModemManager[1385]: Simple connect started... >>>>>> Oct 17 12:54:29 cf352c1 ModemManager[1385]: Simple connect state >>> (4/8): Wait to get fully enabled >>>>>> Oct 17 12:54:29 cf352c1 ModemManager[1385]: Simple connect state >>> (5/8): Register >>>>>> Oct 17 12:54:29 cf352c1 ModemManager[1385]: Simple connect state >>> (6/8): Bearer >>>>>> Oct 17 12:54:29 cf352c1 ModemManager[1385]: Simple connect started... >>>>>> Oct 17 12:54:29 cf352c1 ModemManager[1385]: Simple connect state >>> (4/8): Wait to get fully enabled >>>>>> Oct 17 12:54:29 cf352c1 ModemManager[1385]: Simple connect state >>> (5/8): Register >>>>>> Oct 17 12:54:29 cf352c1 ModemManager[1385]: Simple connect state >>> (6/8): Bearer >>>>>> Oct 17 12:54:29 cf352c1 NetworkManager[677]: <warn> >>> [1539780869.7776] modem-broadband[ttyACM0]: failed to connect modem: >>> Could not open serial device ttyACM0: it has been forced close >>>>>> Oct 17 12:54:29 cf352c1 NetworkManager[677]: <info> >>> [1539780869.7778] device (ttyACM0): state change: prepare -> failed (reason >>> 'unknown', sys-iface-state: 'managed') >>>>>> Oct 17 12:54:29 cf352c1 NetworkManager[677]: <info> >>> [1539780869.7807] manager: NetworkManager state is now >>> CONNECTED_LOCAL >>>>>> Oct 17 12:54:29 cf352c1 NetworkManager[677]: <warn> >>> [1539780869.7840] device (ttyACM0): Activation: failed for connection >>> 'gprsconnection' >>>>>> Oct 17 12:54:29 cf352c1 NetworkManager[677]: <info> >>> [1539780869.7860] device (ttyACM0): state change: failed -> disconnected >>> (reason 'none', sys-iface-state: 'managed') >>>>>> Oct 17 12:54:29 cf352c1 NetworkManager[677]: <info> >>> [1539780869.7923] policy: auto-activating connection 'gprsconnection' >>> (14e02637-0ffb-3a28-806f-ad60904afe7f) >>>>>> Oct 17 12:54:29 cf352c1 NetworkManager[677]: <info> >>> [1539780869.8005] device (ttyACM0): Activation: starting connection >>> 'gprsconnection' (14e02637-0ffb-3a28-806f-ad60904afe7f) >>>>>> Oct 17 12:54:29 cf352c1 NetworkManager[677]: <info> >>> [1539780869.8015] device (ttyACM0): state change: disconnected -> prepare >>> (reason 'none', sys-iface-state: 'managed') >>>>>> Oct 17 12:54:29 cf352c1 NetworkManager[677]: <info> >>> [1539780869.8043] manager: NetworkManager state is now CONNECTING >>>>>> Oct 17 12:54:29 cf352c1 ModemManager[1385]: Simple connect started... >>>>>> Oct 17 12:54:29 cf352c1 ModemManager[1385]: Simple connect state >>> (4/8): Wait to get fully enabled >>>>>> Oct 17 12:54:29 cf352c1 ModemManager[1385]: Simple connect state >>> (5/8): Register >>>>>> Oct 17 12:54:29 cf352c1 ModemManager[1385]: Simple connect state >>> (6/8): Bearer >>>>>> Oct 17 12:54:29 cf352c1 ModemManager[1385]: Simple connect started... >>>>>> Oct 17 12:54:29 cf352c1 ModemManager[1385]: Simple connect state >>> (4/8): Wait to get fully enabled >>>>>> Oct 17 12:54:29 cf352c1 ModemManager[1385]: Simple connect state >>> (5/8): Register >>>>>> Oct 17 12:54:29 cf352c1 ModemManager[1385]: Simple connect state >>> (6/8): Bearer >>>>>> Oct 17 12:54:29 cf352c1 NetworkManager[677]: <warn> >>> [1539780869.9321] modem-broadband[ttyACM0]: failed to connect modem: >>> Could not open serial device ttyACM0: it has been forced close >>>>>> Oct 17 12:54:29 cf352c1 NetworkManager[677]: <info> >>> [1539780869.9323] device (ttyACM0): state change: prepare -> failed (reason >>> 'unknown', sys-iface-state: 'managed') >>>>>> Oct 17 12:54:29 cf352c1 NetworkManager[677]: <info> >>> [1539780869.9350] manager: NetworkManager state is now >>> CONNECTED_LOCAL >>>>>> Oct 17 12:54:29 cf352c1 NetworkManager[677]: <warn> >>> [1539780869.9399] device (ttyACM0): Activation: failed for connection >>> 'gprsconnection' >>>>>> Oct 17 12:54:29 cf352c1 NetworkManager[677]: <info> >>> [1539780869.9430] device (ttyACM0): state change: failed -> disconnected >>> (reason 'none', sys-iface-state: 'managed') >>>>>> Oct 17 12:54:29 cf352c1 NetworkManager[677]: <info> >>> [1539780869.9490] policy: auto-activating connection 'gprsconnection' >>> (14e02637-0ffb-3a28-806f-ad60904afe7f) >>>>>> Oct 17 12:54:29 cf352c1 NetworkManager[677]: <info> >>> [1539780869.9569] device (ttyACM0): Activation: starting connection >>> 'gprsconnection' (14e02637-0ffb-3a28-806f-ad60904afe7f) >>>>>> Oct 17 12:54:29 cf352c1 NetworkManager[677]: <info> >>> [1539780869.9585] device (ttyACM0): state change: disconnected -> prepare >>> (reason 'none', sys-iface-state: 'managed') >>>>>> Oct 17 12:54:29 cf352c1 NetworkManager[677]: <info> >>> [1539780869.9653] manager: NetworkManager state is now CONNECTING >>>>>> >>>>>> It continues like that forever. My modem network connection config is >>> as follows: >>>>>> >>>>>> [connection] >>>>>> id=gprsconnection >>>>>> type=gsm >>>>>> autoconnect=true >>>>>> autoconnect-retries=0 >>>>>> [gsm] >>>>>> number=*99# >>>>>> [ipv4] >>>>>> dns-search= >>>>>> method=auto >>>>>> may-fail=false >>>>>> >>>>>> I have been looking through the code and what I don’t understand is >>> why the ttyACM0 would ever be forcefully closed. What could be an >>> explanation for this? >>>>>> >>>>>> Any insight is appreciated. >>>>>> >>>>> >>>>> Do you have a full log of when this issue happens? If you can >>>>> reproduce this more or less frequently, could you setup both >>>>> ModemManager and NetworkManager to dump debug logs and try to get >>> it >>>>> reproduced? This issue could be to a wrong synchronization between MM >>>>> and NM/pppd, in particular related to the CLOCAL settings in each case >>>>> (i.e. MM shouldn't get HUPs in the serial port unless for the cases >>>>> where the port is going away). >>>>> >>>>> -- >>>>> Aleksander >>>>> https://aleksander.es <https://aleksander.es/>
_______________________________________________ ModemManager-devel mailing list ModemManager-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/modemmanager-devel