Re: Loss of Network Adress is DHCP Server failed for some hours

2017-10-02 Thread Oliver Freyermuth via networkmanager-list
Hi! 

Am 02.10.2017 um 19:31 schrieb Thomas Haller:
> On nm-1-8 (which is in latest CentOS) you can instead do:
> 
>   nmcli con mod $CON ipv4.dhcp-timeout 2147483647
> 
> which is as good as setting "infinity".
Many thanks for this additional information, and also the implementation 
details! 

I have now gone with the approach of deploying the configuration change. 
In our case, the servers are installed with Foreman / Kickstart and managed 
with Puppet,
using DHCP reservations to assign the addresses and to keep DHCP settings (like 
the list of DNS servers etc.) more flexible. 
So the easiest approach for me was indeed the change to the configuration files 
(via Puppet),
affecting all interfaces / connections, as outlined in the mail Francesco 
sent to Olaf. 

Nevertheless, many thanks and all the best, 
Oliver



signature.asc
Description: OpenPGP digital signature
___
networkmanager-list mailing list
networkmanager-list@gnome.org
https://mail.gnome.org/mailman/listinfo/networkmanager-list


Re: Loss of Network Adress is DHCP Server failed for some hours

2017-10-02 Thread Thomas Haller
On Mon, 2017-10-02 at 18:56 +0200, Oliver Freyermuth via
networkmanager-list wrote:
> Hi! 
> 
> > Put the relevant info in the answer to Olaf.
> 
> Many thanks! 
> > 
> > > Understood. Still I would strongly prefer it if there was an
> > > option
> > > to keep trying forever, as all other network managers I know do
> > > (dhclient, dhcpcd, any device I have encountered so far).
> > 
> > Something is already there on upstream master. You can do:
> > nmcli con mod $CON ipv4.dhcp-timeout infinity
> > 
> > but it is available via nmcli only...
> 
> Like this, it is not useful to me (yet), but good to see it's there! 

Hi,

On nm-1-8 (which is in latest CentOS) you can instead do:

  nmcli con mod $CON ipv4.dhcp-timeout 2147483647

which is as good as setting "infinity".

On master, "infinity" is really only an alias for 2147483647.
What is also different on master, is that NM will not even schedule a
timeout in case of 2147483647. On nm-1-8 instead, it will schedule an
extremely long timeout.
For practical purposes, both behave the same.
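
To put the number in perspective: 2147483647 is the largest signed 32-bit integer, and ipv4.dhcp-timeout is given in seconds. The following minimal Python sketch (the names are made up for illustration and are not NetworkManager identifiers) converts that value to years:

```python
# Illustration only: these names are hypothetical, not NetworkManager identifiers.
INT32_MAX = 2**31 - 1                    # 2147483647, the value aliased by "infinity"
SECONDS_PER_YEAR = 365 * 24 * 60 * 60    # ignoring leap years


def timeout_in_years(seconds: int) -> float:
    """Convert a dhcp-timeout value (in seconds) to years."""
    return seconds / SECONDS_PER_YEAR


if __name__ == "__main__":
    print(INT32_MAX)
    print(round(timeout_in_years(INT32_MAX), 1))
```

The result is roughly 68 years, which is why an "extremely long timeout" and "no timeout at all" are indistinguishable on a real machine.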


best,
Thomas



Re: Loss of Network Adress is DHCP Server failed for some hours

2017-10-02 Thread Oliver Freyermuth via networkmanager-list
Hi! 

> Put the relevant info in the answer to Olaf.
Many thanks! 
> 
>> Understood. Still I would strongly prefer it if there was an option
>> to keep trying forever, as all other network managers I know do
>> (dhclient, dhcpcd, any device I have encountered so far).
> 
> Something is already there on upstream master. You can do:
> nmcli con mod $CON ipv4.dhcp-timeout infinity
> 
> but it is available via nmcli only...
Like this, it is not useful to me (yet), but good to see it's there! 

> Sorry, link to the same bugzilla as above.
> Summing up, basically, what could be taken into account is to review
> the meaning of ipv4/ipv6.may-fail=yes: we can keep retrying dhcp while
> keeping the connection active.
> This would solve all the complaints while leaving the "stop" behavior
> there if one chooses may-fail=no.
Agreed - this would be nice for a future version. 
Until then, at least the description in your mail to Olaf provides me with a 
viable workaround for the servers. 

Many thanks and cheers, 
Oliver





Re: Loss of Network Adress is DHCP Server failed for some hours

2017-10-02 Thread Francesco Giudici



On 02/10/2017 17:53, Oliver Freyermuth wrote:
> Hi!
> 
> [...] Sadly, also -EPERM here. We use CentOS 7, so there's no RHEL
> support contract. Could you maybe describe the solution given
> there? I would be curious to implement that on our systems.


Sorry, I did not realize access to that bugzilla was restricted.
I put the relevant info in my answer to Olaf.

> 
>> The fact that an ipv4 connection may fail (also one with dhcp) is
>> a feature: this would allow for instance to setup multiple
>> connections with different priorities on the same interface,
>> giving first a try to the dhcp connection and then falling back
>> to another one with static ipv4 address or with 802.1x
>> configuration.
> Understood. Still I would strongly prefer it if there was an option
> to keep trying forever, as all other network managers I know do
> (dhclient, dhcpcd, any device I have encountered so far).

Something is already there on upstream master. You can do:
nmcli con mod $CON ipv4.dhcp-timeout infinity

but it is available via nmcli only...

> I think this "option" I am longing for is the suggestion you
> describe in your last paragraph.
> 
> If there is only one (DHCP) connection configured for the
> interface, I would even expect "trying DHCP forever" to be the
> default behaviour, since there is no fallback to fall back to. Does
> that sound reasonable?
> 
> Alternatively: Is it possible to tell network manager to retry the
> complete activation cycle, i.e. retry all configured connections
> for the interface (in order) again after all have failed? In case
> only one connection (DHCP) is configured, this would effectively
> result in trying DHCP forever.

Yes, this seems reasonable.
Maybe NM could periodically retry activating connections that have
auto-activation enabled and point to an interface with no active
connection on top of it.

> 
>> If you really want your dhcp connection to keep trying forever
>> the only viable solution at present seems to be the
>> ipv4.dhcp-timeout property. Maybe we could manage to keep trying
>> also with a brand new value to the ipv4.method... see: 
>> https://bugzilla.redhat.com/show_bug.cgi?id=1350830#c1 Apart from
>> that, there is nothing I would change.
> That sounds like a good idea (for the future). Sadly, -EPERM for
> your link from my side.
Sorry, that link points to the same bugzilla as above.
Summing up, basically, what could be taken into account is to review
the meaning of ipv4/ipv6.may-fail=yes: we can keep retrying dhcp while
keeping the connection active.
This would solve all the complaints while leaving the "stop" behavior
there if one chooses may-fail=no.

Cheers

Francesco

> 
> Cheers and many thanks for your detailed reply! Much appreciated!
> 
> Oliver
> 


Re: Loss of Network Adress is DHCP Server failed for some hours

2017-10-02 Thread Andrei Borzenkov
02.10.2017 18:22, Olaf Hering пишет:
> On Mon, Oct 02, Francesco Giudici wrote:
> 
>> With that anyway you miss the option of having different connections
>> that could fallback if the "primary" one with dhcp fails.
> 
> How is it a failure if the DHCP server disappears, perhaps right after
> it provided a lease? Well, there is likely some blurb in the RFCs about
> what must be done when the lease expired. 

The RFC requires that upon lease expiration the client stop using the address
(actually it literally says "stop any network activity") and return to
the initial state of acquiring an address.

> Defaulting to fail the
> interface from NM point of view is certainly undesired behaviour.
> 

That is what is required by the RFC. You cannot continue to use an address
allocated by a DHCP server after the lease has expired unless you succeeded in
extending (renewing) the lease.
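
For illustration, the client-side timeline from RFC 2131 can be sketched in a few lines of Python. This is a simplified model, not NetworkManager or dhclient code: it assumes the default timer values (T1 at 0.5 and T2 at 0.875 of the lease duration) and a server that never answers again after granting the lease; the function name is made up.

```python
def dhcp_client_state(elapsed: float, lease: float) -> str:
    """Return the RFC 2131 client state after `elapsed` seconds of a lease
    of `lease` seconds, assuming no server reply arrives in the meantime."""
    t1 = 0.5 * lease     # default renewal time (unicast to the leasing server)
    t2 = 0.875 * lease   # default rebinding time (broadcast to any server)
    if elapsed < t1:
        return "BOUND"
    if elapsed < t2:
        return "RENEWING"
    if elapsed < lease:
        return "REBINDING"
    # Lease expired: the address must be dropped and acquisition restarts.
    return "INIT"


if __name__ == "__main__":
    for seconds in (0, 1800, 3150, 3600):
        print(seconds, dhcp_client_state(seconds, lease=3600))
```

So with a one-hour lease, a client that loses its server keeps the address until the hour is up and then has to start over from INIT; what the RFC does not say is that it should ever stop trying at that point.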

>> A change in the default NetworkManager.conf can switch it off for
>> default connections leaving the feature there if needed:
>> (https://bugzilla.redhat.com/show_bug.cgi?id=1350830#c7).
> 
> -EPERM
> 
> Olaf
> 






Re: Loss of Network Adress is DHCP Server failed for some hours

2017-10-02 Thread Francesco Giudici


On 02/10/2017 17:22, Olaf Hering wrote:
> On Mon, Oct 02, Francesco Giudici wrote:
> 
>> With that anyway you miss the option of having different
>> connections that could fallback if the "primary" one with dhcp
>> fails.
> 
> How is it a failure if the DHCP server disappears, perhaps right
> after it provided a lease? Well, there is likely some blurb in the
> RFCs about what must be done when the lease expired. Defaulting to
> fail the interface from NM point of view is certainly undesired
> behaviour.

Yeah, you can only manage it with the dhcp-timeout value.
> 
>> A change in the default NetworkManager.conf can switch it off
>> for default connections leaving the feature there if needed: 
>> (https://bugzilla.redhat.com/show_bug.cgi?id=1350830#c7).
> 
> -EPERM

Sorry, my fault.
Here is what I meant:
"It is possible to set a global default for ipv4.dhcp-timeout, see 'man
NetworkManager.conf' for the details. For example you could add the
following to /etc/NetworkManager/conf.d/10-persistent-dhcp.conf:

 [connection-eth-dhcp-timeout]
 match-device=type:ethernet
 ipv4.dhcp-timeout=2147483647

so that all ethernet connections with unset timeout
(ipv4.dhcp-timeout=0) inherit the new value."
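
If the snippet has to be rolled out by a configuration management tool, it can also be generated programmatically. The following is only a sketch in Python: the section and key names are the ones from the example above, while the helper name is made up, and writing the result to the conf.d path is left to the deployment tool.

```python
import configparser
import io


def render_dhcp_timeout_conf(timeout: int = 2147483647) -> str:
    """Render the conf.d snippet quoted above (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep key names exactly as written
    parser["connection-eth-dhcp-timeout"] = {
        "match-device": "type:ethernet",
        "ipv4.dhcp-timeout": str(timeout),
    }
    buf = io.StringIO()
    # No spaces around "=", matching the layout of the example above.
    parser.write(buf, space_around_delimiters=False)
    return buf.getvalue()


if __name__ == "__main__":
    print(render_dhcp_timeout_conf(), end="")
```

NetworkManager has to reload its configuration (or be restarted) before a new file in conf.d takes effect.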

> 
> Olaf
> 


Re: Loss of Network Adress is DHCP Server failed for some hours

2017-10-02 Thread Oliver Freyermuth via networkmanager-list
Hi! 

Am 02.10.2017 um 17:02 schrieb Francesco Giudici:
>> [...] increase the DHCP timeout, which
>> seems to translate into the time "NetworkManager keeps dhclient
>> alive before killing it".
> 
> That's right, this is the way we think should be addressed.
Ok - that seems a possible solution. 

>> Sadly, this can only be set per connection, which makes it hard to
>> roll this out onto a large set of servers which should use default
>> DHCP configuration.
> Well, you can set it system wide. See:
> https://bugzilla.redhat.com/show_bug.cgi?id=1350830#c7
Sadly, also -EPERM here. We use CentOS 7, so there's no RHEL support contract. 
Could you maybe describe the solution given there? I would be curious to 
implement that on our systems. 

> The fact that an ipv4 connection may fail (also one with dhcp) is a
> feature: this would allow for instance to setup multiple connections
> with different priorities on the same interface, giving first a try to
> the dhcp connection and then falling back to another one with static
> ipv4 address or with 802.1x configuration.
Understood. Still I would strongly prefer it if there was an option to keep 
trying forever, as all other network managers I know do (dhclient, dhcpcd, any 
device I have encountered so far). 
I think this "option" I am longing for is the suggestion you describe in your 
last paragraph. 

If there is only one (DHCP) connection configured for the interface, I would 
even expect "trying DHCP forever" to be the default behaviour, since there is 
no fallback to fall back to. 
Does that sound reasonable? 

Alternatively: Is it possible to tell network manager to retry the complete 
activation cycle, i.e. retry all configured connections for the interface (in 
order) again after all have failed? 
In case only one connection (DHCP) is configured, this would effectively result 
in trying DHCP forever. 

> If you really want your dhcp connection to keep trying forever the
> only viable solution at present seems to be the ipv4.dhcp-timeout
> property.
> Maybe we could manage to keep trying also with a brand new value to
> the ipv4.method... see:
> https://bugzilla.redhat.com/show_bug.cgi?id=1350830#c1
> Apart from that, there is nothing I would change.
That sounds like a good idea (for the future). Sadly, -EPERM for your link from 
my side. 

Cheers and many thanks for your detailed reply! Much appreciated! 

Oliver





Re: Loss of Network Adress is DHCP Server failed for some hours

2017-10-02 Thread Olaf Hering
On Mon, Oct 02, Francesco Giudici wrote:

> With that anyway you miss the option of having different connections
> that could fallback if the "primary" one with dhcp fails.

How is it a failure if the DHCP server disappears, perhaps right after
it provided a lease? Well, there is likely some blurb in the RFCs about
what must be done when the lease expired. Defaulting to fail the
interface from NM point of view is certainly undesired behaviour.

> A change in the default NetworkManager.conf can switch it off for
> default connections leaving the feature there if needed:
> (https://bugzilla.redhat.com/show_bug.cgi?id=1350830#c7).

-EPERM

Olaf




Re: Loss of Network Adress is DHCP Server failed for some hours

2017-10-02 Thread Francesco Giudici



On 02/10/2017 08:58, Olaf Hering wrote:
> On Sun, Oct 1, Oliver Freyermuth  
> wrote:
> 
>> Is this a bug, or a feature?
> 
> This is a bug.
> 
> I'm sure no standard requires the DHCP server to come back within 3
> minutes. NetworkManager must keep retrying, forever.
> 
> Reported here: http://bugzilla.suse.com/show_bug.cgi?id=584544
> 
> Workaround here: https://build.opensuse.org/request/show/519609
> 
> --- a/src/devices/nm-device.c
> +++ b/src/devices/nm-device.c
> @@ -79,7 +79,7 @@ _LOG_DECLARE_SELF (NMDevice);
>  /***/
>  
>  #define DHCP_RESTART_TIMEOUT   120
> -#define DHCP_NUM_TRIES_MAX 3
> +#define DHCP_NUM_TRIES_MAX -1UL
>  #define DEFAULT_AUTOCONNECT TRUE
>  
>  /***/
> 
> Complete fix is to wipe all usage of DHCP_NUM_TRIES_MAX.

Hi Olaf, thanks for sharing your patch.
With that, however, you lose the option of having different connections
that could act as a fallback if the "primary" one with dhcp fails.
I would not throw away that feature.
A change in the default NetworkManager.conf can switch it off for
default connections leaving the feature there if needed:
(https://bugzilla.redhat.com/show_bug.cgi?id=1350830#c7).

Francesco

> 
> 
> Olaf


Re: Loss of Network Adress is DHCP Server failed for some hours

2017-10-02 Thread Francesco Giudici



On 02/10/2017 15:05, Oliver Freyermuth via networkmanager-list wrote:
> [...] Another option would be to increase the DHCP timeout, which
> seems to translate into the time "NetworkManager keeps dhclient
> alive before killing it".

That's right, this is the way we think it should be addressed.

> Sadly, this can only be set per connection, which makes it hard to
> roll this out onto a large set of servers which should use default
> DHCP configuration.
Well, you can set it system wide. See:
https://bugzilla.redhat.com/show_bug.cgi?id=1350830#c7

> 
> It would be really nice if NetworkManager would behave like *any*
> other DHCP client here, or just let dhclient do its job properly
> instead of killing it regularly, breaking the rule of least
> surprise.
> 
> Any thoughts on this? Any help on the matter?

In NetworkManager you have the ipv4.may-fail=yes property.
So, NetworkManager takes into account that ipv4 (and so dhcp) may
fail, and would stop retrying the connection, but could mark the
connection as active if ipv6 succeeds.
Please note however that ipv4.may-fail=no would not prevent the
connection from failing: it would just ensure that to activate the
connection you need the success of the ipv4 method (even if an address
for ipv6 is acquired in the meantime).
The fact that an ipv4 connection may fail (also one with dhcp) is a
feature: this would allow for instance to setup multiple connections
with different priorities on the same interface, giving first a try to
the dhcp connection and then falling back to another one with static
ipv4 address or with 802.1x configuration.
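
The may-fail semantics described above can be written down as a small truth function. This is a deliberately simplified model in Python, not NetworkManager code: the function and parameter names are made up, and it assumes that at least one address family must succeed for the connection to become active.

```python
def activation_succeeds(ipv4_ok: bool, ipv6_ok: bool,
                        ipv4_may_fail: bool = True,
                        ipv6_may_fail: bool = True) -> bool:
    """Simplified model of may-fail: a family that failed must be allowed
    to fail, and at least one family has to succeed."""
    if not (ipv4_ok or ipv6_ok):
        return False  # nothing was configured at all
    if not ipv4_ok and not ipv4_may_fail:
        return False  # ipv4 is mandatory and failed
    if not ipv6_ok and not ipv6_may_fail:
        return False  # ipv6 is mandatory and failed
    return True


if __name__ == "__main__":
    # DHCPv4 timed out, IPv6 got an address, defaults (may-fail=yes):
    print(activation_succeeds(ipv4_ok=False, ipv6_ok=True))
    # Same situation with ipv4.may-fail=no:
    print(activation_succeeds(ipv4_ok=False, ipv6_ok=True, ipv4_may_fail=False))
```

In this model may-fail only decides whether a failed family blocks activation; it says nothing about how long DHCP keeps trying, which is what ipv4.dhcp-timeout controls.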

If you really want your dhcp connection to keep trying forever the
only viable solution at present seems to be the ipv4.dhcp-timeout
property.
Maybe we could manage to keep trying also with a brand new value to
the ipv4.method... see:
https://bugzilla.redhat.com/show_bug.cgi?id=1350830#c1
Apart from that, there is nothing I would change.

Francesco

> 
> Cheers, Oliver


Re: Loss of Network Adress is DHCP Server failed for some hours

2017-10-02 Thread Oliver Freyermuth via networkmanager-list
Hi, 

a small heads-up from my side on this critical bug... Would really like to have 
some upstream developer opinion on this, though. 

Am 02.10.2017 um 09:20 schrieb Oliver Freyermuth:
> Is specifying
>  autoconnect-retries-default=0
> in the [main] section of NetworkManager.conf a working workaround until this 
> fundamental issue is resolved (which would work without recompiling)? 
I tested it. It has no effect at all; it seems the hardcoded DHCP_NUM_TRIES_MAX 
is the only thing that is used if DHCP fails. The autoconnect-retries setting 
seems to be used only for non-DHCP errors. 

Another option would be to increase the DHCP timeout, which seems to translate 
into the time "NetworkManager keeps dhclient alive before killing it". 
Sadly, this can only be set per connection, which makes it hard to roll this 
out onto a large set of servers which should use default DHCP configuration. 

It would be really nice if NetworkManager would behave like *any* other DHCP 
client here, or just let dhclient do its job properly instead of killing it 
regularly, breaking the rule of least surprise. 

Any thoughts on this? Any help on the matter? 

Cheers, 
Oliver





Re: Loss of Network Adress is DHCP Server failed for some hours

2017-10-02 Thread Oliver Freyermuth via networkmanager-list
Am 02.10.2017 um 08:58 schrieb Olaf Hering:
> On Sun, Oct 1, Oliver Freyermuth  wrote:
> 
>> Is this a bug, or a feature?
> 
> This is a bug.
> 
> I'm sure no standard requires the DHCP server to come back within 3
> minutes. NetworkManager must keep retrying, forever.
Thanks for confirming!
That's exactly what I thought should be the case. 
> 
> Reported here:
> http://bugzilla.suse.com/show_bug.cgi?id=584544
> 
> Workaround here:
> https://build.opensuse.org/request/show/519609
> 
Both these (downstream) reports seem to be closed as WONTFIX. Is this also 
tracked upstream somewhere? 

Is specifying
 autoconnect-retries-default=0
in the [main] section of NetworkManager.conf a working workaround until this 
fundamental issue is resolved (which would work without recompiling)? 

> --- a/src/devices/nm-device.c
> +++ b/src/devices/nm-device.c
> @@ -79,7 +79,7 @@ _LOG_DECLARE_SELF (NMDevice);
>  /*/
>  
>  #define DHCP_RESTART_TIMEOUT   120
> -#define DHCP_NUM_TRIES_MAX 3
> +#define DHCP_NUM_TRIES_MAX -1UL
>  #define DEFAULT_AUTOCONNECT TRUE
>  
>  /*/
> 
> Complete fix is to wipe all usage of DHCP_NUM_TRIES_MAX.
> 
I agree. Thanks for the workaround, but to get things going into all downstream 
distributions, a fix accepted upstream would be good. 

Cheers and thanks, 
Oliver





Re: Loss of Network Adress is DHCP Server failed for some hours

2017-10-02 Thread Olaf Hering
On Sun, Oct 1, Oliver Freyermuth  wrote:

> Is this a bug, or a feature?

This is a bug.

I'm sure no standard requires the DHCP server to come back within 3
minutes. NetworkManager must keep retrying, forever.

Reported here:
http://bugzilla.suse.com/show_bug.cgi?id=584544

Workaround here:
https://build.opensuse.org/request/show/519609

--- a/src/devices/nm-device.c
+++ b/src/devices/nm-device.c
@@ -79,7 +79,7 @@ _LOG_DECLARE_SELF (NMDevice);
 /*/
 
 #define DHCP_RESTART_TIMEOUT   120
-#define DHCP_NUM_TRIES_MAX 3
+#define DHCP_NUM_TRIES_MAX -1UL
 #define DEFAULT_AUTOCONNECT TRUE
 
 /*/

Complete fix is to wipe all usage of DHCP_NUM_TRIES_MAX.
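
For readers wondering what -1UL amounts to: in C, negating the unsigned long constant 1 wraps around to the largest value the type can hold. The Python sketch below reproduces that arithmetic; the helper name is made up, and the 64-bit default assumes an LP64 platform such as x86_64 Linux. How the macro is compared against the retry counter is not shown in this thread, so the sketch only illustrates the value itself.

```python
def as_unsigned_long(value: int, bits: int = 64) -> int:
    """Wrap `value` the way C unsigned arithmetic does for a `bits`-wide type."""
    return value % (1 << bits)


if __name__ == "__main__":
    print(as_unsigned_long(-1))            # unsigned long on LP64
    print(as_unsigned_long(-1, bits=32))   # unsigned long on 32-bit platforms
```

Either way the limit becomes so large that the retry counter will not reach it in practice, which is the point of the workaround.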


Olaf

