Bug#932769: [moreinfo] DoS via DHCP request

2019-07-24 Thread Tomas Pospisek
So my interpretation of your initial bug report, that the VM would DoS the 
host on which it was running by rapidly changing IP addresses on its 
interface, was completely off track?


So what you in fact wanted to say by "DoS'ing the server" was that 
the VM sends huge amounts of DHCP requests to the DHCP server (possibly 
also depleting IP addresses from the DHCP server's IP address 
pool) and *that* amounts to a DoS? Is my interpretation correct?


If that's the case, then I'm reassigning this bug report to 
isc-dhcp-client and merging it with the mentioned bug report #888209.


*t

On Tue, 23 Jul 2019, Mark Hutchison wrote:


Hi fellas,
Apologies for the brevity in the initial bug report.  I was using the reportbug 
tool directly from the console of the VM I was working on, small resolution.  
Allow me to elaborate...

We initially discovered this bug while testing our storage product.  We had a
Debian 10 VM running in a typical ESXi 6.7 environment with iSCSI-backed
storage.  The VM ran in a VMDK file on a VMFS datastore volume.  While the
VM was running in memory, we purposefully removed the storage initiators from
ESXi to simulate a storage outage, in order to test something unrelated.
After a couple of minutes the OS will go into R/O mode without its disk,
and at that time dhclient will rapidly request IP's from our ISC DHCP
server.  dhclient will take the IP, consume it from the DHCP pool and then
request another.  After some period of time, several hours to days depending
on the scope's size, this depletes the DHCP pool.  This could also be
replicated by deleting the hard disk from a running VM in a virtual
environment.

When I look at systemctl for the dhclient service, I can see that there's an error, "can't 
create /var/lib/dhcp/dhclient.intname.leases Read Only file system", and then the 
DHCPREQUEST > DHCPACK > DHCPDECLINE sequence
starts every few seconds, and occasionally the service will show "RTNETLINK answers: 
File Exists."

I'm guessing from the error that dhclient has a problem with not being able to 
read / write to the client leases file, declines the IP and requests another, 
but secretly holds on to the IP.

The DHCP server logs will show a final DHCPDECLINE after the ACK, and mark the 
address as abandoned.  The VM will still have the address leased however.  
After a period of time VMware's guest tools will show all the
consumed IP's belonging to that MAC address and virtual interface.  Network 
gear ARP shows the IP's belonging to the same MAC as well.

We've consistently reproduced this bug in our lab, and performed the test
simultaneously with a Debian 9, CentOS and Ubuntu 16 instance to make sure
it wasn't some kind of NetworkManager thing, or a broader Linux issue.

I see that someone reported this similar bug back in 2018 as well, I think they 
may be the same thing.
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=888209

Thanks, just let me know if you have any questions.



On Tue, Jul 23, 2019 at 4:23 PM Tomáš Pospíšek wrote:
  On 23.07.19 at 17:57, Ben Hutchings wrote:
  > On Tue, 2019-07-23 at 16:51 -0400, Tomas Pospisek wrote:
  >> Package: general
  >> Followup-For: Bug #932769
  >>
  >> Could you provide a recipe on how to reproduce this? There's a lot of
  >> very special setup below, that someone would need large amounts of time
  >> to reproduce I feel.
  >>
  >> Is it possible to reduce the problem to something easily demonstrable?
  >>
  >> This seems to be an important issue to me.
  >>
  >> I think the problem here *might* be a kernel problem? Re-assign this to
  >> kernel package?
  > [...]
  >
  > So far as I know, the kernel only ever does DHCP if you net-boot
  > without an initramfs.

  My focus was more on this issue here - apparently:

  Mark Hutchison wrote:

  >> This DoS's the server [due to DHCP changing IPs rapidly
  >> - my interpretation] and the interface attempts to take and discard
  >> IP's in a rapid fashion.

  -> changing IPs of an interface of a *VM* can DoS the server. Which I
  think is not expected, and not terribly funny. It takes a bit of not so
  straightforward circumstances (as far as I can understand the bug
  report), but then an attacker can DoS the server via DHCP. Which is uh,
  I mean ah, um.

  Information is a bit sparse here, though.

  If I may shoot completely off topic for a second: Woah, many thanks
  for your terrific kernel maintenance work Ben. Truly amazing :-o!!!
  Thanks so many times a lot! Woah :-) Thank you! (this doesn't exclude
  the rest of the kernel team - my thanks extend to you all - it's just
  that I have the honor to say thanks to a participating party in this
  email exchange 8v)!
  *t







Bug#932769: [moreinfo] DoS via DHCP request

2019-07-23 Thread Sven Hartge
On Tue, 23 Jul 2019 19:32:04 -0600 Mark Hutchison wrote:

> When I look at systemctl for the dhclient service, I can see that there's
> an error, "can't create /var/lib/dhcp/dhclient.intname.leases Read Only
> file system", and then the DHCPREQUEST > DHCPACK > DHCPDECLINE sequence
> starts every few seconds, and occasionally the service will show "RTNETLINK
> answers: File Exists."
> 
> I'm guessing from the error that dhclient has a problem with not being able
> to read / write to the client leases file, declines the IP and requests
> another, but secretly holds on to the IP.

> I see that someone reported this similar bug back in 2018 as well, I think
> they may be the same thing.
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=888209
> 
> Thanks, just let me know if you have any questions.

To confirm your findings: We saw the same as well with isc-dhcp-client.
As soon as the filesystem its lease file resides on becomes unreachable
or read-only, it throws a fit and just hammers away at the DHCP
infrastructure.
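
A quick way to check for the triggering condition on a client is to test
whether a file can still be created next to the lease file.  The following is
only a rough sketch: the function name is made up for illustration, and it
assumes the lease file lives in /var/lib/dhcp as in the error message quoted
above.  It is not part of isc-dhcp-client.

```python
import tempfile


def lease_dir_writable(lease_dir: str = "/var/lib/dhcp") -> bool:
    """Return True if a file can actually be created in lease_dir.

    Hypothetical helper for illustration only. Creating a real temporary
    file also catches a filesystem that was remounted read-only, which a
    plain permission check would miss.
    """
    try:
        with tempfile.NamedTemporaryFile(dir=lease_dir):
            return True
    except OSError:
        # Covers a read-only filesystem, a missing directory and
        # insufficient permissions.
        return False
```

A monitoring job could stop or restart the DHCP client when this returns
False, instead of letting it loop.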

In our case every client has a fixed DHCP reservation and only ever gets
OFFERed the same IP, which it then declines, but when you have several
hundred clients flooding DHCP requests at the same time, the load on
the infrastructure, including switches with DHCP Snooping active, is
immense.

I also think that #888209 is the same issue.

Coincidentally it also happened in our VMware cluster when an
iSCSI-backed LUN went down, but you should easily be able to reproduce
this with a simple local KVM setup.

Regards,
Sven.





Bug#932769: [moreinfo] DoS via DHCP request

2019-07-23 Thread Mark Hutchison
Hi fellas,

Apologies for the brevity in the initial bug report.  I was using the
reportbug tool directly from the console of the VM I was working on, small
resolution.  Allow me to elaborate...

We initially discovered this bug while testing our storage product.  We had a
Debian 10 VM running in a typical ESXi 6.7 environment with iSCSI-backed
storage.  The VM ran in a VMDK file on a VMFS datastore volume.  While the
VM was running in memory, we purposefully removed the storage initiators from
ESXi to simulate a storage outage, in order to test something unrelated.
After a couple of minutes the OS will go into R/O mode without its disk,
and at that time dhclient will rapidly request IP's from our ISC DHCP
server.  dhclient will take the IP, consume it from the DHCP pool and then
request another.  After some period of time, several hours to days depending
on the scope's size, this depletes the DHCP pool.  This could also be
replicated by deleting the hard disk from a running VM in a virtual
environment.

When I look at systemctl for the dhclient service, I can see that there's
an error, "can't create /var/lib/dhcp/dhclient.intname.leases Read Only
file system", and then the DHCPREQUEST > DHCPACK > DHCPDECLINE sequence
starts every few seconds, and occasionally the service will show "RTNETLINK
answers: File Exists."

I'm guessing from the error that dhclient has a problem with not being able
to read / write to the client leases file, declines the IP and requests
another, but secretly holds on to the IP.

The DHCP server logs will show a final DHCPDECLINE after the ACK, and mark
the address as abandoned.  The VM will still have the address leased
however.  After a period of time VMware's guest tools will show all the
consumed IP's belonging to that MAC address and virtual interface.  Network
gear ARP shows the IP's belonging to the same MAC as well.
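
To quantify this on the DHCP server side, one could count DHCPDECLINE log
lines per client MAC address.  The sketch below is only an illustration: the
function name is hypothetical, and it assumes nothing about the server's log
format beyond each relevant line containing the literal string "DHCPDECLINE"
and a colon-separated MAC address somewhere on the same line.

```python
import re
from collections import Counter

# Matches a colon-separated MAC address such as 00:11:22:33:44:55.
MAC_RE = re.compile(r"\b(?:[0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2}\b")


def declines_per_mac(lines):
    """Count log lines mentioning DHCPDECLINE, grouped by MAC address.

    Hypothetical helper: lines without "DHCPDECLINE" or without a
    MAC-like token are ignored.
    """
    counts = Counter()
    for line in lines:
        if "DHCPDECLINE" not in line:
            continue
        match = MAC_RE.search(line)
        if match:
            counts[match.group(0).lower()] += 1
    return counts
```

A single MAC address with a steadily growing count would match the behaviour
described above.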

We've consistently reproduced this bug in our lab, and performed the test
simultaneously with a Debian 9, CentOS and Ubuntu 16 instance to make sure
it wasn't some kind of NetworkManager thing, or a broader Linux issue.

I see that someone reported this similar bug back in 2018 as well, I think
they may be the same thing.
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=888209

Thanks, just let me know if you have any questions.



On Tue, Jul 23, 2019 at 4:23 PM Tomáš Pospíšek wrote:

> On 23.07.19 at 17:57, Ben Hutchings wrote:
> > On Tue, 2019-07-23 at 16:51 -0400, Tomas Pospisek wrote:
> >> Package: general
> >> Followup-For: Bug #932769
> >>
> >> Could you provide a recipe on how to reproduce this? There's a lot of
> >> very special setup below, that someone would need large amounts of time
> >> to reproduce I feel.
> >>
> >> Is it possible to reduce the problem to something easily demonstrable?
> >>
> >> This seems to be an important issue to me.
> >>
> >> I think the problem here *might* be a kernel problem? Re-assign this to
> >> kernel package?
> > [...]
> >
> > So far as I know, the kernel only ever does DHCP if you net-boot
> > without an initramfs.
>
> My focus was more on this issue here - apparently:
>
> Mark Hutchison wrote:
>
> >> This DoS's the server [due to DHCP changing IPs rapidly
> >> - my interpretation] and the interface attempts to take and discard
> >> IP's in a rapid fashion.
>
> -> changing IPs of an interface of a *VM* can DoS the server. Which I
> think is not expected, and not terribly funny. It takes a bit of not so
> straightforward circumstances (as far as I can understand the bug
> report), but then an attacker can DoS the server via DHCP. Which is uh,
> I mean ah, um.
>
> Information is a bit sparse here, though.
>
> If I may shoot completely off topic for a second: Woah, many thanks
> for your terrific kernel maintenance work Ben. Truly amazing :-o!!!
> Thanks so many times a lot! Woah :-) Thank you! (this doesn't exclude
> the rest of the kernel team - my thanks extend to you all - it's just
> that I have the honor to say thanks to a participating party in this
> email exchange 8v)!
> *t
>


Bug#932769: [moreinfo] DoS via DHCP request

2019-07-23 Thread Tomáš Pospíšek
On 23.07.19 at 17:57, Ben Hutchings wrote:
> On Tue, 2019-07-23 at 16:51 -0400, Tomas Pospisek wrote:
>> Package: general
>> Followup-For: Bug #932769
>>
>> Could you provide a recipe on how to reproduce this? There's a lot of
>> very special setup below, that someone would need large amounts of time
>> to reproduce I feel.
>>
>> Is it possible to reduce the problem to something easily demonstrable?
>>
>> This seems to be an important issue to me.
>>
>> I think the problem here *might* be a kernel problem? Re-assign this to
>> kernel package?
> [...]
> 
> So far as I know, the kernel only ever does DHCP if you net-boot
> without an initramfs.

My focus was more on this issue here - apparently:

Mark Hutchison wrote:

>> This DoS's the server [due to DHCP changing IPs rapidly
>> - my interpretation] and the interface attempts to take and discard
>> IP's in a rapid fashion.

-> changing IPs of an interface of a *VM* can DoS the server. Which I
think is not expected, and not terribly funny. It takes a bit of not so
straightforward circumstances (as far as I can understand the bug
report), but then an attacker can DoS the server via DHCP. Which is uh,
I mean ah, um.

Information is a bit sparse here, though.

If I may shoot completely off topic for a second: Woah, many thanks
for your terrific kernel maintenance work Ben. Truly amazing :-o!!!
Thanks so many times a lot! Woah :-) Thank you! (this doesn't exclude
the rest of the kernel team - my thanks extend to you all - it's just
that I have the honor to say thanks to a participating party in this
email exchange 8v)!
*t



Bug#932769: [moreinfo] DoS via DHCP request

2019-07-23 Thread Ben Hutchings
On Tue, 2019-07-23 at 16:51 -0400, Tomas Pospisek wrote:
> Package: general
> Followup-For: Bug #932769
> 
> Could you provide a recipe on how to reproduce this? There's a lot of
> very special setup below, that someone would need large amounts of time
> to reproduce I feel.
> 
> Is it possible to reduce the problem to something easily demonstrable?
> 
> This seems to be an important issue to me.
> 
> I think the problem here *might* be a kernel problem? Re-assign this to
> kernel package?
[...]

So far as I know, the kernel only ever does DHCP if you net-boot
without an initramfs.

Ben.

-- 
Ben Hutchings
You can't have everything.  Where would you put it?






Bug#932769: [moreinfo] DoS via DHCP request

2019-07-23 Thread Tomas Pospisek
One more question. When you say VMware integrated product: AFAIK VMware 
has its own networking module in the kernel? Can you reproduce this 
with some other virtualisation technology like KVM or QEMU?


And one more question: depending on what handles DHCP in the 
VM (systemd? isc-dhcp-client? [...]?), shouldn't there be some rate 
limiting sanity check in the DHCP client?
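
To illustrate what such a rate limit could look like: a capped exponential
backoff between retries.  This is only a sketch and not how any particular
DHCP client is implemented; the function name is hypothetical, and the
defaults (4 seconds doubling up to 64 seconds) are borrowed from the
retransmission guidance in RFC 2131, which additionally randomizes each delay
(left out here for brevity).

```python
def backoff_delays(base=4.0, cap=64.0, attempts=6):
    """Yield retry delays in seconds, doubling each time up to cap.

    Hypothetical helper for illustration; a real client would also add
    random jitter so that many clients do not retry in lockstep.
    """
    delay = base
    for _ in range(attempts):
        yield delay
        delay = min(cap, delay * 2)


# Delays a client would wait between successive attempts.
print(list(backoff_delays()))  # [4.0, 8.0, 16.0, 32.0, 64.0, 64.0]
```

With something like this applied after each declined lease, a client stuck in
a loop would send a request about once a minute instead of every few seconds.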

*t

On Tue, 23 Jul 2019, Tomas Pospisek wrote:


Package: general
Followup-For: Bug #932769

Could you provide a recipe on how to reproduce this? There's a lot of
very special setup below, that someone would need large amounts of time
to reproduce I feel.

Is it possible to reduce the problem to something easily demonstrable?

This seems to be an important issue to me.

I think the problem here *might* be a kernel problem? Re-assign this to
kernel package?

When you say that it DoS'es the server then what does "top" say? What is
being DoS'ed? Is it the CPU?
*t

It would be truly cool if you could provide more information.
*t


To: Debian Bug Tracking System 
Subject: general: DHCP request bug when storage lost
Date: Mon, 22 Jul 2019 14:48:00 -0600

Package: general
Severity: important
Tags: l10n

Dear Maintainer,

While doing unrelated storage testing for our VMware integrated product, we
purposefully recreated a storage outage by removing the iSCSI initiators from
the backing array hosting the vmdk disk images for the virtual machine.

Upon removal of uplinks to storage, the VM goes into a R/O file system state
after 5-10 minutes.  When storage initiators are brought back up and the LUNs
are rescanned, the VM begins to rapidly request DHCP leases from an ISC DHCP
server.  This DoS's the server in a way due to the number of DHCPDECLINE
errors, and the interface attempts to take and discard IP's in a rapid
fashion.

This only seems to appear on this distribution, and I can't replicate the
behavior on Debian 9 or in a desktop environment.



-- System Information:
Debian Release: 10.0
  APT prefers stable
  APT policy: (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 4.19.0-5-amd64 (SMP w/1 CPU core)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), 
LANGUAGE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled






-- System Information:
Debian Release: 10.0
 APT prefers stable
 APT policy: (500, 'stable')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 4.19.0-5-amd64 (SMP w/8 CPU cores)
Locale: LANG=de_CH.utf8, LC_CTYPE=de_CH.utf8 (charmap=UTF-8), LANGUAGE=de_CH:de 
(charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled





Bug#932769: [moreinfo] DoS via DHCP request

2019-07-23 Thread Tomas Pospisek
Package: general
Followup-For: Bug #932769

Could you provide a recipe on how to reproduce this? There's a lot of
very special setup below, that someone would need large amounts of time
to reproduce I feel.

Is it possible to reduce the problem to something easily demonstrable?

This seems to be an important issue to me.

I think the problem here *might* be a kernel problem? Re-assign this to
kernel package?

When you say that it DoS'es the server then what does "top" say? What is
being DoS'ed? Is it the CPU?
*t

It would be truly cool if you could provide more information.
*t

> To: Debian Bug Tracking System 
> Subject: general: DHCP request bug when storage lost
> Date: Mon, 22 Jul 2019 14:48:00 -0600
> 
> Package: general
> Severity: important
> Tags: l10n
> 
> Dear Maintainer,
> 
> While doing unrelated storage testing for our VMware integrated product, we
> purposefully recreated a storage outage by removing the iSCSI initiators
> from the backing array hosting the vmdk disk images for the virtual machine.
> 
> Upon removal of uplinks to storage, the VM goes into a R/O file system state
> after 5-10 minutes.  When storage initiators are brought back up and the
> LUNs are rescanned, the VM begins to rapidly request DHCP leases from an ISC
> DHCP server.  This DoS's the server in a way due to the number of
> DHCPDECLINE errors, and the interface attempts to take and discard IP's in a
> rapid fashion.
> 
> This only seems to appear on this distribution, and I can't replicate the
> behavior on Debian 9 or in a desktop environment.
> 
> 
> 
> -- System Information:
> Debian Release: 10.0
>   APT prefers stable
>   APT policy: (500, 'stable')
> Architecture: amd64 (x86_64)
> 
> Kernel: Linux 4.19.0-5-amd64 (SMP w/1 CPU core)
> Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), 
> LANGUAGE=en_US.UTF-8 (charmap=UTF-8)
> Shell: /bin/sh linked to /usr/bin/dash
> Init: systemd (via /run/systemd/system)
> LSM: AppArmor: enabled





-- System Information:
Debian Release: 10.0
  APT prefers stable
  APT policy: (500, 'stable')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 4.19.0-5-amd64 (SMP w/8 CPU cores)
Locale: LANG=de_CH.utf8, LC_CTYPE=de_CH.utf8 (charmap=UTF-8), LANGUAGE=de_CH:de 
(charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled