Hello Raphael,
As Youssef said, we communicated yesterday on the preferred channel,
the France-IX mailing list, which is followed by all France-IX members.
We posted a more complete report today; see below.
We verified yesterday evening that the problems with CF or Google
during the evening were not related to France-IX.
As for communication during incidents, we favour the France-IX mailing
list and ask members to use that channel for better follow-up. We will
also create a page shortly to display ongoing maintenance and
incidents. That will also give visibility to non-members, as you
point out.
Cheers,
Simon
---
Dear members,
Please find below a report on the issue encountered yesterday
afternoon:
*12:20 (Paris time):* We started observing some unusual BUM traffic
(Broadcast, Unknown Unicast, Multicast) on the PoPs where BUM rate
limiting is applied globally rather than per interface: PA7, PAR1 and TH3.
We tried to determine the origin of this flooded traffic by looking for
loops and checking MAC address consistency across PoPs. At this stage,
our probe network (one 10G probe per device) did not raise any alert,
and the probes observed no loss. Nonetheless, some members reported
losses towards France-IX.
Sniffer captures showed that it was unknown unicast traffic from
several sources to a few destinations. BUM traffic reached 10 to
15 Mbps. This traffic was observed even though the MAC table entries
were correct.
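
For readers less familiar with the BUM categories, here is a minimal Python sketch of how a frame is normally classified from its destination MAC address. It is an illustration only, not France-IX tooling: the function name `classify_frame` and the `mac_table` set are hypothetical. It shows the textbook rule, where a unicast frame is flooded only when its destination is missing from the MAC table; the incident above was abnormal precisely because flooding occurred while the entries were present.

```python
def classify_frame(dst_mac: str, mac_table: set) -> str:
    """Classify an Ethernet frame by destination MAC, as a learning switch would.

    mac_table is a hypothetical set of learned MAC addresses (lowercase,
    colon-separated) for the VLAN in question.
    """
    mac = dst_mac.lower().replace("-", ":")
    if mac == "ff:ff:ff:ff:ff:ff":
        return "broadcast"
    first_octet = int(mac.split(":")[0], 16)
    if first_octet & 0x01:
        # The least significant bit of the first octet (I/G bit) marks a group address.
        return "multicast"
    if mac not in mac_table:
        # Unicast destination not learned: the frame is flooded on the VLAN.
        return "unknown-unicast"
    return "known-unicast"


learned = {"00:11:22:33:44:55"}
print(classify_frame("00:11:22:33:44:55", learned))
print(classify_frame("00:aa:bb:cc:dd:ee", learned))
```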
*Around 15:00 (Paris time):* BUM traffic exceeded 50 Mbps, causing
additional impact, mainly on small and medium routers on the customer
side. We cleared some MAC address entries where we observed flooding,
with no effect. As we did not observe any abnormal behaviour on the
customer side, we started clearing some MPLS LSP circuits and shutting
down backbone links one by one, to avoid creating additional impact.
This allowed us to isolate the problem: the issue was located on the
PAR5 PoP, while clearing the MPLS circuits used between PAR5 and PAR1.
During these operations, the PAR1 PoP was isolated for 4 minutes,
between 16:00 and 16:04, in order to find the root cause.
We are in touch with the vendor to understand this behaviour and are
sharing logs to find the root cause. We will keep you informed as soon
as we have more information.
---
Location: FranceIX Paris LAN
Incident start: 14th of January 2019, 12:21 (UTC+1, Paris Time)
Incident end: 14th of January 2019, 16:08 (UTC+1, Paris Time)
Customer impact: Some members observed packet loss during this period
---
Here is the work in progress to detect this kind of issue:
- Specific alerts on every PoP when a BUM traffic threshold is reached
(*in place since yesterday*)
- Enhancement of the QoS probes to match member configurations as
closely as possible: a BGP router configured on each probe and traffic
generated permanently. This will be deployed in Q1-2019
- We plan to test the 18.R1 firmware soon. This version improves the
way processes and memory are managed on the platform. It will be tested
during Q1-2019 and probably deployed during Q2-2019
- EVPN: in the long term, we plan to activate EVPN, which will give
better control over BUM traffic
- Definition of a specific process to react quickly if the issue
occurs again
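
As an illustration of the first item above, a per-PoP BUM threshold alert could be sketched in Python as follows. This is not France-IX's monitoring code: the names `CounterSample`, `bum_rate_mbps` and `pops_over_threshold`, the 10 Mbps threshold and the sample values are all assumptions made for the example. It assumes a cumulative BUM byte counter is polled per PoP, and it ignores counter wrap-around.

```python
from dataclasses import dataclass

# Hypothetical threshold, chosen for illustration only.
BUM_THRESHOLD_MBPS = 10.0


@dataclass
class CounterSample:
    timestamp_s: float  # time the counter was read, in seconds
    bum_bytes: int      # cumulative BUM byte counter at that time


def bum_rate_mbps(prev: CounterSample, curr: CounterSample) -> float:
    """Average BUM rate in Mbps between two counter readings."""
    elapsed = curr.timestamp_s - prev.timestamp_s
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    return (curr.bum_bytes - prev.bum_bytes) * 8 / elapsed / 1_000_000


def pops_over_threshold(samples: dict, threshold_mbps: float = BUM_THRESHOLD_MBPS) -> list:
    """Return the sorted PoP names whose BUM rate meets or exceeds the threshold.

    samples maps a PoP name to a (previous, current) pair of CounterSample.
    """
    return sorted(
        pop
        for pop, (prev, curr) in samples.items()
        if bum_rate_mbps(prev, curr) >= threshold_mbps
    )


# Made-up readings taken 60 seconds apart.
samples = {
    "PAR1": (CounterSample(0, 0), CounterSample(60, 112_500_000)),
    "TH3": (CounterSample(0, 0), CounterSample(60, 7_500_000)),
}
print(pops_over_threshold(samples))
```

With these made-up readings, PAR1 averages 15 Mbps and TH3 averages 1 Mbps, so only PAR1 would raise an alert.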
We apologize again for this issue. We are sorry if you feel we did not
communicate enough during the incident; we communicated as soon as we
had new information to provide.
On 14/01/2019 at 21:40, Raphael Mazelier wrote:
On 14/01/2019 20:59, Radu-Adrian Feurdean wrote:
Almost certainly not. The traffic had also disappeared via
Equinix-IX and moved (after a sharp drop) entirely onto transit.
Right now it seems to be picking up a little on the Equinix side.
On the FranceIX side, I don't know (I am prepending), but the "Akamai
festival" has indeed started this evening's episode (traffic
shifting from PNI to France-IX).
OK, thanks for the clarification. What made me think of that was
reports from people who had also lost 8.8.8.8. Otherwise, what is
hammering the CDNs at the moment that forces them to re-route?
--
Raphael Mazelier
---------------------------
FRnOG mailing list
http://www.frnog.org/
--
Simon Muyal
CTO
FranceIX
Tel: +33 (0)1 70 61 97 74
Mob: +33 (0)6 21 17 29 51