Hello Raphael,

As Youssef said, we communicated yesterday on the preferred channel, the France-IX mailing list, which all France-IX members follow. We posted a more complete report today, see below.

Last night we made sure that the issues seen with CF or Google during the evening were not related to France-IX.

As for communication during incidents, we favour the France-IX mailing list and ask members to use that channel for better follow-up. We will also soon create a page showing ongoing maintenance and incidents. As you pointed out, this will also give visibility to non-members.


Cheers,

Simon


---

Dear members,

Please find below a report on the issue we encountered yesterday afternoon:

*12:20 (Paris time):* We started observing unusual BUM traffic (Broadcast, Unknown Unicast, Multicast) on the PoPs where BUM rate limiting is applied globally rather than per interface: PA7, PAR1 and TH3.

We tried to determine the origin of this flooded traffic by looking for loops and checking MAC address consistency across PoPs. At this stage, our probe network (one 10G probe per device) had not raised any alerts and the probes observed no loss. Nonetheless, some members complained of losses towards France-IX.

The sniffer captures allowed us to determine that it was unknown unicast traffic from several sources to a few destinations. BUM traffic reached 10 to 15 Mbps. This traffic was observed even though the MAC table entries were correct.
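To make the BUM categories concrete, here is a minimal, hypothetical sketch of how captured frames could be sorted by destination MAC address. It assumes each frame is reduced to its destination MAC string and that a plain set (`known_macs`, a hypothetical name) stands in for the switch MAC table; it does not reflect France-IX tooling. Note that in this incident flooding occurred even though MAC entries looked correct, so a lookup like this only describes what the sniffer saw, not why it happened.

```python
# Hypothetical sketch: sort captured frames into BUM categories.
# Assumption: a frame is represented only by its destination MAC
# ("aa:bb:cc:dd:ee:ff"), and `known_macs` stands in for the MAC table.

BROADCAST = "ff:ff:ff:ff:ff:ff"


def classify(dst_mac: str, known_macs: set[str]) -> str:
    """Return 'broadcast', 'multicast', 'unknown-unicast' or 'known-unicast'."""
    mac = dst_mac.lower()
    # Broadcast is checked first because it also has the group bit set.
    if mac == BROADCAST:
        return "broadcast"
    # The I/G bit (least significant bit of the first octet) marks group addresses.
    if int(mac.split(":")[0], 16) & 0x01:
        return "multicast"
    # A unicast destination absent from the MAC table gets flooded.
    if mac not in known_macs:
        return "unknown-unicast"
    return "known-unicast"


def bum_summary(dst_macs: list[str], known_macs: set[str]) -> dict[str, int]:
    """Count frames per category, e.g. to spot a surge of unknown unicast."""
    counts: dict[str, int] = {}
    for mac in dst_macs:
        kind = classify(mac, known_macs)
        counts[kind] = counts.get(kind, 0) + 1
    return counts


if __name__ == "__main__":
    table = {"00:11:22:33:44:55"}
    frames = [
        "ff:ff:ff:ff:ff:ff",  # broadcast
        "01:00:5e:00:00:01",  # IPv4 multicast MAC
        "00:11:22:33:44:55",  # present in the table
        "00:aa:bb:cc:dd:ee",  # not in the table -> unknown unicast
    ]
    print(bum_summary(frames, table))
    # {'broadcast': 1, 'multicast': 1, 'known-unicast': 1, 'unknown-unicast': 1}
```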

*Around 15:00 (Paris time):* BUM traffic exceeded 50 Mbps, causing additional impact, mainly on small and medium routers on the customer side. We cleared some MAC address entries where we observed flooding, with no effect. As we did not observe any abnormal behaviour on the customer side, we started clearing some MPLS circuits/LSPs and shutting down backbone links one by one to avoid creating additional impact. This allowed us to isolate the problem: the issue was located on the PAR5 PoP and was cleared by clearing the MPLS circuits used between PAR5 and PAR1. During these operations, the PAR1 PoP was isolated for 4 minutes, between 16:00 and 16:04, in order to find the root cause.

We are in touch with the vendor to understand this behaviour and are sharing logs with them to find the root cause. We will keep you informed as soon as we have more information.

---
Location: FranceIX Paris LAN

Incident start: 14th of January 2019, 12:21 (UTC+1, Paris Time)
Incident end: 14th of January 2019, 16:08 (UTC+1, Paris Time)

Customer impact: Some members observed packet loss during this period
---

Here is the work in progress to detect this kind of issue:

  - Specific alerts when the BUM traffic threshold is reached on every PoP (*already in place since yesterday*)
  - Enhancements to the QoS probes so that they are as close as possible to member configurations: a BGP router configured on each probe and permanently generated traffic. This will be deployed in Q1-2019
  - We plan to test the 18.R1 firmware soon. This version improves the way processes and memory are managed on the platform. It will be tested during Q1-2019 and probably deployed during Q2-2019
  - EVPN: in the long term, we plan to activate EVPN, which will give better control over BUM traffic
  - Definition of a specific process to react quickly if the issue occurs again

We apologize again for this issue. We are sorry if you felt that we did not communicate enough during the incident; we communicated as soon as we had new information to provide.
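The first work item, alerting when BUM traffic crosses a threshold on a PoP, could look something like the minimal sketch below. It assumes per-PoP BUM rates in Mbps are already collected by some monitoring system; the function and class names, the 5 Mbps default threshold and the sample values are all hypothetical and illustrative, not France-IX's actual settings or measurements.

```python
# Hypothetical sketch of a per-PoP BUM threshold alert.
# Assumptions: BUM rates (Mbps) are already collected per PoP elsewhere;
# the 5 Mbps default threshold is illustrative only.

from dataclasses import dataclass


@dataclass
class Alert:
    pop: str
    bum_mbps: float
    threshold_mbps: float


def check_bum(rates_mbps: dict[str, float], threshold_mbps: float = 5.0) -> list[Alert]:
    """Return one alert per PoP whose BUM rate is strictly above the threshold."""
    return [
        Alert(pop, rate, threshold_mbps)
        for pop, rate in sorted(rates_mbps.items())
        if rate > threshold_mbps
    ]


if __name__ == "__main__":
    # Made-up sample values; the per-PoP split is invented for illustration.
    sample = {"PAR1": 52.0, "TH3": 12.5, "PA7": 0.8}
    for alert in check_bum(sample):
        print(f"BUM alert on {alert.pop}: {alert.bum_mbps} Mbps > {alert.threshold_mbps} Mbps")
```

A global threshold like this is the simplest option; a per-PoP or per-interface threshold dictionary would be a natural refinement where baseline BUM levels differ between sites.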


On 14/01/2019 at 21:40, Raphael Mazelier wrote:
On 14/01/2019 20:59, Radu-Adrian Feurdean wrote:

Almost certainly not. The traffic had also disappeared via Equinix-IX and moved entirely onto transit (after a sudden drop). It currently seems to be picking up a little on the Equinix side. On the FranceIX side, I don't know (I'm prepending), but the "Akamai festival" has definitely started tonight's episode (traffic moving from PNI to France-IX).


OK, thanks for the clarification. What made me think of that were reports from people who had also lost 8.8.8.8. Otherwise, what is hammering the CDNs right now that they have to reroute?

--
Raphael Mazelier



---------------------------
FRnOG mailing list
http://www.frnog.org/

--
Simon Muyal
CTO
FranceIX
Tel: +33 (0)1 70 61 97 74
Mob: +33 (0)6 21 17 29 51

