And now Singtel have returned serve and are denying it was them. https://www.zdnet.com/article/singtel-refutes-reports-that-its-system-upgrade-caused-optus-outage/
It's like watching kids trying to blame each other for who broke the window with the cricket ball.

D

On Wed, 15 Nov 2023 at 11:01, Luke Thompson <luk...@tncrew.com.au> wrote:
>
> They've blamed Singtel Internet Exchange (STiX) for the international peering
> route updates, at least going by anonymous sources cited by SMH.
>
> https://www.smh.com.au/technology/identity-of-third-party-who-brought-down-optus-network-revealed-20231114-p5ejy1.html
>
> Luke
>
> On 14 November 2023 12:37:30 pm Ben Buxton <bb.aus...@bb.cactii.net> wrote:
>>
>> Blaming routing updates from peers is scapegoating; they are never the cause of
>> an outage - public BGP is the wild west and you're always getting broken
>> information - it's your responsibility to filter those updates and (unless
>> it's a zero-day poison packet bug) you only have yourself to blame if you
>> fall over from them.
>>
>> If I were an Optus business customer, reading that outage page would just
>> make me even more determined to move elsewhere.
>>
>> They vaguely categorised the "what" of the outage into a big bucket
>> (software upgrade related), but gave absolutely no useful information on or
>> explanation of the "why", which is what would regain my confidence.
>>
>> Why did this upgrade trigger an outage?
>> - Was there a behaviour/feature change they neglected to take into account?
>> - Did the upgrade require a config change that broke?
>> - Were they neglectful in following config best practices? (filtering,
>> prefix limits, restarts, etc.?)
>> - Did the new software have an unidentified bug?
>> - Why did testing not catch this problem? (they do test changes...right?)
>> - How did progressive rollout still lead to this impact? (they do
>> progressive rollouts over N days/weeks...right?)
>>
>> Why did mitigation take so long?
>> - What detection/telemetry measures led them to realise the scope of the
>> outage? (news reports don't count)
>> - Were they dependent on the downed network for on-call paging & comms?
>> - Why did their rollback plan fail? (they had a rollback plan...right?)
>> - Why was remote console/power access not working? (they have
>> both...right?)
>> - Were they dependent on the downed network for said access?
>> - Were their playbooks/credential access dependent on the downed network?
>>
>> "We have made changes to the network to address this issue so that it cannot
>> occur again." ... this smells like "whoops, forgot to set max-prefix (with
>> restart!)".
>>
>> Bugs, config stuff-ups, etc. happen, and they will continue to happen - it is
>> a lie to state that outages will never happen again. This is the culmination
>> of monumental failures in the trigger, prevention and mitigation measures,
>> which cannot be fixed in a couple of days; it sounds like much deeper
>> architectural and organisational issues need addressing.
>>
>> Many of the above failures are things that a young network will experience
>> and learn from, but for Optus these should all be well planned for already.
>>
>> I suspect any government investigation will simply add more bureaucracy and
>> boxes to tick rather than effect meaningful change, but one can always be
>> hopeful...
>>
>> BB
>>
>> On Tue, 14 Nov 2023 at 13:02, Michael Bethune <m...@ozonline.com.au> wrote:
>>>
>>> "Optus network received changes to routing information from an
>>> international peering network following a software upgrade"
>>>
>>> I note they are very careful to avoid nominating whose software upgrade
>>> it was.
>>>
>>> I also note that they say they received routing updates, but
>>> don't they limit the number of prefixes accepted by their BGP from
>>> any given peer?
>>>
>>> Sounds like a carefully crafted statement to enable them to point fingers
>>> elsewhere, which is not unexpected.
>>>
>>> - Michael.
>>>
>>> Quoting francisfi...@mailup.net:
>>>
>>> > Looks like it was a software upgrade:
>>> > https://www.abc.net.au/news/2023-11-13/optus-identifies-cause-of-nationwide-outage-software-upgrade/103099902
>>> >
>>> > Nothing in their media centre; it just appears as a new box on their
>>> > outage response page: https://www.optus.com.au/notices/outage-response
>>> >
>>> > Cheers
>>> >
>>> > ----
>>> > Text:
>>> >
>>> > "We have been working to understand what caused the outage on
>>> > Wednesday, and we now know what the cause was and have taken steps
>>> > to ensure it will not happen again. We apologise sincerely for
>>> > letting our customers down and the inconvenience it caused.
>>> >
>>> > At around 4.05am Wednesday morning, the Optus network received
>>> > changes to routing information from an international peering network
>>> > following a software upgrade. These routing information changes
>>> > propagated through multiple layers in our network and exceeded
>>> > preset safety levels on key routers. This resulted in those routers
>>> > disconnecting from the Optus IP Core network to protect themselves.
>>> >
>>> > The restoration required a large-scale effort of the team and in
>>> > some cases required Optus to reconnect or reboot routers physically,
>>> > requiring the dispatch of people across a number of sites in
>>> > Australia. This is why restoration was progressive over the afternoon.
>>> >
>>> > Given the widespread impact of the outage, our investigations into
>>> > the issue took longer than we would have liked as we examined
>>> > several different paths to restoration. The restoration of the
>>> > network was at all times our priority and we subsequently
>>> > established the cause working together with our partners. We have
>>> > made changes to the network to address this issue so that it cannot
>>> > occur again.
>>> >
>>> > We are committed to learning from what has occurred and continuing
>>> > to work with our international vendors and partners to increase the
>>> > resilience of our network. We will also support and fully cooperate
>>> > with the reviews being undertaken by the Government and the Senate.
>>> >
>>> > We continue to invest heavily to improve the resiliency of our
>>> > network and services."
>>> >
>>> > --
>>> >
>>> > francisfi...@mailup.net
>>> >
>>> > On Thu, Nov 9, 2023, at 07:15, DaZZa wrote:
>>> >> I have all three you're asking about.
>>> >>
>>> >> But I'm very small potatoes compared to most of the members of this
>>> >> list, and my required remote footprint is correspondingly small, so
>>> >> it's easy to maintain.
>>> >>
>>> >> D
>>> >>
>>> >> On Thu, 9 Nov 2023 at 06:18, Phillip Grasso
>>> >> <phillip.gra...@gmail.com> wrote:
>>> >>>>
>>> >>>> I mean come on, it's nearly 2024 and a [major] telco does not
>>> >>>> have remote console access?
>>> >>>
>>> >>> If we sent a poll out to this community, how many could genuinely
>>> >>> and honestly answer "yes" to:
>>> >>>
>>> >>> Do you have a console or appropriate control plane access into all
>>> >>> your critical infrastructure?
>>> >>> Do you have independent out-of-band access that does not share any
>>> >>> infrastructure with your current system(s), with the exception of
>>> >>> physical location and power?
>>> >>> Do you have the ability to remotely control power to your devices?
>>> >>>
>>> >>> We know from the Facebook outage in 2021 that they probably didn't
>>> >>> have the above, so it's not entirely uncommon for folks to lack
>>> >>> *proper independent* console and remote access.
>>> >>>
>>> >>> I empathize with the Optus team and their customers who have been
>>> >>> negatively impacted by this incident.
>>> >>> I sincerely hope that some positive outcomes can emerge from this
>>> >>> situation, including:
>>> >>>
>>> >>> - Attention to critical infrastructure resilience
>>> >>> - BGP clue increases
>>> >>> - Incident management improves
>>> >>> (I'm sure there's more).
>>> >>>
>>> >>> The network is a black box to most people and I think a large chunk of
>>> >>> Australia now knows what it feels like to not have it.
>>> >>>
>>> >>> On Wed, 8 Nov 2023 at 11:06, Ben Buxton <bb.aus...@bb.cactii.net> wrote:
>>> >>>>
>>> >>>> On Wed, 8 Nov 2023 at 10:14, DaZZa <dazzagi...@gmail.com> wrote:
>>> >>>>>
>>> >>>>> Yeah, I'd be willing to bet that it's a change which wasn't thoroughly
>>> >>>>> tested before being rolled out, and which had an inadequate backout
>>> >>>>> plan.
>>> >>>>
>>> >>>> Also, "Our on-site technician is actively prioritising
>>> >>>> establishing a console connection.".
>>> >>>>
>>> >>>> I mean come on, it's nearly 2024 and a [major] telco does not
>>> >>>> have remote console access? Whilst I'm looking forward to
>>> >>>> enthusiastically reading the PM, I'll have to book a physio
>>> >>>> appointment in advance due to neck strain from all the head
>>> >>>> shaking it'll likely induce.
>>> >>>>
>>> >>>> BB
>>> >>>>
>>> >>>>>
>>> >>>>> Interestingly, my Optus mobile actually had a valid connection for a
>>> >>>>> short time - it wasn't able to actually DO anything, but was connected
>>> >>>>> to the Optus network - but it's now gone to "SOS" mode.
>>> >>>>>
>>> >>>>> D
>>> >>>>>
>>> >>>>> On Wed, 8 Nov 2023 at 10:01, John Edwards <jaedwa...@gmail.com> wrote:
>>> >>>>> >
>>> >>>>> > The 4am Wednesday morning outage start looks suspiciously like
>>> >>>>> > a firmware upgrade window.
>>> >>>>> >
>>> >>>>> > I note that Optus devices where I am are showing "SOS", which
>>> >>>>> > indicates the tower is unable to reach the location register,
>>> >>>>> > which presumably is on a private network and indicative of a
>>> >>>>> > pretty major fault rather than just IP.
>>> >>>>> >
>>> >>>>> > John
>>> >>>>> >
>>> >>>>> > On Wed, 8 Nov 2023 at 09:10, DaZZa <dazzagi...@gmail.com> wrote:
>>> >>>>> >>
>>> >>>>> >> The Optus hamster finally died of old age.
>>> >>>>> >>
>>> >>>>> >> I would suggest your SMS issues are caused by whoever is sending
>>> >>>>> >> the SMS using Optus - not so much by the Telstra end receiving it.
>>> >>>>> >>
>>> >>>>> >> Anecdotally, Optus enterprise/wholesale appears to be still
>>> >>>>> >> functional - at least my link appears to be working fine - and my
>>> >>>>> >> BGP advertisements are still being seen overseas. It seems to be
>>> >>>>> >> only NBN and mobile based services which are busted.
>>> >>>>> >>
>>> >>>>> >> D
>>> >>>>> >>
>>> >>>>> >> On Wed, 8 Nov 2023 at 09:27, <francisfi...@mailup.net> wrote:
>>> >>>>> >> >
>>> >>>>> >> > Morning all,
>>> >>>>> >> > Hope the chaos isn't too hard on your work/family.
>>> >>>>> >> > I have had trouble with a couple of SMS verifications
>>> >>>>> >> > coming through to my Telstra number. Is this related?
>>> >>>>> >> >
>>> >>>>> >> > Any general banter around the downtime would be fine too -
>>> >>>>> >> > it looks like it all began at 4.07am AEDT?
>>> >>>>> >> >
>>> >>>>> >> > Cheers
>>> >>>>> >> >
>>> >>>>> >> > --
>>> >>>>> >> >
>>> >>>>> >> > francisfi...@mailup.net
>>> >>>>> >> > _______________________________________________
>>> >>>>> >> > AusNOG mailing list
>>> >>>>> >> > AusNOG@lists.ausnog.net
>>> >>>>> >> > https://lists.ausnog.net/mailman/listinfo/ausnog
>>> >>>>> >>
>>> >>>>> >> --
>>> >>>>> >> veg·e·tar·i·an:
>>> >>>>> >> Ancient tribal slang for the village idiot who can't hunt, fish or ride
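The prefix-limit safeguard Michael and Ben raise generally works like this: when a BGP neighbour advertises more prefixes than a configured maximum, the router tears the session down, and unless a restart timer is configured the session stays down until someone clears it by hand. Optus's "preset safety levels on key routers" may or may not refer to this; the statement never says. The Python below is a toy model of that behaviour under those assumptions, not vendor code or configuration; `PeerSession`, `receive` and `tick` are hypothetical names, and the limit and timer values are arbitrary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PeerSession:
    """Toy model of a BGP session guarded by a maximum-prefix limit (hypothetical)."""
    name: str
    max_prefixes: int
    restart_minutes: Optional[int] = None  # None = stay down until an operator clears it
    state: str = "Established"
    accepted: int = 0
    down_since: Optional[int] = None  # minute at which the session was torn down

    def receive(self, prefix_count: int, now: int) -> None:
        """Process an update bringing the peer's advertised total to prefix_count."""
        if self.state != "Established":
            return  # nothing is learned while the session is down
        if prefix_count > self.max_prefixes:
            # Limit exceeded: drop the session and withdraw everything learned from it.
            self.state = "Idle (max-prefix)"
            self.accepted = 0
            self.down_since = now
        else:
            self.accepted = prefix_count

    def tick(self, now: int) -> None:
        """Re-establish the session once the restart timer (if configured) expires."""
        if (self.state != "Established"
                and self.restart_minutes is not None
                and now - self.down_since >= self.restart_minutes):
            self.state = "Established"
            self.down_since = None


with_restart = PeerSession("peer-a", max_prefixes=1000, restart_minutes=5)
without_restart = PeerSession("peer-b", max_prefixes=1000)

for session in (with_restart, without_restart):
    session.receive(1500, now=0)  # a route leak pushes the peer over its limit
    session.tick(now=10)          # ten minutes later
    print(session.name, session.state)
# peer-a Established
# peer-b Idle (max-prefix)
```

With a restart timer the session recovers on its own (and drops again if the peer is still over the limit); without one it stays down until an operator intervenes, which is the failure mode Ben's "(with restart!)" remark points at. Real implementations also offer warning-only modes and threshold alerts, which this sketch ignores.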