I'd muse they pay enough that there's an agreement made to wear that.

Once it's blown over, it's just another outage blip in the past.

They do happen; no person nor network is infallible.

As Ben highlights though, Optus seems rough.

Luke

On 17/11/2023 10:31 am, Andrew Oakeley wrote:
And in the Senate inquiry this morning they both blamed Cisco.

"The trigger was the Singtel outage, but the root cause was Cisco."

https://www.abc.net.au/news/2023-11-17/asx-markets-business-live-news-optus-outage-senate-inquiry/103115518

-----Original Message-----
From: AusNOG <ausnog-boun...@lists.ausnog.net> On Behalf Of DaZZa
Sent: Friday, November 17, 2023 8:15 AM
To: Luke Thompson <luk...@tncrew.com.au>
Cc: michael.beth...@australiaonline.au; ausnog@lists.ausnog.net
Subject: Re: [AusNOG] Optus downtime chat + affecting SMS verification to Telstra?

And now Singtel have returned serve and are denying it was them.

https://www.zdnet.com/article/singtel-refutes-reports-that-its-system-upgrade-caused-optus-outage/

It's like watching kids trying to blame each other for who broke the window 
with the cricket ball.

D

On Wed, 15 Nov 2023 at 11:01, Luke Thompson <luk...@tncrew.com.au> wrote:
They've blamed Singtel Internet Exchange (STiX) for the international peering 
route updates, at least going by anonymous sources cited by SMH.

https://www.smh.com.au/technology/identity-of-third-party-who-brought-down-optus-network-revealed-20231114-p5ejy1.html

Luke

On 14 November 2023 12:37:30 pm Ben Buxton <bb.aus...@bb.cactii.net> wrote:

Blaming routing updates from peers is a scapegoat and never is the cause of an 
outage - public BGP is the wild west and you're always getting broken 
information - it's your responsibility to filter those updates and (unless it's 
a zero-day poison packet bug) you only have yourself to blame if you fall over 
from them.
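
As a rough illustration of the "filter those updates" point, here is a minimal Python sketch of an inbound prefix filter. The bogon list, the /24 length cap and the function name accept_route are all assumptions made for the example; this is not a complete policy and not anyone's production config.

```python
import ipaddress

# Hypothetical policy values for illustration; NOT a complete bogon list.
BOGONS = [
    ipaddress.ip_network(n)
    for n in (
        "0.0.0.0/8", "10.0.0.0/8", "127.0.0.0/8", "169.254.0.0/16",
        "172.16.0.0/12", "192.168.0.0/16", "224.0.0.0/3",
    )
]
MAX_V4_LEN = 24  # assumed cap: reject anything more specific than a /24


def accept_route(prefix: str) -> bool:
    """Return True if an IPv4 prefix learned from a peer passes the filter."""
    net = ipaddress.ip_network(prefix, strict=False)
    if net.version != 4:
        return False  # this sketch only handles IPv4
    if net.prefixlen == 0:
        return False  # never accept a default route from a peer
    if net.prefixlen > MAX_V4_LEN:
        return False  # too specific
    # Reject anything that overlaps reserved or private space.
    return not any(net.overlaps(bogon) for bogon in BOGONS)


if __name__ == "__main__":
    for candidate in ("1.1.1.0/24", "10.1.0.0/16", "8.8.8.8/32", "0.0.0.0/0"):
        print(candidate, accept_route(candidate))
```

A real deployment would also filter on AS path, IRR/RPKI data and so on; the point is only that every update gets checked before it is installed.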

If I were an Optus business customer, reading that outage page would just make 
me even more determined to move elsewhere.

They vaguely categorised the "what" of the outage into a big bucket (software upgrade 
related), but gave absolutely no useful information and did not explain the "why", 
which is what would regain my confidence.

Why did this upgrade trigger an outage?
   - Was there a behaviour/feature change they neglected to take into account?
   - Did the upgrade require a config change that broke?
   - Were they neglectful in following config best practices? (filtering, 
prefix limits, restarts, etc?)
   - Did the new software have an unidentified bug?
   - Why did testing not catch this problem (they do test changes...right?)
   - How did progressive rollout still lead to this impact? (they do
progressive rollouts over N days/weeks...right?)

Why did mitigation take so long?
   - What detection/telemetry measures led them to realise the scope of the 
outage? (news reports don't count)
   - Were they dependent on the downed network for oncall paging & comms?
   - Why did their rollback plan fail? (they had a rollback plan...right?)
   - Why was remote console/power access not working? (they have both...right?)
   - Were they dependent on the downed network for said access?
   - Were their playbooks/credential access dependent on the downed network?

"We have made changes to the network to address this issue so that it cannot occur 
again." ... this smells like "whoops forgot to set max-prefix (with restart!)".
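
For anyone unfamiliar with the idea, below is a toy Python model of a per-peer prefix limit with an automatic restart timer. It is a sketch of the general behaviour only: the class name PeerSession, its fields and the timings are hypothetical, it is not any vendor's implementation, and it makes no claim about how Optus's routers were actually configured.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class PeerSession:
    """Toy model of one BGP peer with a prefix limit and a restart timer."""
    max_prefixes: int
    restart_after: float                      # seconds to stay down (assumed value)
    prefixes: Set[str] = field(default_factory=set)
    down_since: Optional[float] = None        # None means the session is up

    def receive(self, prefix: str, now: float) -> bool:
        """Process one announcement; return True if it was accepted."""
        if self.down_since is not None:
            if now - self.down_since < self.restart_after:
                return False                  # still idle: ignore this peer
            self.down_since = None            # timer expired: session restarts
        self.prefixes.add(prefix)
        if len(self.prefixes) > self.max_prefixes:
            # Limit exceeded: drop this one session and its routes only.
            self.prefixes.clear()
            self.down_since = now
            return False
        return True


if __name__ == "__main__":
    s = PeerSession(max_prefixes=2, restart_after=60.0)
    print(s.receive("192.0.2.0/24", now=0.0))
    print(s.receive("198.51.100.0/24", now=1.0))
    print(s.receive("203.0.113.0/24", now=2.0))   # over the limit, session drops
    print(s.receive("192.0.2.0/24", now=70.0))    # restart timer has expired
```

The design point is that the damage is confined to the offending session, and the session recovers by itself once the timer expires, so nobody has to drive to site.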

Bugs, config stuff-ups, etc happen, and they will continue to happen - it is a 
lie to state that outages will never happen again. This is the culmination of 
monumental failures in the trigger, prevention and mitigation measures, which 
cannot be fixed in a couple of days. It sounds like much deeper architectural 
and organisational issues need addressing.

Many of the above failures are things that a young network will experience and 
learn from, but for Optus these should all be well planned for already.

I suspect any government investigation will simply add more bureaucracy and 
boxes to tick rather than effect meaningful change, but one can always be 
hopeful...

BB

On Tue, 14 Nov 2023 at 13:02, Michael Bethune <m...@ozonline.com.au> wrote:
"Optus network received changes to routing information from an
international peering network following a software upgrade"

I note they are very careful to avoid nominating whose software upgrade.

I also note that when they say they received routing updates, don't
they limit the number of prefixes accepted by their BGP from any
given peer?

Sounds like a carefully crafted statement to enable them to point
fingers elsewhere, not unexpected.

- Michael.

Quoting francisfi...@mailup.net:

Looks like it was a software upgrade:
https://www.abc.net.au/news/2023-11-13/optus-identifies-cause-of-nationwide-outage-software-upgrade/103099902

Nothing in their media centre, just appears as a new box on their
outage response page:
https://www.optus.com.au/notices/outage-response

Cheers

----
Text:

"We have been working to understand what caused the outage on
Wednesday, and we now know what the cause was and have taken steps
to ensure it will not happen again.  We apologise sincerely for
letting our customers down and the inconvenience it caused.

At around 4.05am Wednesday morning, the Optus network received
changes to routing information from an international peering
network following a software upgrade. These routing information
changes propagated through multiple layers in our network and
exceeded preset safety levels on key routers. This resulted in
those routers disconnecting from the Optus IP Core network to protect 
themselves.

The restoration required a large-scale effort of the team and in
some cases required Optus to reconnect or reboot routers
physically, requiring the dispatch of people across a number of
sites in Australia. This is why restoration was progressive over the afternoon.

Given the widespread impact of the outage, our investigations into
the issue took longer than we would have liked as we examined
several different paths to restoration. The restoration of the
network was at all times our priority and we subsequently
established the cause working together with our partners. We have
made changes to the network to address this issue so that it
cannot occur again.

We are committed to learning from what has occurred and continuing
to work with our international vendors and partners to increase
the resilience of our network. We will also support and fully
cooperate with the reviews being undertaken by the Government and the Senate.

We continue to invest heavily to improve the resiliency of our
network and services."

--

   francisfi...@mailup.net

On Thu, Nov 9, 2023, at 07:15, DaZZa wrote:
I have all three you're asking about.

But I'm very small potatoes compared to most of the members of
this list, and my required remote footprint is correspondingly
small, so it's easy to maintain.

D

On Thu, 9 Nov 2023 at 06:18, Phillip Grasso
<phillip.gra...@gmail.com> wrote:
I mean come on, it's nearly 2024 and a [major] telco does not
have remote console access?

If we send a poll out to this community, how many would be able
to genuinely and honestly answer:

Do you have a console or appropriate control plane access into
all your critical infrastructure?
Do you have independent out of band that does not share any
infrastructure with your current system(s) - with an exemption for
physical location and power?
Do you have the ability to remote power control your devices?
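
The three questions above can be turned into a simple self-audit. The Python sketch below is illustrative only: the Device fields, the dependency labels and the audit function are invented for the example, and the only shared dependencies it tolerates are physical location and power, per the exemption in the second question.

```python
from dataclasses import dataclass, field
from typing import List, Set

# Shared dependencies that the second question explicitly exempts.
ALLOWED_SHARED = {"site", "power"}


@dataclass
class Device:
    """Hypothetical inventory record for one piece of critical infrastructure."""
    name: str
    has_console: bool
    has_remote_power: bool
    prod_deps: Set[str] = field(default_factory=set)  # what the production path relies on
    oob_deps: Set[str] = field(default_factory=set)   # what the out-of-band path relies on


def audit(dev: Device) -> List[str]:
    """Return a list of problems; an empty list means all three answers are yes."""
    problems = []
    if not dev.has_console:
        problems.append("no console / control plane access")
    if not dev.oob_deps:
        problems.append("no out-of-band path")
    shared = (dev.prod_deps & dev.oob_deps) - ALLOWED_SHARED
    if shared:
        problems.append("out-of-band shares: " + ", ".join(sorted(shared)))
    if not dev.has_remote_power:
        problems.append("no remote power control")
    return problems


if __name__ == "__main__":
    router = Device(
        name="core-1",
        has_console=True,
        has_remote_power=False,
        prod_deps={"site", "power", "ip-core"},
        oob_deps={"site", "power", "ip-core"},  # OOB rides the network it manages
    )
    for problem in audit(router):
        print(router.name + ":", problem)
```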

We know from the Facebook outage in 2021 that they probably
didn't have the above, so it's not entirely uncommon for folks
to lack *proper independent* console and remote access.


I empathize with the Optus team and their customers who have
been negatively impacted by this incident. I sincerely hope that
some positive outcomes can emerge from this situation, including:

- Attention to critical infrastructure resilience
- BGP clue increases
- Incident management improves
(I'm sure there's more).

Network is a black box to most people and I think a large chunk
of Australia now knows what it feels like to not have it.


On Wed, 8 Nov 2023 at 11:06, Ben Buxton <bb.aus...@bb.cactii.net> wrote:


On Wed, 8 Nov 2023 at 10:14, DaZZa <dazzagi...@gmail.com> wrote:
Yeah, I'd be willing to bet that it's a change which wasn't
thoroughly tested before being rolled out, and which had an
inadequate backout plan.

Also, "Our on-site technician is actively prioritising
establishing a console connection.".

I mean come on, it's nearly 2024 and a [major] telco does not
have remote console access? Whilst I'm looking forward to
enthusiastically reading the PM, I'll have to book a physio
appointment in advance due to neck strain from all the head
shaking it'll likely induce.

BB



Interestingly, my Optus mobile actually had a valid connection
for a short time - wasn't able to actually DO anything, but
was connected to the Optus network - but it's now gone to "SOS" mode.

D

On Wed, 8 Nov 2023 at 10:01, John Edwards <jaedwa...@gmail.com> wrote:
The 4am Wednesday morning outage start looks suspiciously like
a firmware upgrade window.
I note that Optus devices where I am are showing "SoS" which
indicates the tower is unable to reach the location register,
which presumably is on a private network and indicative of a
pretty major fault rather than just IP.
John


On Wed, 8 Nov 2023 at 09:10, DaZZa <dazzagi...@gmail.com> wrote:
The Optus hamster finally died of old age.

I would suggest your SMS issues would be caused by whoever
is issuing the SMS using Optus - not so much by the Telstra end receiving it.

Anecdotally, Optus enterprise/wholesale appears to be still
functional - at least my link appears to be working fine - and my BGP
advertisements are still being seen overseas - seems to be
only NBN and mobile based services which are busted.

D

On Wed, 8 Nov 2023 at 09:27, <francisfi...@mailup.net> wrote:
Morning all,
Hope the chaos isn't too hard on your work/family.
I have had trouble with a couple of SMS verifications
coming through to me, my Telstra number. Is this related?
Any general banter around the downtime would be fine too -
looks like it all began at 4.07am AEDT?
Cheers

--

   francisfi...@mailup.net
_______________________________________________
AusNOG mailing list
AusNOG@lists.ausnog.net
https://lists.ausnog.net/mailman/listinfo/ausnog


--
veg·e·tar·i·an:
Ancient tribal slang for the village idiot who can't hunt,
fish or ride
