Re: Arista Switches rebooting

2020-05-04 Thread Saku Ytti
Hey Javier,

> Has anyone had issues with Arista switches rebooting out of the blue, when 
> there isn't even a sufficient load on them to be a CPU or memory issue?
> We have a couple of Arista 7280s, both SR and CR, that have had this behaviour.
> This is the second time we have seen this issue, and I just wanted to see if
> this is something anyone else is experiencing with this platform.

You may not realise, but you are asking 'has anyone seen software
crash'.  I've seen software crash many times. You should contact
Arista support and provide them with relevant information so they can
help you with the issue.

-- 
  ++ytti


RE: Arista Switches rebooting

2020-05-04 Thread Javier Gutierrez Guerra
EOS 4.22.0.1F

But after contacting Support, the issue seems to be related to an ECC issue that
causes the CPU to reset, so an Aboot upgrade is required:
Field Notice 0044 - Arista
<https://www.arista.com/en/support/advisories-notices/fieldnotices/8756-field-notice-44>

Javier Gutierrez Guerra

From: Ariel Biener 
Sent: Monday, May 4, 2020 9:31 AM
To: Javier Gutierrez Guerra ; nanog@nanog.org
Subject: Re: Arista Switches rebooting


EOS version?





RE: Arista Switches rebooting

2020-05-04 Thread Javier Gutierrez Guerra
Nope. Basically, they said that this is a bug and that developers are working
on providing more debug data when this happens. For now the cause is just
unknown, and it could be that ECC error that breaks the CPU.



Javier Gutierrez Guerra



Re: Arista Switches rebooting

2020-05-04 Thread Matthew Petach
Just history repeating itself... ;)

https://www.cisco.com/c/en/us/support/docs/field-notices/200/fn25994.html

https://www.networkworld.com/article/3122864/cisco-says-router-bug-could-be-result-of-cosmic-radiation-seriously.html

As the process size in fabrication gets smaller and smaller, it takes less
and less energy hitting a device to cause spurious events like these.

"smaller, faster, cheaper" does come with a few trade-offs.   ^_^;;

Matt


On Mon, May 4, 2020, 08:32 Javier Gutierrez Guerra 
wrote:

> EOS 4.22.0.1F
>
>
>
> But after contacting Support, the issue seems to be related to an ECC issue
> that causes the CPU to reset, so an Aboot upgrade is required
>
> Field Notice 0044 - Arista
> <https://www.arista.com/en/support/advisories-notices/fieldnotices/8756-field-notice-44>
>
>
>
> Javier Gutierrez Guerra


Re: Arista Switches rebooting

2020-05-04 Thread Saku Ytti
On Mon, 4 May 2020 at 21:57, Matthew Petach  wrote:

> As the process size in fabrication gets smaller and smaller, it takes less 
> and less energy hitting a device to cause spurious events like these.

Agreed. However, in this case it was not a single-event upset (or
multi-event upset) but a timing problem, fixable with new software on
the memory controller.  Single-event upsets should be benign with ECC
memories.
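
To make the ECC point concrete, here is a minimal sketch in Python. It assumes a toy Hamming(7,4) code; real ECC memory uses wider SECDED codes, and the function names are illustrative only. It shows why a single flipped bit is harmless: the decoder computes a syndrome that points at the bad bit and repairs it.

```python
def hamming74_encode(nibble):
    """Encode 4 data bits (list of 0/1) into a 7-bit Hamming codeword."""
    d1, d2, d3, d4 = nibble
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    # Codeword layout, positions 1..7: p1 p2 d1 p3 d2 d3 d4
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(word):
    """Return (data_bits, error_position); error_position is 0 if clean."""
    w = list(word)
    s1 = w[0] ^ w[2] ^ w[4] ^ w[6]
    s2 = w[1] ^ w[2] ^ w[5] ^ w[6]
    s3 = w[3] ^ w[4] ^ w[5] ^ w[6]
    syndrome = s1 + 2 * s2 + 4 * s3  # 1-based position of the flipped bit
    if syndrome:
        w[syndrome - 1] ^= 1  # repair the single-bit upset
    return [w[2], w[4], w[5], w[6]], syndrome


data = [1, 0, 1, 1]
stored = hamming74_encode(data)
stored[4] ^= 1  # simulate one bit flipped by a particle strike
recovered, position = hamming74_decode(stored)
print(recovered, position)  # → [1, 0, 1, 1] 5
```

A timing problem on the memory controller is a different failure class: it is not a random single-bit flip, so a code like this does not make it benign.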

-- 
  ++ytti


Re: Arista Switches rebooting

2020-05-04 Thread Ethan O'Toole

> Hi,
> Has anyone had issues with Arista switches rebooting out of the blue, when
> there isn't even a sufficient load on them to be a CPU or memory issue?
> We have a couple of Arista 7280s, both SR and CR, that have had this behaviour.
> This is the second time we have seen this issue, and I just wanted to see if
> this is something anyone else is experiencing with this platform.
> Thanks,


We found a bug on the 64 port x 100gig model that if you insert a quad 
twinax 10gig fanout cable in many of the ports it will trigger a reboot.


Some select ports are okay and supported, but the ones that are not would 
trigger a reset. The reset happened immediately when the cable was inserted.
No idea if this was patched or not.


- Ethan



Re: Arista Switches rebooting

2020-05-04 Thread Bryan Fields
On 5/4/20 4:02 PM, Ethan O'Toole wrote:
> We found a bug on the 64 port x 100gig model that if you insert a quad 
> twinax 10gig fanout cable in many of the ports it will trigger a reboot.
> 
> Some select ports are okay and supported, but the ones that are not would 
> trigger a reset. The reset happened immediately when the cable was inserted.
> No idea if this was patched or not.

Did you contact the vendor and did they commit to a fix?  I can't imagine a
vendor not wanting to fix an easily reproducible bug such as this.

-- 
Bryan Fields

727-409-1194 - Voice
http://bryanfields.net


Re: Arista Switches rebooting

2020-05-04 Thread Ethan O'Toole

> Did you contact the vendor and did they commit to a fix?  I can't imagine a
> vendor not wanting to fix an easily reproducible bug such as this.


It was a while ago, and the vendor was aware of the issue. Arista had the 
info on which specific ports would accept the cable.


It's an interesting moment when all the activity LEDs stop flashing green, go
orange for a second across the board, then stay dark for a while as the
switch boots back up.


- Ethan O'Toole




Re: Arista Switches rebooting

2020-05-04 Thread Saku Ytti
On Mon, 4 May 2020 at 23:06, Ethan O'Toole  wrote:

> We found a bug on the 64 port x 100gig model that if you insert a quad
> twinax 10gig fanout cable in many of the ports it will trigger a reboot.

I've seen a similar issue with another vendor, where inserting a specific
SFP would reload the linecard. This was because the SFP didn't answer
fast enough to I2C queries, and the polling code couldn't handle the
error, so it crashed the whole linecard. The vendor didn't fix the code,
because it didn't happen with the vendor's own optics, although obviously
they must have understood that they can't guarantee a vendor optic
answers in a timely manner over I2C.
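
As a rough illustration of the alternative, here is a minimal Python sketch of a polling loop that treats a late or missing I2C answer as a per-port fault instead of a fatal error. Everything in it is hypothetical: `I2CTimeout`, `poll_modules`, and `fake_read` are made-up names, and `fake_read` stands in for a real bus read.

```python
class I2CTimeout(Exception):
    """Hypothetical error raised when a module does not answer in time."""


def poll_modules(ports, read_module):
    """Poll every port; a slow or silent module only affects its own port."""
    status = {}
    for port in ports:
        try:
            status[port] = ("ok", read_module(port))
        except I2CTimeout:
            # Isolate the fault: flag this port and keep polling the rest.
            status[port] = ("unresponsive", None)
    return status


def fake_read(port):
    """Stand-in for a real bus read; port 3 simulates a slow optic."""
    if port == 3:
        raise I2CTimeout(f"port {port} did not answer")
    return f"module-id-{port}"


result = poll_modules([1, 2, 3, 4], fake_read)
for port, state in result.items():
    print(port, state)
```

The design choice is simply to catch the timeout at the smallest scope that can handle it, so one misbehaving module degrades one port rather than the whole linecard.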

I2C is a pretty terrible bus, particularly if you try to actually hang
everything off of a single I2C bus: a single misbehaving speaker and you
might get your power supplies offline. Hopefully we'll move to I3C or
10SPE Ethernet soon. Or maybe some sort of I2C switch where every
connection is on its own bus.


-- 
  ++ytti


Re: Arista Switches rebooting

2020-05-05 Thread Vincent Bernat
 ❦  5 May 2020 09:09 +03, Saku Ytti:

>> We found a bug on the 64 port x 100gig model that if you insert a quad
>> twinax 10gig fanout cable in many of the ports it will trigger a reboot.
>
> I've seen a similar issue in another vendor, where specific SFP
> inserted would reload the linecard. This was because the SFP didn't
> answer fast enough to I2C queries and the polling code couldn't handle
> the error so it crashed the whole linecard. Vendor didn't fix the
> code, because it didn't happen on vendor optic, while obviously they
> must have understood they can't guarantee vendor optic answers in a
> timely manner in I2C.

We had a similar issue, but the vendor fixed it (even though it only
happened with cheap third-party optics). If we are talking about the same
vendor, it was fixed in 17.3R4, 17.4R3, 17.3R3-S4 and 18.1R3-S4. It's PR
1425893 (not public).
-- 
Man is the only animal that blushes -- or needs to.
-- Mark Twain


Re: Arista Switches rebooting

2020-05-05 Thread Brandon Martin

On 5/5/20 2:09 AM, Saku Ytti wrote:

> I2C is a pretty terrible bus, particularly if you try to actually hang
> everything off of a single I2C bus, single misbehaving speaker and you
> might get your power supplies offline. Hopefully we'll move to I3C or
> 10SPE Ethernet soon. Or maybe some sort of I2C switch where every
> connection is on its own bus.


I was of the impression that, due to addressing, I2C bus switches are 
already typically used between each transceiver port and the system bus 
(hopefully one of many) that connects the micro to the ports (hopefully 
relatively dedicated to that purpose).


That's certainly been the case in all of the switches I've torn down
and, last time I checked at least the SFP specification, there wasn't
much of a way around it, since each transceiver uses the same
fixed addresses for ID and DDM.
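
A toy Python model of that arrangement, for illustration only: the classes are invented, and the addresses are the 7-bit forms (0x50 and 0x51) of the A0h/A2h addresses that, as far as I recall, SFF-8472 assigns to every SFP. Because every module answers at the same addresses, the only way to tell them apart is to select a switch channel first.

```python
SFP_ID_ADDR = 0x50   # 7-bit form of A0h: module ID EEPROM (assumed per SFF-8472)
SFP_DDM_ADDR = 0x51  # 7-bit form of A2h: digital diagnostics (assumed per SFF-8472)


class SfpModule:
    """Hypothetical transceiver that answers at the two fixed addresses."""

    def __init__(self, vendor):
        self.regs = {SFP_ID_ADDR: vendor, SFP_DDM_ADDR: "ddm:" + vendor}

    def read(self, addr):
        return self.regs[addr]


class BusSwitch:
    """Toy I2C switch: one upstream bus, one downstream channel per cage."""

    def __init__(self, modules):
        self.modules = modules
        self.channel = None

    def select(self, channel):
        self.channel = channel

    def read(self, addr):
        if self.channel is None:
            raise RuntimeError("no channel selected")
        # Only the module behind the selected channel sees the transaction.
        return self.modules[self.channel].read(addr)


switch = BusSwitch([SfpModule("acme"), SfpModule("globex")])
switch.select(0)
first = switch.read(SFP_ID_ADDR)
switch.select(1)
second = switch.read(SFP_ID_ADDR)
print(first, second)  # → acme globex
```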


But yes, I2C, while very useful, is the devil.  Clock stretching is 
particularly annoying along with the requisite use of open-drain drivers 
to accomplish it.


I was not aware of 10SPE, though...looks very useful (for lots of 
purposes).  Physical multi-drop on low-cost cabling is quite useful.

--
Brandon Martin


Re: Arista Switches rebooting

2020-05-05 Thread Chris via NANOG

Hi,

On 5/5/20 4:02 am, Ethan O'Toole wrote:
> We found a bug on the 64 port x 100gig model that if you insert a quad
> twinax 10gig fanout cable in many of the ports it will trigger a reboot.


Good timing: it seems there is a similar issue for Juniper QFX5110-48S
devices. I just saw PR 1499422:


> This technical support bulletin (TSB) has been opened to inform Juniper
> network customers about PR1499422.  On the QFX5110-48S device inserting
> any QSFP-100G-transceiver on one or more network ports will cause FPC
> state to flap.  In the problem state FPC will go down as soon as the
> 100G link comes up and FPC flap will be seen every 90 seconds.