Re: [j-nsp] curious optic issue

2013-04-10 Thread Alexandre Snarskii
On Tue, Jan 29, 2013 at 11:04:10AM +0200, Saku Ytti wrote:
  Fix is to remove+reinsert optic, or reload router. I've not yet tried
  'test xfp 1 power off|on', but I'm guessing it'll help too.
 
 I can confirm that 'test xfp 1 power off|on' 'fixes' the issue as well. 

Is there any way to power off SFP+ in MPC-3D-16XGE as well?
Looks like we have the same issue in one more location... :(

-- 
In theory, there is no difference between theory and practice. 
But, in practice, there is. 

___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] curious optic issue

2013-04-10 Thread Sidney Boumendil
On Wed, Apr 10, 2013 at 11:06 AM, Alexandre Snarskii s...@snar.spb.ru wrote:

 Is there any way to power off SFP+ in MPC-3D-16XGE as well?
 Looks like we have the same issue in one more location... :(


Maybe you need the still uncommon low-power transceivers (<=1.5 Watts), just
like in the other camp, to power on all ports on high-density 10Gig line
cards. This is supposedly because of thermal dissipation limits.


Re: [j-nsp] curious optic issue

2013-01-29 Thread Saku Ytti
On (2013-01-29 11:04 +0200), Saku Ytti wrote:
 
 I can also confirm that it triggers at certain time post-insertion, so
 preventive measures of doing controlled off/on earlier than it triggers is
 possible, while waiting to get optics swapped.

The certain time is 249 days (samples: MX80, MX80, C7600).
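
One possible reading of the 249-day figure, which nobody in this thread
confirms: a signed 32-bit tick counter incremented every 10 ms overflows
after roughly 248.6 days. The tick rate and the counter width are assumptions,
not something read from the optic's firmware. The sketch below does the
arithmetic and derives a date by which a controlled off/on would have to
happen; the function names, the insertion date and the safety margin are
made-up example values.

```python
from datetime import date, timedelta

TICKS_PER_SECOND = 100   # assumption: 10 ms tick
COUNTER_LIMIT = 2 ** 31  # assumption: signed 32-bit counter overflows here
SECONDS_PER_DAY = 86400


def days_until_wrap() -> float:
    """Days until the assumed tick counter overflows."""
    return COUNTER_LIMIT / TICKS_PER_SECOND / SECONDS_PER_DAY


def power_cycle_deadline(inserted: date, margin_days: int = 14) -> date:
    """Latest date for a preventive off/on, leaving margin_days of slack."""
    return inserted + timedelta(days=int(days_until_wrap()) - margin_days)


print(round(days_until_wrap(), 2))             # 248.55
print(power_cycle_deadline(date(2013, 1, 1)))  # 2013-08-23
```

If the real cause is something else, the 249-day observation still gives a
usable schedule; only the constant would come from measurement instead of
from the arithmetic above.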

-- 
  ++ytti


Re: [j-nsp] curious optic issue

2013-01-28 Thread sthaug
 I've seen at least twice now issues on XFP where temperature starts to draw
 weird saw-tooth, like this http://ytti.fi/ddm2.png

We've seen this too.

 Obviously I'm not running JNPR optic, but the two optics I've seen, have
 been from different vendors, who are not using same source. I think I've
 only seen it on 11.4R3 so far.

All the optics we've seen this problem with have been bought from a
specific local vendor. The vendor has agreed to replace the optics,
and as far as I know the problem has not reoccurred after replacing
the optics.

The problem was *not* Juniper specific. The problem showed up as a
module temperature reading that counted up to 127, wrapped around to
-128 and continued counting up. In other words, pretty clearly an 8-bit
counter. With counter values outside the normal range, Juniper will
shut down the port (sensible, in my opinion). Not all other vendors
will do so.
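
The wrap described above is what a signed 8-bit value does when it is
incremented past 127. A minimal Python sketch, assuming the module reports
temperature as a single two's-complement byte (the thread only infers this
from the graphs; `to_int8` is a hypothetical helper name):

```python
def to_int8(raw: int) -> int:
    """Interpret the low 8 bits of raw as a two's-complement signed value."""
    raw &= 0xFF
    return raw - 256 if raw >= 128 else raw


# A counter that keeps incrementing: 126, 127, then it wraps to -128, -127.
readings = [to_int8(n) for n in range(126, 130)]
print(readings)  # [126, 127, -128, -127]
```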

 Fix is to remove+reinsert optic, or reload router. I've not yet tried
 'test xfp 1 power off|on', but I'm guessing it'll help too.
 
 Anyone else seen this? My best guess is for some reason JNPR does something
 which causes the optic to do something which raises interrupt. And as it
 propagates to far-end, it should be something like maybe autonego? Maybe
 clock election? Which may cause both parties to police interrupts which
 might explain why ISIS on unrelated interfaces might timeout?  Highly
 speculative explanation, but it's all I've got.

I recommend you get those XFPs replaced.

Steinar Haug, Nethelp consulting, sth...@nethelp.no


Re: [j-nsp] curious optic issue

2013-01-28 Thread Saku Ytti
Thanks to Steinar, Alexandre and a third person in another medium, the problem
is a bit clearer now.

I incorrectly stated that I've seen this with another vendor too. When I
double-checked my data, I realized that at the location where I thought it
affected another vendor, I was looking at the wrong end.
The other location was http://ytti.fi/ddm3.png, a 7600 XENPAK (the other end
is JNPR with an XFP from another vendor; the graph is coarser because it is
older and averaged out more).

Both the affected XENPAK and XFP were sold by the same reseller. And I think
all of these resellers have bought from Gigalight. I'm not trying to name
and shame them: they are very common and resold by many companies, they
have good customer service, and weird problems could affect any vendor. Plus,
they are one of the most competitively priced options out there.
This particular vendor probably has some optics with defective software out
there; it might be that it only triggers after DDM counters have been polled X
times. The root cause has likely been fixed long ago, but of course affected
optics still exist in the field.

Like Steinar said, my best bet is to get the optics from this particular
reseller replaced. And Alexandre's observation of it being temporal and
periodically recurring is interesting.

Obviously, whatever the optic does, JNPR should not start to flap ISIS on an
unrelated interface; if it sees an optic misbehaving, it should just power
down the optic. When this happened on CSCO, no links other than the affected
one suffered.

 []
  At least now I can go to DC, plug with my JNPR-crash-tool and crash
  competitors routers with no traces in syslogs.
 
 If my hypothesis is true, you have to wait for about eight months
 before crash happens :) 

I'm sorry I wasn't clear; that crash issue was a completely different issue.
It crashes the PFE immediately and always. JNPR has replicated it with these
optics in their lab in Sunnyvale. It happens because some optics don't
respond to I2C polling in a timely manner and JNPR does not handle this
gracefully.

-- 
  ++ytti


Re: [j-nsp] curious optic issue

2013-01-28 Thread Alexandre Snarskii
On Mon, Jan 28, 2013 at 09:50:06AM +0200, Saku Ytti wrote:
 I've seen at least twice now issues on XFP where temperature starts to draw
 weird saw-tooth, like this http://ytti.fi/ddm2.png

About the same issue here: http://snar.spb.ru/xe.png. 

Differences between our and your cases: 
- we observed this behaviour on SFP+ (MPC-3D-16XGE, MX960 and MX480,
JunOS 11.4 and 10.4), not on XFP/MX80.
- other ports were not affected by this temperature shift in our case.
Similarities: 
- SFP+ removal/re-insertion is the easiest way to fix SFP+, MPC reboot 
helps too.
- No JTAC case, third-party DWDM SFP+. 

Some months ago we observed an even more interesting situation: seven SFP+
in a DWDM trunk (10xSFP+) started freezing at about the same time.
All freezing SFPs were from the same vendor, and one of the non-freezing
SFPs was from the same vendor too, with the only difference being that this
SFP was removed and re-inserted into the trunk in summer, while the other SFPs
in the trunk had worked since the end of March.
So now I'm waiting for May, when this SFP should freeze too and prove
my hypothesis: there is some monotonically increasing counter (not sure
if it is on the SFP or in JunOS code) that messes up the temperature
readings.
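
If a counter really does run through the temperature field and wraps the way
a signed 8-bit value would, the event should be visible in polled DDM data
as a sudden jump from a high positive reading to a large negative one. A
rough Python sketch of such a check, assuming the samples are already
collected as integers in degrees C; the function name and the 200-degree
threshold are illustrative choices, not taken from any vendor tooling:

```python
def find_wraps(samples, threshold=200):
    """Return indices where a reading drops by more than threshold
    compared with the previous sample."""
    return [i for i in range(1, len(samples))
            if samples[i - 1] - samples[i] > threshold]


samples = [41, 42, 90, 127, -128, -100, 40]
print(find_wraps(samples))  # [4]
```

A drop of that size cannot be a real temperature change between two polls,
so a hit is a reasonable trigger for scheduling a controlled re-insertion.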

[]
 At least now I can go to DC, plug with my JNPR-crash-tool and crash
 competitors routers with no traces in syslogs.

If my hypothesis is true, you have to wait for about eight months
before crash happens :) 

-- 
In theory, there is no difference between theory and practice. 
But, in practice, there is. 



[j-nsp] curious optic issue

2013-01-27 Thread Saku Ytti
I've seen at least twice now issues on XFP where temperature starts to draw
weird saw-tooth, like this http://ytti.fi/ddm2.png

When this occurs it can be accompanied by messages such as:

- tfeb0 MQchip 0 XE 0 Throttle: %PFE-4: Last 254 seconds have seen interrupt 
throttling at least once per second
- MQchip 0 XE 0 Throttle: Last 10 seconds have seen interrupt throttling at 
least once per second

When it does occur, the interface may flap 10 times an hour, 2 times a day or
anything in between.

The scary part is, all other local ISIS/LDP might flap _AND_ all far-end
(JNPR) router ISIS/LDP might flap.
The only interface seeing an actual ifdown is the link with the affected optic;
other interfaces just appear to stop sending ISIS hellos.

Obviously I'm not running JNPR optic, but the two optics I've seen, have
been from different vendors, who are not using same source. I think I've
only seen it on 11.4R3 so far.

Fix is to remove+reinsert optic, or reload router. I've not yet tried
'test xfp 1 power off|on', but I'm guessing it'll help too.

Anyone else seen this? My best guess is for some reason JNPR does something
which causes the optic to do something which raises interrupt. And as it
propagates to far-end, it should be something like maybe autonego? Maybe
clock election? Which may cause both parties to police interrupts which
might explain why ISIS on unrelated interfaces might timeout?  Highly
speculative explanation, but it's all I've got.


I can't be arsed to open a JTAC case. I tried with one batch of SFPs which
will crash the PFE on every MX (due to I2C being too slow to answer), but JNPR
wasn't interested in fixing that, as obviously it's not a bug, since it does
not happen on JNPR-stickered optics.
At least now I can go to DC, plug with my JNPR-crash-tool and crash
competitors routers with no traces in syslogs.

-- 
  ++ytti