Re: [j-nsp] curious optic issue
On Tue, Jan 29, 2013 at 11:04:10AM +0200, Saku Ytti wrote:
> Fix is to remove+reinsert optic, or reload router. I've not yet tried
> 'test xfp 1 power off|on', but I'm guessing it'll help too.
>
> I can confirm that 'test xfp 1 power off|on' 'fixes' the issue as well.

Is there any way to power off SFP+ in MPC-3D-16XGE as well?
Looks like we have the same issue in one more location... :(

--
In theory, there is no difference between theory and practice.
But, in practice, there is.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp
Re: [j-nsp] curious optic issue
On Wed, Apr 10, 2013 at 11:06 AM, Alexandre Snarskii s...@snar.spb.ru wrote:
> Is there any way to power off SFP+ in MPC-3D-16XGE as well?
> Looks like we have the same issue in one more location... :(

Maybe you need the still uncommon low-power transceivers (<=1.5 Watts),
just like in the other camp, to power on all ports on high-density 10Gig
line cards. This is supposedly because of thermal dissipation limits.
Re: [j-nsp] curious optic issue
On (2013-01-29 11:04 +0200), Saku Ytti wrote:
> I can also confirm that it triggers at a certain time post-insertion, so a
> preventive controlled power off/on before it triggers is possible while
> waiting to get the optics swapped.

The certain time is 249 days, based on three samples (MX80, MX80, C7600).

--
  ++ytti
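The 249-day figure is close to the time a signed 32-bit counter takes to overflow when it is incremented every 10 ms. Nothing in the thread confirms that the optic firmware keeps such a counter, so the tick length and counter width below are assumptions. The sketch shows the arithmetic and a hypothetical helper, `next_power_cycle`, for scheduling the preventive power off/on before the interval elapses.

```python
from datetime import date, timedelta

# Assumption: the optic counts 10 ms ticks in a signed 32-bit integer.
TICK_SECONDS = 0.01
COUNTER_BITS = 31  # usable positive range of a signed 32-bit counter

# Observed in the thread: the problem triggers 249 days after insertion.
OBSERVED_DAYS = 249


def wrap_days(bits: int, tick_seconds: float) -> float:
    """Days until a counter with `bits` usable bits overflows."""
    return (2 ** bits) * tick_seconds / 86400


def next_power_cycle(inserted: date, margin_days: int = 14) -> date:
    """Latest date for a preventive power cycle (hypothetical helper)."""
    return inserted + timedelta(days=OBSERVED_DAYS - margin_days)


print(round(wrap_days(COUNTER_BITS, TICK_SECONDS), 2))  # 248.55

# Hypothetical insertion date, for illustration only.
print(next_power_cycle(date(2012, 3, 31)))
```

If the assumption holds, an optic that is power cycled restarts its counter, which would match the observation that a re-inserted optic fails later than its neighbours.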
Re: [j-nsp] curious optic issue
> I've seen at least twice now issues on XFP where temperature starts to draw
> weird saw-tooth, like this http://ytti.fi/ddm2.png

We've seen this too.

> Obviously I'm not running JNPR optic, but the two optics I've seen, have
> been from different vendors, who are not using same source. I think I've
> only seen it on 11.4R3 so far.

All the optics we've seen this problem with were bought from a specific
local vendor. The vendor has agreed to replace the optics, and as far as I
know the problem has not recurred after replacing them.

The problem was *not* Juniper specific. It showed up as a module
temperature reading that counted up to 127, wrapped around to -128 and
continued counting up. In other words, pretty clearly an 8-bit counter.
With counter values outside the normal range, Juniper will shut down the
port (sensible, in my opinion). Not all other vendors will do so.

> Fix is to remove+reinsert optic, or reload router. I've not yet tried
> 'test xfp 1 power off|on', but I'm guessing it'll help too.
>
> Anyone else seen this? My best guess is for some reason JNPR does
> something which causes the optic to do something which raises interrupt.
> And as it propagates to far-end, it should be something like maybe
> autonego? Maybe clock election? Which may cause both parties to police
> interrupts which might explain why ISIS on unrelated interfaces might
> timeout? Highly speculative explanation, but it's all I've got.

I recommend you get those XFPs replaced.

Steinar Haug, Nethelp consulting, sth...@nethelp.no
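The wrap from 127 to -128 described above is what a steadily incrementing byte looks like when it is read as a signed 8-bit two's-complement value. A minimal sketch of that interpretation follows. The function names are made up for illustration, and it assumes only that the reading is a single signed byte, as the message suggests.

```python
def as_int8(raw: int) -> int:
    """Interpret the low 8 bits of `raw` as a signed two's-complement value."""
    raw &= 0xFF
    return raw - 256 if raw >= 128 else raw


def sawtooth(start: int, steps: int) -> list:
    """Readings produced by a counter that increments once per sample."""
    return [as_int8(start + i) for i in range(steps)]


print(sawtooth(125, 6))  # [125, 126, 127, -128, -127, -126]
```

Plotted over time, a full cycle of such readings climbs from -128 to 127 and then drops back, which gives the saw-tooth shape seen in the graphs.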
Re: [j-nsp] curious optic issue
Thanks to Steinar, Alexandre and a 3rd person in other media, the problem
is a bit clearer now.

I incorrectly stated I've seen this in another vendor too. When I double
checked my data, I realized that at the location where I thought it
affected another vendor, I was looking at the wrong end.

The other location was http://ytti.fi/ddm3.png
7600 XENPAK (other end is JNPR and XFP from another vendor. Graph is
coarser due to being older and averaged out more)

Both the affected XENPAK and XFP were sold by the same reseller. And I
think all of these resellers have bought from Gigalight.
I'm not trying to name and shame them: they are very common and resold by
many companies, they have good customer service, and weird problems could
affect any vendor. Plus they are one of the most competitively priced
options out there.

This particular vendor probably has some optics with defective software
out there; it might be that it only triggers after the DDM counters have
been polled X times. Likely the root cause was fixed long ago, but of
course affected optics still exist in the field.

Like Steinar said, my best bet is to get the optics from this particular
reseller replaced. And Alexandre's observation of it being temporal and
periodically recurring is interesting.

Obviously, whatever the optic does, JNPR should not start to flap ISIS on
unrelated interfaces. If it sees an optic misbehaving, it should just power
down the optic. When this happened on CSCO, no links other than the
affected one suffered.

> [...]
> > At least now I can go to DC, plug with my JNPR-crash-tool and crash
> > competitors routers with no traces in syslogs.
>
> If my hypothesis is true, you have to wait for about eight months
> before crash happens :)

I'm sorry I wasn't clear, that crash issue was a completely different
issue. It crashes the PFE immediately and always. JNPR has replicated it
with these optics in their lab in Sunnyvale. It happens because some optics
don't respond to I2C polling in a timely manner and JNPR does not handle
this gracefully.

--
  ++ytti
Re: [j-nsp] curious optic issue
On Mon, Jan 28, 2013 at 09:50:06AM +0200, Saku Ytti wrote:
> I've seen at least twice now issues on XFP where temperature starts to draw
> weird saw-tooth, like this http://ytti.fi/ddm2.png

About the same issue here: http://snar.spb.ru/xe.png.

Differences between our and your cases:
- we observed this behaviour on SFP+ (MPC-3D-16XGE, MX960 and MX480,
  JunOS 11.4 and 10.4), not on XFP/MX80.
- other ports were not affected by this temperature shift in our case.

Similarities:
- SFP+ removal/re-insertion is the easiest way to fix the SFP+; an MPC
  reboot helps too.
- No JTAC case, third-party DWDM SFP+.

Some months ago we observed an even more interesting situation: seven SFP+
in a DWDM trunk (10xSFP+) started freezing at about the same time.
All freezing SFPs were from the same vendor, and one of the non-freezing
SFPs was from the same vendor too, with the only difference being that this
SFP was removed and re-inserted into the trunk in summer, while the other
SFPs in the trunk had worked since the end of March.

So, now I'm waiting for May, when this SFP should freeze too and prove my
hypothesis: there is some monotonically increasing counter (not sure if it
is on the SFP or in JunOS code) that messes up the temperature readings.

> [...]
> At least now I can go to DC, plug with my JNPR-crash-tool and crash
> competitors routers with no traces in syslogs.

If my hypothesis is true, you have to wait for about eight months
before crash happens :)

--
In theory, there is no difference between theory and practice.
But, in practice, there is.
[j-nsp] curious optic issue
I've seen at least twice now issues on XFP where temperature starts to draw
weird saw-tooth, like this http://ytti.fi/ddm2.png

When this occurs it can be accompanied by messages such as:
 - tfeb0 MQchip 0 XE 0 Throttle: %PFE-4: Last 254 seconds have seen interrupt throttling at least once per second
 - MQchip 0 XE 0 Throttle: Last 10 seconds have seen interrupt throttling at least once per second

When it does occur, the interface may flap 10 times an hour, 2 times a day
or anything in between.
The scary part is, all other local ISIS/LDP might flap _AND_ all far-end
(JNPR) router ISIS/LDP might flap.
Only interface seeing actual ifdown is the link with the affected optic,
other interfaces just appear to stop sending ISIS hellos.

Obviously I'm not running JNPR optic, but the two optics I've seen, have
been from different vendors, who are not using same source. I think I've
only seen it on 11.4R3 so far.

Fix is to remove+reinsert optic, or reload router. I've not yet tried
'test xfp 1 power off|on', but I'm guessing it'll help too.

Anyone else seen this? My best guess is for some reason JNPR does
something which causes the optic to do something which raises interrupt.
And as it propagates to far-end, it should be something like maybe
autonego? Maybe clock election? Which may cause both parties to police
interrupts which might explain why ISIS on unrelated interfaces might
timeout? Highly speculative explanation, but it's all I've got.

I can't be arsed to open JTAC case, I tried with one batch of SFP which
will crash PFE on every MX (due to I2C being too slow to answer), but JNPR
wasn't interested in fixing that, as obviously it's not bug, since it does
not happen on JNPR stickered optics.
At least now I can go to DC, plug with my JNPR-crash-tool and crash
competitors routers with no traces in syslogs.

--
  ++ytti
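For anyone who wants to watch for the throttle messages quoted above, here is a minimal sketch that matches both variants in collected logs. The regular expression is built only from the two sample lines in this message, so the exact syslog framing on a given release is an assumption, and `parse_throttle` is a made-up helper name.

```python
import re

# Pattern derived from the two sample lines only; real logs may differ.
THROTTLE_RE = re.compile(
    r"MQchip (?P<chip>\d+) XE (?P<xe>\d+) Throttle: (?:%PFE-\d+: )?"
    r"Last (?P<seconds>\d+) seconds have seen interrupt throttling"
)


def parse_throttle(line: str):
    """Return chip, XE index and window length, or None if no match."""
    match = THROTTLE_RE.search(line)
    if match is None:
        return None
    return {key: int(value) for key, value in match.groupdict().items()}


samples = [
    "tfeb0 MQchip 0 XE 0 Throttle: %PFE-4: Last 254 seconds have seen "
    "interrupt throttling at least once per second",
    "MQchip 0 XE 0 Throttle: Last 10 seconds have seen "
    "interrupt throttling at least once per second",
]
for sample in samples:
    print(parse_throttle(sample))
```

Correlating the matched XE index with the port whose DDM temperature shows the saw-tooth would be one way to confirm which optic is involved.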