When I was getting these and the Cisco far end was getting tons of errors, the light levels were good all around. It ended up being a fiber problem near the transmitter. Try shooting the fiber link with an OTDR to see if you are getting lots of reflections.
On Fri, Oct 21, 2016 at 12:23:18PM -0700, Michael Loftis wrote:
> Was hoping someone who knew more could chime in...but it's measured in
> seconds basically because the PCS (physical coding sublayer) does NOT
> keep detailed statistics...so the "Seconds" value means there were X
> distinct seconds in which an error was flagged in that category...the
> previous response detailing bit vs errored blocks I think is wrong.
> The PCS layer can repair single bit errors, thus a second with one or
> more single bit (but correctable!) errors is a "bit errored second" -
> if it is unable to correct and recover a valid PCS block then you get
> the "errored block" seconds...
>
> It's not a raw count of the number of those errors, just that it
> occurred in a ~1s window X times. You can totally get PCS errors
> unplugging an optic or otherwise shutting down the remote end. You
> can totally get spurious PCS errors from a marginal-ish link that
> shows PLENTY of light (SNR is low or a marginal cable). In MX
> specifically it *can* in very rare circumstances indicate a problem
> even between the optic and the MIC....most of the time my suggestion
> for PCS errors is clear counters and check in 1h and 24h. If you get
> a significant number of errored seconds in a 24h period then
> check/clean ends and patches, maybe replace optics.
>
> Also beware, lots of DOM bugs in various JunOS releases cause the DOM
> values to get stuck, and it can be hard or impossible to check in a
> non outage causing way (sometimes you can safely bend the patch cable
> and observe the increase in loss to verify your DOM values aren't
> stuck) - I've had this most commonly in the past on DPC cards but have
> also observed it in MPC cards. The DOM data is also highly dependent
> upon the optic itself and there's a LOT of buggy stuff out there so
> it's not all Juniper's fault there.
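To make the "Seconds" unit concrete, here is a minimal Python sketch of an errored-seconds counter as Michael describes it: each category counts the distinct one-second windows in which at least one error was flagged, not the errors themselves. The function name `errored_seconds`, the event format and the category labels are all made up for illustration. This is an assumption about the counting semantics only, not a description of how Junos or the PCS hardware implements it.

```python
import math
from collections import defaultdict


def errored_seconds(events):
    """Count distinct 1-second windows with at least one error, per category.

    `events` is an iterable of (timestamp_in_seconds, category) pairs.
    The event format is hypothetical; real hardware only latches a flag.
    """
    windows = defaultdict(set)
    for timestamp, category in events:
        # Bucket every event into the whole second it occurred in.
        windows[category].add(math.floor(timestamp))
    return {category: len(seconds) for category, seconds in windows.items()}


# Hypothetical burst: 10,000 errors packed into two seconds,
# plus a single stray error in another category.
events = [(100 + i / 5000, "bit_errors") for i in range(10_000)]
events.append((250.4, "errored_blocks"))

print(errored_seconds(events))  # {'bit_errors': 2, 'errored_blocks': 1}
```

If the counters do work this way, a far end that counts raw errors could climb "like crazy" during a burst while the errored-seconds value on the MX barely moves, because the two counters measure different things.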
>
> On Fri, Oct 21, 2016 at 11:07 AM, David B Funk
> <dbf...@engineering.uiowa.edu> wrote:
> > Thanks guys but this isn't what I was asking.
> >
> > The optical power is similar (within a few tenths of a dBm) at my end, down
> > by 3 dBm at the far end of the link that is having issues (-6.23 dBm as
> > opposed to -3.73 dBm) but not enough to explain what I'm seeing.
> >
> > The big question I have is: What does "30 Seconds" mean for an attribute
> > that by description of the docs is supposed to be number of PCS blocks with
> > invalid Sync headers?
> > Particularly when the guy on the Cisco at the other end says his error
> > counters are going up like crazy (and packets are being dropped) while the
> > stats my end stays constant at "30 Seconds".
> > What does that mean?
> >
> > The particularly frustrating thing is that data streams are dropping packets
> > (EG iperf3 showing retries and seriously degraded performance) but none of
> > the interface stats are showing any values that indicate an issue other than
> > that "30 Seconds".
> >
> > Can anybody tell me what "30 Seconds" means (in the context of an error
> > counter)?
> >
> >
> > On Fri, 21 Oct 2016, Christopher Costa wrote:
> >
> >> Here's my notes from a jtac review about these a couple years ago:
> >>
> >> [pcs] encoding is continually transmitting to keep the line in sync. The
> >> PCS layer is directly below the MAC layer so for MX,
> >> it’s on the MIC. PCS errors can be caused by anything MIC or lower, i.e.
> >> transceiver, fiber, line equipment, etc.
> >>
> >> PCS functionality:
> >> ===================
> >> IEEE 802.3ae 10GbE interfaces use a 64B/66B encoder/decoder in the
> >> PHY-PCS (Physical Coding Sub layer) to allow reasonable
> >> clock recovery and facilitate alignment of the data stream at the
> >> receiver.
> >> As the scheme name suggests, 64 bits of data on the MAC layer are
> >> transmitted as a 66-bit code block on the PHY layer, which
> >> realizes easier clock/timing synchronization. A 66-bit code block contains
> >> a 2-bit Sync. Header + 8 octets data/control field.
> >> If the Sync. header is '01', the 8 octets are entirely data.
> >> If the Sync. header is '10', an 8-bit Type field follows, plus 56 bits of
> >> data/control field.
> >> The 8 octets data/control field is scrambled by using a self-synchronous
> >> scrambler to achieve complete DC-balance on the
> >> serial line.
> >> PCS statistics displays PCS fault conditions by checking valid Sync.
> >> headers received with every 66 bits interval, so that we
> >> can monitor 10Gbps high speed transmission line quality.
> >> If the 64B/66B receiver does not detect the 2-bit Sync.
> >> Header with regular 66-bit interval and it estimates the high BER (Bit
> >> Error Rate of >10^-4), PCS statistics will report a
> >> problem.
> >>
> >> PCS statistics :
> >> ================
> >> - "Bit errors" indicates the number of PCS blocks with invalid Sync
> >>   headers.
> >> - "Errored blocks" indicates the number of PCS blocks with a valid Sync.
> >>   header but invalid block format.
> >>
> >>
> >> On Fri, Oct 21, 2016 at 9:37 AM, Michael Carey <mca...@kinber.org> wrote:
> >> David,
> >>
> >> When I've seen PCS statistical errors before, it pointed to either a
> >> failing optic that needed replaced in our MX or a drastic change in optical
> >> light levels caused by an OSP fiber issue. How do your "show interface
> >> diagnostic optic" levels look?
> >>
> >> On Wed, Oct 19, 2016 at 7:40 PM, David B Funk
> >> <dbf...@engineering.uiowa.edu> wrote:
> >>
> >> > I've got a couple of 10Gig-eth interfaces (xe- on MX480) of which I'm
> >> > trying to interpret the "PCS statistics" values.
> >> >
> >> > One of them is pretty steady at:
> >> >
> >> >   PCS statistics                      Seconds
> >> >     Bit errors                             4
> >> >     Errored blocks                         4
> >> >
> >> > The other one seems to vary with the values ranging from 10 to 70.
> >> > EG:
> >> >
> >> >   PCS statistics                      Seconds
> >> >     Bit errors                            61
> >> >     Errored blocks                        69
> >> >
> >> > The second interface will trigger a number of error conditions at the
> >> > other end which terminates in a Cisco router without showing any error
> >> > conditions at my end (EG BPDU Error: None, MAC-REWRITE Error: None,
> >> > CRC/Align errors 0, FIFO errors 0, etc..) During some of these times I'll
> >> > see significant packet loss and others see minimal problems.
> >> >
> >> > According to Juniper docs the PCS statistics should mean:
> >> >
> >> > PCS statistics
> >> >   (10-Gigabit Ethernet interfaces) Displays Physical Coding Sublayer (PCS)
> >> >   fault conditions from the WAN PHY or the LAN PHY device.
> >> >
> >> >     Bit errors—High bit error rate. Indicates the number of bit errors
> >> >     when the PCS receiver is operating in normal mode.
> >> >     Errored blocks—Loss of block lock. The number of errored blocks when
> >> >     PCS receiver is operating in normal mode.
> >> >
> >> > But I don't know how to interpret a value of "16 seconds" with that
> >> > definition.
> >> > Can anybody shed some light on what those numbers mean?
> >> >
> >> > Thanks.
_______________________________________________
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp