Re: [j-nsp] MX punting packets to RE - why?

2016-02-03 Thread Saku Ytti
On 3 February 2016 at 19:09, Ross Halliday wrote:

> Oh dear, that sounds like quite the chore. I don't understand your reasoning 
> behind lowering the parameters so far from the defaults, though. 3000 
> pps/5000 packet burst is how the box ships. Or am I to read between the lines 
> re: "random recommendation"? lol

A lot of the DDoS-protection limits are 20kpps by default, which is more
than the NPU will even punt to the PFE CPU, so there will be an
additional policer limiting more strictly anyhow. The defaults are
unfortunately not sane.
The only reason you'd need to punt multicast at all is to fix your ingress
interface in the HW, so really 1 packet per group will do; anything
extra is just useless work for the CPU.
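(To see which packet types are actually tripping before you start tuning,
something like 'show ddos-protection protocols violations' gives the quick
aggregate view; the per-type statistics output is the one you already
pasted in your first mail.)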

> Maybe this is something I should talk with JTAC about at this point. I don't 
> want to slam the RE but I don't want to have such a massive cutout, either.

Absolutely, always good idea to engage vendor support.

> Oh, the redundancy definitely works, don't get me wrong. For some reason the 
> MX is deciding it has to resolve packets instead of just sending whatever 
> comes in with that VLAN tag into an l2circuit.

The reason is that the ingress interface of the mcast stream changed, so
the multicast tree was incorrect.

> Internet multicast, as we have things now, would be an absolute nightmare. 
> But as far as unknown DoS vectors and other quirkiness, I compare it to IPv6 
> a few years ago. Everybody basically does it half-assed because nobody uses 
> it. The only applications we have for multicast are TV service delivery and 
> some timing protocols here and there.

I did quite a few multicast setups for companies running CCTV, where the
CCTV by default sends to multicast (but can be changed to send unicast).
In each of these configurations only a single host (the recorder) ever
joined. So multicast was just useless complexity with no advantages; I
guess the failure is mine, as I was consistently unable to convince them
to reconfigure the CCTVs for unicast.

-- 
  ++ytti
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] MX punting packets to RE - why?

2016-02-03 Thread Ross Halliday
Hey again,

> No, something like this:
> edit system ddos-protection protocols resolve mcast-v4
> set bandwidth 100
> set burst 100
> set flow-level-bandwidth logical-interface 20
> set flow-level-detection subscriber off
> set flow-level-detection logical-interface on
> 
> So we allow on aggregate 100pps of mcast-v4 resolve, but only 20pps
> per IFL. So even if one IFL is misbehaving, another IFL's mcast-v4
> resolve works fine.
> 
> There are only 4k policers available in HW for ddos-protection, which
> makes me turn off subscriber detection, as it's easy for an attacker to
> generate a 'new subscriber' (like in BGP, change SADDR => new subscriber)
> and congest those 4k policer slots.
> I set logical-interface to 'on' (instead of 'automatic', where the IFL
> policers are only added dynamically once the aggregate policer is
> out-of-contract; with 'on' they are always there, which guarantees a
> well-behaving IFL does not suffer for the time it takes software to
> detect the violation and add the IFL policers).
> 
> This is just a random recommendation. And the funny thing is, you'd need
> to do the same thing for _all_ protocols available under
> 'ddos-protection'; there are quite many of them, maybe 100, so you'll
> get several thousand new lines of config just for this.
> I wish there was a way to set explicit default values, but there isn't.

Oh dear, that sounds like quite the chore. I don't understand your reasoning 
behind lowering the parameters so far from the defaults, though. 3000 pps/5000 
packet burst is how the box ships. Or am I to read between the lines re: 
"random recommendation"? lol

Maybe this is something I should talk with JTAC about at this point. I don't 
want to slam the RE but I don't want to have such a massive cutout, either.

> Hmm, then the redundancy really should have worked, unless you're
> doing some IGMP snooping or something in the switches (on by default)
> which might require convergence of multicast state at L2 too.
> If you're rocking RSTP or MST, make sure it's correctly configured: all
> non-l2-core ports _MUST_ be portfast, because MST will not unblock
> downstream ports until there is explicit permission from all upstream
> ports, and if you don't have portfast on one port which is not
> speaking MST, then that port will block the whole MST convergence, as
> you're waiting for explicit permission from that port, which will
> never come.

Oh, the redundancy definitely works, don't get me wrong. For some reason the MX 
is deciding it has to resolve packets instead of just sending whatever comes in 
with that VLAN tag into an l2circuit.

> Yeah, multicast is a tricky subject

That's the understatement of the year!

> I dislike multicast, I believe there are probably a lot of
> as-yet-unknown DoS vectors in it, and I would never run internet
> multicast. But for well-controlled internal applications, it may
> sometimes be the least bad solution.

Internet multicast, as we have things now, would be an absolute nightmare. But 
as far as unknown DoS vectors and other quirkiness, I compare it to IPv6 a few 
years ago. Everybody basically does it half-assed because nobody uses it. The 
only applications we have for multicast are TV service delivery and some timing 
protocols here and there.

> But getting help on the setup is probably a huge chore; just getting an
> external person to understand the network takes time.

Yes, that's for sure. Thank you very much for all of your feedback and effort 
into trying to understand what's going on here. I'm going to see if our CSE is 
subscribed to this list and ask him to take a peek.

Thanks again.

Ross
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] MX punting packets to RE - why?

2016-02-03 Thread Saku Ytti
On 3 February 2016 at 02:18, Ross Halliday wrote:

Hey,

> Yes, on the entire MPC I will see unrelated control plane protocols bounce, 
> eg. spanning-tree. If I recall correctly some protocols are handled by the 
> TRIO chips, right? I don't see any of my BFD-managed ISIS adjacencies drop.
>
>> If yes, just limit the mcast resolve to something reasonable 100pps should 
>> be plenty, provided we're not competing with  actual attack traffic.
>>
>> I would start with ddos-protection fixes and see if it behaves better with 
>> more restricted punting.
>
> I assume you're referring to "set forwarding-options multicast resolve-rate", 
> right?

No, something like this:
edit system ddos-protection protocols resolve mcast-v4
set bandwidth 100
set burst 100
set flow-level-bandwidth logical-interface 20
set flow-level-detection subscriber off
set flow-level-detection logical-interface on
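
For readability, the same statements in hierarchy form (this is just a
re-display of the set commands above, nothing extra):

system {
    ddos-protection {
        protocols {
            resolve {
                mcast-v4 {
                    bandwidth 100;
                    burst 100;
                    flow-level-bandwidth {
                        logical-interface 20;
                    }
                    flow-level-detection {
                        subscriber off;
                        logical-interface on;
                    }
                }
            }
        }
    }
}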

So we allow on aggregate 100pps of mcast-v4 resolve, but only 20pps
per IFL. So even if one IFL is misbehaving, another IFL's mcast-v4
resolve works fine.

There are only 4k policers available in HW for ddos-protection, which
makes me turn off subscriber detection, as it's easy for an attacker to
generate a 'new subscriber' (like in BGP, change SADDR => new subscriber)
and congest those 4k policer slots.
I set logical-interface to 'on' (instead of 'automatic', where the IFL
policers are only added dynamically once the aggregate policer is
out-of-contract; with 'on' they are always there, which guarantees a
well-behaving IFL does not suffer for the time it takes software to
detect the violation and add the IFL policers).

This is just a random recommendation. And the funny thing is, you'd need
to do the same thing for _all_ protocols available under
'ddos-protection'; there are quite many of them, maybe 100, so you'll
get several thousand new lines of config just for this.
I wish there was a way to set explicit default values, but there isn't.

> I'm in the habit of running VSTP for everything (the Cisco half of my brain 
> keeps trying to type rapid-pvst+) that isn't a two-port affair. BPDUs are 
> definitely making it through, everything checks out. The paths over the 
> l2circuits are normally blocked via increased interface cost.

Hmm, then the redundancy really should have worked, unless you're
doing some IGMP snooping or something in the switches (on by default)
which might require convergence of multicast state at L2 too.
If you're rocking RSTP or MST, make sure it's correctly configured: all
non-l2-core ports _MUST_ be portfast, because MST will not unblock
downstream ports until there is explicit permission from all upstream
ports, and if you don't have portfast on one port which is not
speaking MST, then that port will block the whole MST convergence, as
you're waiting for explicit permission from that port, which will
never come.
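(On EX the rough equivalent of Cisco's portfast is the 'edge' knob; a
sketch only, interface name made up:
set protocols mstp interface ge-0/0/10 edge
and the corresponding 'edge' statement exists under rstp and vstp as
well.)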

> One of the VLANs carried as an l2circuit by the MXes between the EXes is 
> actually not spanning-tree controlled, but a "backup" PIM interface. 
> Essentially a clone of the EX-EX direct link, but with higher metric. Unlike 
> the other VLANs this one always has the PIM and BGP adjacency sending traffic 
> over it. The ddos-protection resolve-mcast4 action trips when multicast is 
> slammed over that or one of the VSTP-managed VLANs transitions to a 
> forwarding state.
>
>
> I can do up a diagram if that would help. I'm really not sure how I'd explain 
> this to JTAC and wanted to get some real-world experience from guys who are 
> working with this stuff.

Yeah, multicast is a tricky subject; I've had to learn it maybe 3 times
now, since it's needed so rarely it's easy to forget. I dislike
multicast, I believe there are probably a lot of as-yet-unknown DoS
vectors in it, and I would never run internet multicast. But for
well-controlled internal applications, it may sometimes be the least
bad solution.
But getting help on the setup is probably a huge chore; just getting an
external person to understand the network takes time.




-- 
  ++ytti
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] MX punting packets to RE - why?

2016-02-02 Thread Ross Halliday
Thanks Michael. Looks like I'm at 66 pps like Dragan mentioned.

Some night I'll set up a maintenance window and play with this knob...

Cheers
Ross



-Original Message-
From: Michael Hare [mailto:michael.h...@wisc.edu] 
Sent: Monday, February 01, 2016 10:19 PM
To: Ross Halliday
Cc: juniper-nsp@puck.nether.net
Subject: RE: [j-nsp] MX punting packets to RE - why?

Ross-

Change 'fpc0' to 'afeb0' in your failed command.  I got goose eggs, but this 
lab chassis isn't doing multicast which may play a part.

$user@mx104-lab-re0> request pfe execute target afeb0 command "show nhdb mcast 
resolve" 
SENT: Ukern command: show nhdb mcast resolve
GOT:
GOT: Nexthop Info:
GOT:     ID   Type       Protocol    Resolve-Rate
GOT: ------   --------   ---------   ------------
LOCAL: End of file

-Michael

-Original Message-
From: juniper-nsp [mailto:juniper-nsp-boun...@puck.nether.net] On Behalf Of 
Ross Halliday
Sent: Monday, February 1, 2016 2:38 PM
To: Dragan Jovicic ; Saku Ytti 
Cc: juniper-nsp@puck.nether.net
Subject: Re: [j-nsp] MX punting packets to RE - why?

Hi Saku and Dragan,

Thank you for the responses, and apologies for the ambiguity.

The EXes are our video source switches. PIM RP is shared with MSDP to an 
anycast address. The MXes connect to the EXes at L3 via BGP - MX1/EX1 link is 
de-prioritized with a metric. Most of our receivers ride off of MX2, with a few 
further downstream.

Due to some interop issues and our use of VBR we've settled on a single MDT for 
this VRF. Being the default MDT it is of course joined on all PEs with this 
VRF. During normal operation, MX1, which doesn't have any active traffic for 
this VRF, has a full list of mcast routes with the source interface of the MDT.

So, in the first failure scenario - let's say EX2 or MX2 totally dies - MX1 
will lose a preferred BGP route to the RP and sources and see everything over 
the MX1/EX1 link, so all of the S,G entries will need to be updated from 
mt-0/0/0.1081344 to xe-0/3/0.3812.

If I am understanding what you guys are saying correctly, this would cause 
everything to get punted to the CPU until a new hardware shortcut is created, 
and in the meantime - since our entire channel lineup is in there - this would 
hammer the DoS protection mechanism?

Can the rate at which the joins are sent out be slowed? I can live with a bit 
of a delay on the channels coming back to life, but not with the entire slot 
getting blackholed... I am also open to tweaking the DoS protection settings 
but it seems to me that a 10x increase would be opening myself up to really 
slamming the RE and causing even bigger problems. I come from SUP720 world, and 
I rather like having a box that can process BFD and BGP updates at the same 
time LOL


The other failure scenario is when the EX1/EX2 link goes down. When this 
happens, all devices are still up, so as far as BGP or really anything on the 
MX "knows", nothing has changed. Metric and next-hops are identical to the PEs. 
Instead of pulling video from the direct link, EX1 & EX2 can only see each 
other through VLANs that the MXes carry as EoMPLS l2circuits. This is what 
truly baffles me, as none of what you guys mentioned should apply to an 
l2circuit.


Also,
> request pfe execute target fpc0 command "show nhdb mcast resolve"
error: command is not valid on the mx104

:(

Thanks for your help guys!

Ross



From: Dragan Jovicic [mailto:dragan...@gmail.com] 
Sent: Sunday, January 31, 2016 7:44 AM
To: Saku Ytti
Cc: Ross Halliday; juniper-nsp@puck.nether.net
Subject: Re: [j-nsp] MX punting packets to RE - why?

Correct me if I'm wrong, but this looks like the MX doesn't have a 
multicast cache for all those S,G routes (in inet.1).
So the first packet of each S,G entry must first be resolved by the 
kernel and downloaded to the PFE.
The DDoS feature is activated because a large influx of unresolved 
packets is passing through the router. You could change the default DDoS 
setting for this type of traffic on your FPC.
Another thing that comes to mind is that the kernel itself has a limited 
number of resolves per second, which is 66. That is, 66 different NH S,G 
entries will be resolved per second.

dj@mx-re0> request pfe execute target fpc0 command "show nhdb mcast resolve" 
SENT: Ukern command: show nhdb mcast resolve
GOT:
GOT: Nexthop Info:
GOT:     ID   Type       Protocol    Resolve-Rate
GOT: ------   --------   ---------   ------------
GOT:   1927   Resolve    IPv6                  66
GOT:   1962   Resolve    IPv4                  66
LOCAL: End of file
This is modified by (hidden) knob:

dj@mx-re0# set forwarding-options multicast resolve-rate ?  
Possible completions:
     Multicast resolve rate (100..1000 per second)
{master}[edit]
Mind you, I haven't tested this.
HTH,
Regards

On Sat, Jan 30, 2016 at 12:04 PM, Saku Ytti  wrote:
Hey Ross,

It's not clear to me i

Re: [j-nsp] MX punting packets to RE - why?

2016-02-02 Thread Ross Halliday
Hello,

> > If I am understanding what you guys are saying correctly, this would cause 
> > everything to get punted to the CPU until a new hardware shortcut is 
> > created, and in the meantime - since our entire channel lineup is in there 
> > - this would hammer the DoS protection mechanism?
>
> Yes, if ingress interface does not match, they will be punted.

Okay, thanks

> > Can the rate at which the joins are sent out be slowed? I can live with a 
> > bit of a delay on the channels coming back to life, but not with the entire 
> > slot getting blackholed...
>
> What do you mean 'entire slot being blackholed', do you mean losing unrelated 
> control-plane stuff, like BGP/ARP etc?

Yes, on the entire MPC I will see unrelated control plane protocols bounce, eg. 
spanning-tree. If I recall correctly some protocols are handled by the TRIO 
chips, right? I don't see any of my BFD-managed ISIS adjacencies drop.

> If yes, just limit the mcast resolve to something reasonable 100pps should be 
> plenty, provided we're not competing with  actual attack traffic.
>
> I would start with ddos-protection fixes and see if it behaves better with 
> more restricted punting.

I assume you're referring to "set forwarding-options multicast resolve-rate", 
right?

> Further research might involve figuring out if both MX boxes have multicast 
> state with source towards the local EX port, and clients subscribed. So that 
> no convergence is needed.

Interesting concept! Doesn't bother me at all, we like the idea of having our 
multicast available everywhere anyway.

> It wasn't obvious to me what kind of negative impact you observe when the 
> EX-EX link goes down. How are you now stopping loop in the EX network? You 
> have direct physical link between them, then you have l2circuit as well? But 
> it looks like you're not carrying BPDU over the l2circuit? So if you rely on 
> STP, I'm not entirely sure how the L2 redundancy works, which port is 
> normally being blocked? The actual physical link between switches or the link 
> via l2circuit, since my first guess would be that there would be L2 loop in 
> the topology and nothing to stop it, so I'm not sure I understand why it 
> works at all.

I'm in the habit of running VSTP for everything (the Cisco half of my brain 
keeps trying to type rapid-pvst+) that isn't a two-port affair. BPDUs are 
definitely making it through, everything checks out. The paths over the 
l2circuits are normally blocked via increased interface cost.

One of the VLANs carried as an l2circuit by the MXes between the EXes is 
actually not spanning-tree controlled, but a "backup" PIM interface. 
Essentially a clone of the EX-EX direct link, but with higher metric. Unlike 
the other VLANs this one always has the PIM and BGP adjacency sending traffic 
over it. The ddos-protection resolve-mcast4 action trips when multicast is 
slammed over that or one of the VSTP-managed VLANs transitions to a forwarding 
state.


I can do up a diagram if that would help. I'm really not sure how I'd explain 
this to JTAC and wanted to get some real-world experience from guys who are 
working with this stuff.

Thanks for all your help!

Ross
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] MX punting packets to RE - why?

2016-02-01 Thread Michael Hare
Ross-

Change 'fpc0' to 'afeb0' in your failed command.  I got goose eggs, but this 
lab chassis isn't doing multicast which may play a part.

$user@mx104-lab-re0> request pfe execute target afeb0 command "show nhdb mcast 
resolve" 
SENT: Ukern command: show nhdb mcast resolve
GOT:
GOT: Nexthop Info:
GOT:     ID   Type       Protocol    Resolve-Rate
GOT: ------   --------   ---------   ------------
LOCAL: End of file

-Michael

-Original Message-
From: juniper-nsp [mailto:juniper-nsp-boun...@puck.nether.net] On Behalf Of 
Ross Halliday
Sent: Monday, February 1, 2016 2:38 PM
To: Dragan Jovicic ; Saku Ytti 
Cc: juniper-nsp@puck.nether.net
Subject: Re: [j-nsp] MX punting packets to RE - why?

Hi Saku and Dragan,

Thank you for the responses, and apologies for the ambiguity.

The EXes are our video source switches. PIM RP is shared with MSDP to an 
anycast address. The MXes connect to the EXes at L3 via BGP - MX1/EX1 link is 
de-prioritized with a metric. Most of our receivers ride off of MX2, with a few 
further downstream.

Due to some interop issues and our use of VBR we've settled on a single MDT for 
this VRF. Being the default MDT it is of course joined on all PEs with this 
VRF. During normal operation, MX1, which doesn't have any active traffic for 
this VRF, has a full list of mcast routes with the source interface of the MDT.

So, in the first failure scenario - let's say EX2 or MX2 totally dies - MX1 
will lose a preferred BGP route to the RP and sources and see everything over 
the MX1/EX1 link, so all of the S,G entries will need to be updated from 
mt-0/0/0.1081344 to xe-0/3/0.3812.

If I am understanding what you guys are saying correctly, this would cause 
everything to get punted to the CPU until a new hardware shortcut is created, 
and in the meantime - since our entire channel lineup is in there - this would 
hammer the DoS protection mechanism?

Can the rate at which the joins are sent out be slowed? I can live with a bit 
of a delay on the channels coming back to life, but not with the entire slot 
getting blackholed... I am also open to tweaking the DoS protection settings 
but it seems to me that a 10x increase would be opening myself up to really 
slamming the RE and causing even bigger problems. I come from SUP720 world, and 
I rather like having a box that can process BFD and BGP updates at the same 
time LOL


The other failure scenario is when the EX1/EX2 link goes down. When this 
happens, all devices are still up, so as far as BGP or really anything on the 
MX "knows", nothing has changed. Metric and next-hops are identical to the PEs. 
Instead of pulling video from the direct link, EX1 & EX2 can only see each 
other through VLANs that the MXes carry as EoMPLS l2circuits. This is what 
truly baffles me, as none of what you guys mentioned should apply to an 
l2circuit.


Also,
> request pfe execute target fpc0 command "show nhdb mcast resolve"
error: command is not valid on the mx104

:(

Thanks for your help guys!

Ross



From: Dragan Jovicic [mailto:dragan...@gmail.com] 
Sent: Sunday, January 31, 2016 7:44 AM
To: Saku Ytti
Cc: Ross Halliday; juniper-nsp@puck.nether.net
Subject: Re: [j-nsp] MX punting packets to RE - why?

Correct me if I'm wrong, but this looks like the MX doesn't have a 
multicast cache for all those S,G routes (in inet.1).
So the first packet of each S,G entry must first be resolved by the 
kernel and downloaded to the PFE.
The DDoS feature is activated because a large influx of unresolved 
packets is passing through the router. You could change the default DDoS 
setting for this type of traffic on your FPC.
Another thing that comes to mind is that the kernel itself has a limited 
number of resolves per second, which is 66. That is, 66 different NH S,G 
entries will be resolved per second.

dj@mx-re0> request pfe execute target fpc0 command "show nhdb mcast resolve" 
SENT: Ukern command: show nhdb mcast resolve
GOT:
GOT: Nexthop Info:
GOT:     ID   Type       Protocol    Resolve-Rate
GOT: ------   --------   ---------   ------------
GOT:   1927   Resolve    IPv6                  66
GOT:   1962   Resolve    IPv4                  66
LOCAL: End of file
This is modified by (hidden) knob:

dj@mx-re0# set forwarding-options multicast resolve-rate ?  
Possible completions:
     Multicast resolve rate (100..1000 per second)
{master}[edit]
Mind you, I haven't tested this.
HTH,
Regards

On Sat, Jan 30, 2016 at 12:04 PM, Saku Ytti  wrote:
Hey Ross,

It's not clear to me if the mcast is only inside the EX or if it's
also on the MX's. And it's not clear to me how the faults impact the
multicast distribution tree. On stable state, do both MX80's have
mcast states for groups? Or only one of them?

Trio maps each multicast group into an input interface, if mismatch
occurs, that is group ingresses from other input interface than the
specif

Re: [j-nsp] MX punting packets to RE - why?

2016-02-01 Thread Saku Ytti
On 1 February 2016 at 22:37, Ross Halliday wrote:

Hey,

> If I am understanding what you guys are saying correctly, this would cause 
> everything to get punted to the CPU until a new hardware shortcut is created, 
> and in the meantime - since our entire channel lineup is in there - this 
> would hammer the DoS protection mechanism?

Yes, if ingress interface does not match, they will be punted.

> Can the rate at which the joins are sent out be slowed? I can live with a bit 
> of a delay on the channels coming back to life, but not with the entire slot 
> getting blackholed...

What do you mean 'entire slot being blackholed', do you mean losing
unrelated control-plane stuff, like BGP/ARP etc? If yes, just limit
the mcast resolve to something reasonable; 100pps should be plenty,
provided we're not competing with actual attack traffic.
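(Concretely, something like
set system ddos-protection protocols resolve mcast-v4 bandwidth 100
set system ddos-protection protocols resolve mcast-v4 burst 100
as a first cut; per-IFL flow policing can then be layered on top of that.)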

I would start with ddos-protection fixes and see if it behaves better
with more restricted punting. Further research might involve figuring
out if both MX boxes have multicast state with source towards the
local EX port, and clients subscribed. So that no convergence is
needed.


It wasn't obvious to me what kind of negative impact you observe when
the EX-EX link goes down. How are you now stopping a loop in the EX
network? You have a direct physical link between them, and then you
have the l2circuit as well? But it looks like you're not carrying BPDUs
over the l2circuit? So if you rely on STP, I'm not entirely sure how
the L2 redundancy works: which port is normally being blocked, the
actual physical link between switches or the link via the l2circuit?
My first guess would be that there would be an L2 loop in the topology
and nothing to stop it, so I'm not sure I understand why it works at all.

-- 
  ++ytti
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] MX punting packets to RE - why?

2016-02-01 Thread Ross Halliday
Hi Saku and Dragan,

Thank you for the responses, and apologies for the ambiguity.

The EXes are our video source switches. PIM RP is shared with MSDP to an 
anycast address. The MXes connect to the EXes at L3 via BGP - MX1/EX1 link is 
de-prioritized with a metric. Most of our receivers ride off of MX2, with a few 
further downstream.

Due to some interop issues and our use of VBR we've settled on a single MDT for 
this VRF. Being the default MDT it is of course joined on all PEs with this 
VRF. During normal operation, MX1, which doesn't have any active traffic for 
this VRF, has a full list of mcast routes with the source interface of the MDT.

So, in the first failure scenario - let's say EX2 or MX2 totally dies - MX1 
will lose a preferred BGP route to the RP and sources and see everything over 
the MX1/EX1 link, so all of the S,G entries will need to be updated from 
mt-0/0/0.1081344 to xe-0/3/0.3812.

If I am understanding what you guys are saying correctly, this would cause 
everything to get punted to the CPU until a new hardware shortcut is created, 
and in the meantime - since our entire channel lineup is in there - this would 
hammer the DoS protection mechanism?

Can the rate at which the joins are sent out be slowed? I can live with a bit 
of a delay on the channels coming back to life, but not with the entire slot 
getting blackholed... I am also open to tweaking the DoS protection settings 
but it seems to me that a 10x increase would be opening myself up to really 
slamming the RE and causing even bigger problems. I come from SUP720 world, and 
I rather like having a box that can process BFD and BGP updates at the same 
time LOL


The other failure scenario is when the EX1/EX2 link goes down. When this 
happens, all devices are still up, so as far as BGP or really anything on the 
MX "knows", nothing has changed. Metric and next-hops are identical to the PEs. 
Instead of pulling video from the direct link, EX1 & EX2 can only see each 
other through VLANs that the MXes carry as EoMPLS l2circuits. This is what 
truly baffles me, as none of what you guys mentioned should apply to an 
l2circuit.


Also,
> request pfe execute target fpc0 command "show nhdb mcast resolve"
error: command is not valid on the mx104

:(

Thanks for your help guys!

Ross



From: Dragan Jovicic [mailto:dragan...@gmail.com] 
Sent: Sunday, January 31, 2016 7:44 AM
To: Saku Ytti
Cc: Ross Halliday; juniper-nsp@puck.nether.net
Subject: Re: [j-nsp] MX punting packets to RE - why?

Correct me if I'm wrong, but this looks like the MX doesn't have a 
multicast cache for all those S,G routes (in inet.1).
So the first packet of each S,G entry must first be resolved by the 
kernel and downloaded to the PFE.
The DDoS feature is activated because a large influx of unresolved 
packets is passing through the router. You could change the default DDoS 
setting for this type of traffic on your FPC.
Another thing that comes to mind is that the kernel itself has a limited 
number of resolves per second, which is 66. That is, 66 different NH S,G 
entries will be resolved per second.

dj@mx-re0> request pfe execute target fpc0 command "show nhdb mcast resolve" 
SENT: Ukern command: show nhdb mcast resolve
GOT:
GOT: Nexthop Info:
GOT:     ID   Type       Protocol    Resolve-Rate
GOT: ------   --------   ---------   ------------
GOT:   1927   Resolve    IPv6                  66
GOT:   1962   Resolve    IPv4                  66
LOCAL: End of file
This is modified by (hidden) knob:

dj@mx-re0# set forwarding-options multicast resolve-rate ?  
Possible completions:
     Multicast resolve rate (100..1000 per second)
{master}[edit]
Mind you, I haven't tested this.
HTH,
Regards

On Sat, Jan 30, 2016 at 12:04 PM, Saku Ytti  wrote:
Hey Ross,

It's not clear to me if the mcast is only inside the EX or if it's
also on the MX's. And it's not clear to me how the faults impact the
multicast distribution tree. On stable state, do both MX80's have
mcast states for groups? Or only one of them?

Trio maps each multicast group to an input interface; if a mismatch
occurs, that is, the group ingresses from a different input interface
than the specified one, I believe this causes a host punt.

Alas, the DDoS-protection limits are quite insane, like 20kpps for many
protocols; that's more than NPU=>LC_CPU punting allows for, so it'll
kill pretty much everything. I'd set protocols I don't need to
10-100pps, non-critical protocols I need to 4kpps, and critical
protocols I need to 8kpps.
And yes, configure each and every ddos-protocol; it'll inflate the
config quite a bit, but there is always 'set apply-flags omit', which
can be a useful way to reduce config cruft from standard configs you
don't really want to review normally.


On 29 January 2016 at 23:36, Ross Halliday wrote:
> Hi list,
>
> I've run into an oddity that's been caus

Re: [j-nsp] MX punting packets to RE - why?

2016-01-31 Thread Dragan Jovicic
Correct me if I'm wrong, but this looks like the MX doesn't have a
multicast cache for all those S,G routes (in inet.1).
So the first packet of each S,G entry must first be resolved by the
kernel and downloaded to the PFE.

The DDoS feature is activated because a large influx of unresolved
packets is passing through the router. You could change the default DDoS
setting for this type of traffic on your FPC.

Another thing that comes to mind is that the kernel itself has a limited
number of resolves per second, which is 66. That is, 66 different NH S,G
entries will be resolved per second.

dj@mx-re0> request pfe execute target fpc0 command "show nhdb mcast
resolve"
SENT: Ukern command: show nhdb mcast resolve
GOT:
GOT: Nexthop Info:
GOT:     ID   Type       Protocol    Resolve-Rate
GOT: ------   --------   ---------   ------------
GOT:   1927   Resolve    IPv6                  66
GOT:   1962   Resolve    IPv4                  66
LOCAL: End of file

This is modified by (hidden) knob:

dj@mx-re0# set forwarding-options multicast resolve-rate ?
Possible completions:
 Multicast resolve rate (100..1000 per second)
{master}[edit]

Mind you, I haven't tested this.
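
For illustration only, and with the same untested caveat, bumping it to
the top of the allowed range would just be:

dj@mx-re0# set forwarding-options multicast resolve-rate 1000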

HTH,

Regards


On Sat, Jan 30, 2016 at 12:04 PM, Saku Ytti  wrote:

> Hey Ross,
>
> It's not clear to me if the mcast is only inside the EX or if it's
> also on the MX's. And it's not clear to me how the faults impact the
> multicast distribution tree. On stable state, do both MX80's have
> mcast states for groups? Or only one of them?
>
> Trio maps each multicast group to an input interface; if a mismatch
> occurs, that is, the group ingresses from a different input interface
> than the specified one, I believe this causes a host punt.
>
> Alas, the DDoS-protection limits are quite insane, like 20kpps for many
> protocols; that's more than NPU=>LC_CPU punting allows for, so it'll
> kill pretty much everything. I'd set protocols I don't need to
> 10-100pps, non-critical protocols I need to 4kpps, and critical
> protocols I need to 8kpps.
> And yes, configure each and every ddos-protocol; it'll inflate the
> config quite a bit, but there is always 'set apply-flags omit', which
> can be a useful way to reduce config cruft from standard configs you
> don't really want to review normally.
>
>
> On 29 January 2016 at 23:36, Ross Halliday wrote:
> > Hi list,
> >
> > I've run into an oddity that's been causing us some issues. First, a
> diagram!
> >
> > EX1    EX2
> >  |      |
> >  |      |
> > MX1    MX2
> >
> > EX1 and EX2 are independent switches (not VC) that run a ton of video
> traffic. EX4200 on 12.3R8.7
> > MX1 and MX2 are MPLS PEs that ingest video and send it out to our
> network. MX104 on 13.3R4.6
> > Several VLANs span EX1 and EX2 as each switch has a server that requires
> Layer 2 to the other unit. (active/active middleware)
> > EX1-EX2 link is direct fiber carrying VLANs
> > MX1-MX2 link is MPLS
> >
> > The MX ports facing the EXes terminate L3 as well as hauling L2:
> >
> > MX1:
> >
> > xe-0/3/0 {
> > description "EX1 xe-3/1/0";
> > flexible-vlan-tagging;
> > hold-time up 5000 down 0;
> > encapsulation flexible-ethernet-services;
> > unit 3810 {
> > description "Backup link between TV switches";
> > encapsulation vlan-ccc;
> > vlan-id-list [ 304 810-811 3810 3813 3821-3822 ];
> > }
> > unit 3812 {
> > description "Video feed 2/2 from head end switch";
> > vlan-id 3812;
> > family inet {
> > address MX1/31;
> > }
> > }
> > }
> > l2circuit {
> > neighbor MX2 {
> > interface xe-0/3/0.3810 {
> > virtual-circuit-id 3810;
> > description "IPTV switch redundant link";
> > no-control-word;
> > }
> > }
> > }
> >
> > MX2:
> >
> > xe-0/3/0 {
> > description "EX1 xe-0/1/0";
> > flexible-vlan-tagging;
> > hold-time up 5000 down 0;
> > encapsulation flexible-ethernet-services;
> > unit 3810 {
> > description "Backup link between TV switches";
> > encapsulation vlan-ccc;
> > vlan-id-list [ 304 810-811 3813 3821-3822 ];
> > }
> > unit 3811 {
> > description "Video feed 1/2 from head end switch";
> > vlan-id 3811;
> > family inet {
> > address MX2/31;
> > }
> > }
> > }
> > l2circuit {
> > neighbor MX1 {
> > interface xe-0/3/0.3810 {
> > virtual-circuit-id 3810;
> > description "IPTV switch redundant link";
> > no-control-word;
> > }
> > }
> > }
> >
> > We have dual L3 feeds from "the switches" to "the routers", and VLANs
> are carried over an l2circuit should the direct link between EX1 & EX2 bite
> the dust. It should be noted that MX1 is basically a "backup" - traffic
> normally f

Re: [j-nsp] MX punting packets to RE - why?

2016-01-30 Thread Saku Ytti
Hey Ross,

It's not clear to me if the mcast is only inside the EX or if it's
also on the MX's. And it's not clear to me how the faults impact the
multicast distribution tree. On stable state, do both MX80's have
mcast states for groups? Or only one of them?

Trio maps each multicast group to an input interface; if a mismatch
occurs, that is, the group ingresses from a different input interface
than the specified one, I believe this causes a host punt.
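(A quick way to see which input interface each group is currently pinned
to is plain 'show multicast route extensive', which lists the upstream
interface per S,G; add 'instance <name>' for a VRF.)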

Alas, the DDoS-protection limits are quite insane, like 20kpps for many
protocols; that's more than NPU=>LC_CPU punting allows for, so it'll
kill pretty much everything. I'd set protocols I don't need to
10-100pps, non-critical protocols I need to 4kpps, and critical
protocols I need to 8kpps.
And yes, configure each and every ddos-protocol; it'll inflate the
config quite a bit, but there is always 'set apply-flags omit', which
can be a useful way to reduce config cruft from standard configs you
don't really want to review normally.
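As a sketch, hiding that bulk is a one-liner, and it comes back with the
omit display pipe:
set system ddos-protection apply-flags omit
show configuration system ddos-protection | display omit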


On 29 January 2016 at 23:36, Ross Halliday wrote:
> Hi list,
>
> I've run into an oddity that's been causing us some issues. First, a diagram!
>
> EX1    EX2
>  |      |
>  |      |
> MX1    MX2
>
> EX1 and EX2 are independent switches (not VC) that run a ton of video 
> traffic. EX4200 on 12.3R8.7
> MX1 and MX2 are MPLS PEs that ingest video and send it out to our network. 
> MX104 on 13.3R4.6
> Several VLANs span EX1 and EX2 as each switch has a server that requires 
> Layer 2 to the other unit. (active/active middleware)
> EX1-EX2 link is direct fiber carrying VLANs
> MX1-MX2 link is MPLS
>
> The MX ports facing the EXes terminate L3 as well as hauling L2:
>
> MX1:
>
> xe-0/3/0 {
> description "EX1 xe-3/1/0";
> flexible-vlan-tagging;
> hold-time up 5000 down 0;
> encapsulation flexible-ethernet-services;
> unit 3810 {
> description "Backup link between TV switches";
> encapsulation vlan-ccc;
> vlan-id-list [ 304 810-811 3810 3813 3821-3822 ];
> }
> unit 3812 {
> description "Video feed 2/2 from head end switch";
> vlan-id 3812;
> family inet {
> address MX1/31;
> }
> }
> }
> l2circuit {
> neighbor MX2 {
> interface xe-0/3/0.3810 {
> virtual-circuit-id 3810;
> description "IPTV switch redundant link";
> no-control-word;
> }
> }
> }
>
> MX2:
>
> xe-0/3/0 {
> description "EX1 xe-0/1/0";
> flexible-vlan-tagging;
> hold-time up 5000 down 0;
> encapsulation flexible-ethernet-services;
> unit 3810 {
> description "Backup link between TV switches";
> encapsulation vlan-ccc;
> vlan-id-list [ 304 810-811 3813 3821-3822 ];
> }
> unit 3811 {
> description "Video feed 1/2 from head end switch";
> vlan-id 3811;
> family inet {
> address MX2/31;
> }
> }
> }
> l2circuit {
> neighbor MX1 {
> interface xe-0/3/0.3810 {
> virtual-circuit-id 3810;
> description "IPTV switch redundant link";
> no-control-word;
> }
> }
> }
>
> We have dual L3 feeds from "the switches" to "the routers", and VLANs are 
> carried over an l2circuit should the direct link between EX1 & EX2 bite the 
> dust. It should be noted that MX1 is basically a "backup" - traffic normally 
> flows EX1-EX2-MX2. The goal of this setup is so that we can take out any link 
> and still have our video working.
>
> It works... eventually.
>
> The problem I am running into is that when a fail occurs, or I simply pull a 
> VLAN from the EX1-EX2 link, multicast is suddenly slammed either across or 
> into the MXes. When that happens, I get this lovely message:
>
> jddosd[1527]: DDOS_PROTOCOL_VIOLATION_SET: Protocol resolve:mcast-v4 is 
> violated at fpc 0 for 38 times, started at 2016-01-27 04:59:55 EST
> jddosd[1527]: DDOS_PROTOCOL_VIOLATION_CLEAR: Protocol resolve:mcast-v4 has 
> returned to normal. Violated at fpc 0 for 38 times, from 2016-01-27 04:59:55 
> EST to 2016-01-27 04:59:55 EST
>
> ...and traffic (maybe of just offending class) on that slot is dumped for a 
> little while.
>
>> show ddos-protection protocols resolve statistics
>
>   Packet type: mcast-v4
> System-wide information:
>   Bandwidth is no longer being violated
> No. of FPCs that have received excess traffic: 1
> Last violation started at: 2016-01-27 04:59:55 EST
> Last violation ended at:   2016-01-27 04:59:55 EST
> Duration of last violation: 00:00:00 Number of violations: 38
>   Received:  4496939 Arrival rate: 0 pps
>   Dropped:   2161644 Max arrival rate: 45877 pps
> Routing Engine information:
>   Policer is never violate

[j-nsp] MX punting packets to RE - why?

2016-01-29 Thread Ross Halliday
Hi list,

I've run into an oddity that's been causing us some issues. First, a diagram!

EX1    EX2
 |      |
 |      |
MX1    MX2

EX1 and EX2 are independent switches (not VC) that run a ton of video traffic. 
EX4200 on 12.3R8.7
MX1 and MX2 are MPLS PEs that ingest video and send it out to our network. 
MX104 on 13.3R4.6
Several VLANs span EX1 and EX2 as each switch has a server that requires Layer 
2 to the other unit. (active/active middleware)
EX1-EX2 link is direct fiber carrying VLANs
MX1-MX2 link is MPLS

The MX ports facing the EXes terminate L3 as well as hauling L2:

MX1:

xe-0/3/0 {
    description "EX1 xe-3/1/0";
    flexible-vlan-tagging;
    hold-time up 5000 down 0;
    encapsulation flexible-ethernet-services;
    unit 3810 {
        description "Backup link between TV switches";
        encapsulation vlan-ccc;
        vlan-id-list [ 304 810-811 3810 3813 3821-3822 ];
    }
    unit 3812 {
        description "Video feed 2/2 from head end switch";
        vlan-id 3812;
        family inet {
            address MX1/31;
        }
    }
}
l2circuit {
    neighbor MX2 {
        interface xe-0/3/0.3810 {
            virtual-circuit-id 3810;
            description "IPTV switch redundant link";
            no-control-word;
        }
    }
}

MX2:

xe-0/3/0 {
    description "EX1 xe-0/1/0";
    flexible-vlan-tagging;
    hold-time up 5000 down 0;
    encapsulation flexible-ethernet-services;
    unit 3810 {
        description "Backup link between TV switches";
        encapsulation vlan-ccc;
        vlan-id-list [ 304 810-811 3813 3821-3822 ];
    }
    unit 3811 {
        description "Video feed 1/2 from head end switch";
        vlan-id 3811;
        family inet {
            address MX2/31;
        }
    }
}
l2circuit {
    neighbor MX1 {
        interface xe-0/3/0.3810 {
            virtual-circuit-id 3810;
            description "IPTV switch redundant link";
            no-control-word;
        }
    }
}

We have dual L3 feeds from "the switches" to "the routers", and VLANs are 
carried over an l2circuit should the direct link between EX1 & EX2 bite the 
dust. It should be noted that MX1 is basically a "backup" - traffic normally 
flows EX1-EX2-MX2. The goal of this setup is so that we can take out any link 
and still have our video working.

It works... eventually.

The problem I am running into is that when a failure occurs, or I simply pull a 
VLAN from the EX1-EX2 link, multicast is suddenly slammed either across or into 
the MXes. When that happens, I get this lovely message:

jddosd[1527]: DDOS_PROTOCOL_VIOLATION_SET: Protocol resolve:mcast-v4 is 
violated at fpc 0 for 38 times, started at 2016-01-27 04:59:55 EST
jddosd[1527]: DDOS_PROTOCOL_VIOLATION_CLEAR: Protocol resolve:mcast-v4 has 
returned to normal. Violated at fpc 0 for 38 times, from 2016-01-27 04:59:55 
EST to 2016-01-27 04:59:55 EST

...and traffic (maybe of just offending class) on that slot is dumped for a 
little while.

> show ddos-protection protocols resolve statistics

  Packet type: mcast-v4
System-wide information:
  Bandwidth is no longer being violated
No. of FPCs that have received excess traffic: 1
Last violation started at: 2016-01-27 04:59:55 EST
Last violation ended at:   2016-01-27 04:59:55 EST
Duration of last violation: 00:00:00 Number of violations: 38
  Received:  4496939 Arrival rate: 0 pps
  Dropped:   2161644 Max arrival rate: 45877 pps
Routing Engine information:
  Policer is never violated
  Received:  130584  Arrival rate: 0 pps
  Dropped:   0   Max arrival rate: 1 pps
Dropped by aggregate policer: 0
FPC slot 0 information:
  Policer is no longer being violated
Last violation started at: 2016-01-27 04:59:57 EST
Last violation ended at:   2016-01-27 04:59:57 EST
Duration of last violation: 00:00:00 Number of violations: 38
  Received:  4496939 Arrival rate: 0 pps
  Dropped:   2161644 Max arrival rate: 45877 pps
Dropped by this policer:  2161644
Dropped by aggregate policer: 0
Dropped by flow suppression:  0
  Flow counts:
Aggregation level Current   Total detected   State
Subscriber0 0Active

Once the thing recovers, everything works again. But I cannot change a VLAN, a 
spanning tree topology, or work on anything without risking serious impact to 
my network!

I understand that the 'resolve' protocol means these packets are being sent to 
the RE.

...why the hell are they being sent to the RE? Even when there's a change on 
traffic that gets sent into that l2circuit - shouldn't this just be punte