Re: [j-nsp] MX104 with full BGP table problems

2014-05-22 Thread Brad Fleming
I wanted to close the loop on this thread.

JTAC was able to determine the root cause and, as happens more often than not, 
it was user error.

I had a default route configured to resolve all BGP next-hops in our 
test/lab/staging configuration and didn’t realize the default is evaluated 
last in this process. So as the box learned more-specific routes, rpd was 
working overtime to keep up.
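
For anyone wanting to guard against the same mistake, something like this 
should keep the default out of next-hop resolution (an untested sketch; the 
policy name is made up, and I believe the routing-options resolution import 
knob is the right one):

set policy-options policy-statement no-resolve-via-default term reject-default from route-filter 0.0.0.0/0 exact
set policy-options policy-statement no-resolve-via-default term reject-default then reject
set routing-options resolution rib inet.0 import no-resolve-via-default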

We now have a full mesh of 11 iBGP sessions talking to the MX104: a total of 
890K paths and 500K active routes with no problems. The CPU is ~98% idle even 
with general, constant Internet BGP table churn. We’re seeing good behavior on 
Junos 13.2R2 and 13.3R1.

Thanks to everyone for your suggestions. And thanks to Dave from JTAC for 
listening, asking the right questions, and then finding this stupid user’s 
problem.




Re: [j-nsp] MX104 with full BGP table problems

2014-05-18 Thread Mark Tinka
On Sunday, May 18, 2014 01:35:17 PM Mark Tinka wrote:

> For all vendors, really; not just Cisco.

%s/Cisco/Juniper.

Mark.



Re: [j-nsp] MX104 with full BGP table problems

2014-05-18 Thread Mark Tinka
On Sunday, May 18, 2014 12:57:59 PM Ben Dale wrote:

> Looking forward to the day when there is a commonish RE
> (e.g. Intel-based) for all REs across the fleet.  You
> would think this would save a pile of development
> work...

For all vendors, really; not just Cisco.

For all the money we pay to buy routers, it's a shame a 
100Mbps DoS can take out a box rated to push 1Tbps/slot.

Mark.



Re: [j-nsp] MX104 with full BGP table problems

2014-05-18 Thread Ben Dale
> 
> You can cool a Sandy/Ivy Bridge CPU in a laptop at +40°C in direct
> sunlight outside (e.g. a Panasonic Toughbook), yet they can't cool the same
> type of CPU in a datacenter with ~20-22°C ambient temperature and the
> very powerful fans present in a router? I doubt that the Xeon
> versions of those CPUs produce much more heat. Factor in that they
> can go all-SSD and you can have a very small package that can act as an RE
> (think of a very mini version of the Intel NUC).

The 104s are rated for use at -40 to +65°C and, based on the form factor, are 
designed for use in areas a little less comfortable than your average data 
centre (places MX80s are not designed to go, like de-mountables/cabinets). 

> There has to be another reason for the crappy REs in these routers.

Yeah, I'm kinda surprised the 104s maintained the same CPU, but I guess it 
meant much less code re-spin.  

Looking forward to the day when there is a commonish RE (e.g. Intel-based) for 
all REs across the fleet.  You would think this would save a pile of 
development work...




Re: [j-nsp] MX104 with full BGP table problems

2014-05-17 Thread Mark Tinka
On Saturday, May 17, 2014 11:07:29 AM Eugeniu Patrascu 
wrote:

> You can cool a Sandy/Ivy Bridge CPU in a laptop at +40°C
> in direct sunlight outside (e.g. a Panasonic
> Toughbook), yet they can't cool the same type of CPU in
> a datacenter with ~20-22°C ambient temperature and
> the very powerful fans present in a router? I doubt
> that the Xeon versions of those CPUs produce much
> more heat. Factor in that they can go all-SSD
> and you can have a very small package that can act as
> an RE (think of a very mini version of the Intel NUC).

That, and they would need to draw more power than is currently 
budgeted for - well, that's the story, anyway.

Unlike the MX80, there is at least some hope, given that you can 
upgrade the REs on the MX104.

Mark.



Re: [j-nsp] MX104 with full BGP table problems

2014-05-17 Thread Eugeniu Patrascu
On Sat, May 17, 2014 at 11:55 AM, Mark Tinka  wrote:

> On Friday, May 16, 2014 09:20:50 PM Saku Ytti wrote:
>
> > Bill-of-materials, pin count, and thermal might be the arguments;
> > they are SoCs, so you get everything nicely in a single
> > package.
>
> I know Juniper is not considering anything more powerful in
> the MX104 due to thermal budget, but they alluded to the fact
> that this could change as more powerful CPUs require lower
> thermal budgets, given that the REs on the MX104
> are modular.
>

You can cool a Sandy/Ivy Bridge CPU in a laptop at +40°C in direct
sunlight outside (e.g. a Panasonic Toughbook), yet they can't cool the same
type of CPU in a datacenter with ~20-22°C ambient temperature and the very
powerful fans present in a router? I doubt that the Xeon versions of those
CPUs produce much more heat. Factor in that they can go all-SSD and you can
have a very small package that can act as an RE (think of a very mini version
of the Intel NUC).

There has to be another reason for the crappy REs in these routers.


Re: [j-nsp] MX104 with full BGP table problems

2014-05-17 Thread Mark Tinka
On Friday, May 16, 2014 09:20:50 PM Saku Ytti wrote:

> Bill-of-materials, pin count, and thermal might be the arguments;
> they are SoCs, so you get everything nicely in a single
> package.

I know Juniper is not considering anything more powerful in 
the MX104 due to thermal budget, but they alluded to the fact 
that this could change as more powerful CPUs require lower 
thermal budgets, given that the REs on the MX104 are modular.

Mark.



Re: [j-nsp] MX104 with full BGP table problems

2014-05-16 Thread Brad Fleming

On May 16, 2014, at 1:58 PM, Saku Ytti  wrote:

> On (2014-05-16 13:00 -0500), Brad Fleming wrote:
> 
> We’ve been working with a handful of MX104s on the bench in preparation for 
> putting them into a live network. We started pushing a full BGP table into 
> the device and stumbled across some CPU utilization problems.
> 
> What JunOS? If 13.3R2, you're probably seeing /var/db/alarm.db recreated
> every 20 seconds; it's an 80MB zero-filled file.
> The file is synchronized over em0 to RE1, causing em congestion and various
> other problems, such as loss of connectivity to backup RE1, loss of fan, etc.
> And indeed high CPU.
> 
> Your em0 traffic should be like <100pps, so the issue is quite obvious if you
> graph em0.
> 
> I'm using this as workaround:
> set system processes alarm-management disable
> 
> I don't know if it's safe, and JTAC has not yet been able to confirm whether
> I can continue using it. Regardless, I'm running it in production on some 12
> boxes.
> If it's the same issue, you might want to refer JTAC to 2014-0430-0067.
> 
> I did explain all this when opening the case and asked this week if there is
> any progress; they have now told me they've found it's caused by the rcp
> process generating high I/O load.
> I attempted to explain that I don't believe that is the culprit, that it is
> just a symptom of synchronizing the changed file from RE0 to RE1, and that
> perhaps they should focus on figuring out why the file keeps being recreated.
> 
Thanks for the response!

We’ve been testing this with Junos 13.3R1.8 and 13.3R2.7. I’ll see whether your 
workaround alleviates our problem. Thanks for sharing your experience!




Re: [j-nsp] MX104 with full BGP table problems

2014-05-16 Thread Brad Fleming
Thanks for the response; answers inline...


On May 16, 2014, at 1:58 PM, Tyler Christiansen  
wrote:

> I don't have experience with the MX104s but do with the rest of the line 
> (MX80 to MX2010 [excluding MX104, of course]).  MX80 isn't dual RE, but the 
> CPUs are the same family between MX80 and MX104 IIRC--the MX104 is just 500 
> or 600 MHz faster.  And the MX80 kind of chokes when receiving a full feed 
> (even just one at a time can easily send it up to ~40% during the initial 
> feed consumption).  ;)
> 
> The MX80 and MX104 being sold as edge BGP routers is pretty much only because 
> they have enough memory to do it... not because it's a good idea.
> 
> It's pretty odd for the backup RE to show that kind of CPU utilization (based 
> on experience with the other dual-RE MX devices).  Some, yes, but not the 100% 
> utilization you show there.  I would buy 100% utilization during initial 
> feed consumption on the master.  Once the network has some stability, 
> though, the CPU should be back down to ~5-15% (depending on what you 
> have going on).

I agree; we’ve run a few M10is and never had this issue, but a totally 
different platform and a much older version of Junos made me generally discount 
the comparison. These are the first multi-RE boxes we’ve had running any Junos 
newer than 10.0. Thanks for pointing it out; it’s something I missed in my 
previous email. As the previous output shows, 15-minute load averages for each 
RE are ~1.20, so the load remains elevated. I just confirmed that the 15-minute 
load average after about 2 hours of “sitting” remains ~1.22.
> 
> How aggressive are your BGP timers?  You may want to consider BFD instead of 
> BGP timers for aggressive keepalives.

BGP timers are default; however, we’ve tried relaxing them with no change in 
behavior.
> 
> Are you doing just plain IPv4 BGP, or are you utilizing MBGP extensions?  
> MBGP extensions can inflate the size of the BGP tables and make the router do 
> more work.

We’ve tried both with no difference in performance. The example outputs in my 
original message were with MBGP extensions enabled but doing only IPv4 unicast 
on the session produces the same result.
> 
> In all scenarios, you really should probably have loopback IPs in the IGP and 
> have the nexthop set to the loopback IPs for iBGP sessions.  I'm not sure why 
> you have /30 P2P links as the next-hops as they're potentially unstable (even 
> if they're not now, they can easily become unstable once in production).  I 
> assume that since you mentioned you know it's not recommended, you're going 
> to be changing that.

This is a bit of a legacy issue within our network. We’ve operated for nearly 
12 years using the actual PtP addresses in our IGP and retaining them in BGP 
advertisements. It is something we plan to resolve with the deployment of this 
gear (as well as several new MX960s that were part of the same PO).
> 
> In scenario #2, how many RRs does the MX104 peer with?  And are they sending 
> full routes or full routes + more?

The box was only peering with a single RR. The RR was only sending the 
standard, full table (~496K routes), no VPN, no mcast, etc.
> 
> Finally, in scenario #3, if you're trying to do a full mesh with 11 other 
> peers, the MX104 will choke if they're all trying to load full tables.  There 
> are about 500,000 routes in the global table, so you're trying to load 
> 5,500,000 routes into a box with a 1.8GHz CPU and 4GB RAM.

In scenario #3 the total number of routes entering the RE was ~867K with ~496K 
active.
> 
> Regardless, I would think that the MX104 should be perfectly capable of 
> scaling to at least five or six full feeds.  I would suspect either a bug in 
> the software or very aggressive timers.

Re: [j-nsp] MX104 with full BGP table problems

2014-05-16 Thread Saku Ytti
On (2014-05-16 22:09 +0300), Eugeniu Patrascu wrote:

> As a side question: what is Juniper's benefit in running a custom processor
> for control-plane management instead of x86 CPUs, as they do with other
> routing engines?

Bill-of-materials, pin count, and thermal might be the arguments; they are SoCs,
so you get everything nicely in a single package.

The Freescale PQ3 is a very common CPU; every vendor uses them in many boxes.
They are more common than x86 in embedded stuff.

-- 
  ++ytti


Re: [j-nsp] MX104 with full BGP table problems

2014-05-16 Thread Eugeniu Patrascu
On Fri, May 16, 2014 at 10:04 PM, Saku Ytti  wrote:

> On (2014-05-16 11:58 -0700), Tyler Christiansen wrote:
>
> > I don't have experience with the MX104s but do with the rest of the line
> > (MX80 to MX2010 [excluding MX104, of course]).  MX80 isn't dual RE, but the
> > CPUs are the same family between MX80 and MX104 IIRC--the MX104 is just 500
> > or 600 MHz faster.  And the MX80 kind of chokes when receiving a full feed
> > (even just one at a time can easily send it up to ~40% during the initial
> > feed consumption).  ;)
>
> All MX, T, and M linecards use Freescale PQ3-family processors, as does the
> MX80 control plane.
> Freescale is phasing out the PQ3, and the MX104 uses QorIQ in the control
> plane and in the 'linecard'.
>
> The exact model for the MX80 is the 8572; for the MX104 it is the P5021
>
>
As a side question: what is Juniper's benefit in running a custom processor
for control-plane management instead of x86 CPUs, as they do with other
routing engines?


Re: [j-nsp] MX104 with full BGP table problems

2014-05-16 Thread Saku Ytti
On (2014-05-16 11:58 -0700), Tyler Christiansen wrote:

> I don't have experience with the MX104s but do with the rest of the line
> (MX80 to MX2010 [excluding MX104, of course]).  MX80 isn't dual RE, but the
> CPUs are the same family between MX80 and MX104 IIRC--the MX104 is just 500
> or 600 MHz faster.  And the MX80 kind of chokes when receiving a full feed
> (even just one at a time can easily send it up to ~40% during the initial
> feed consumption).  ;)

All MX, T, and M linecards use Freescale PQ3-family processors, as does the
MX80 control plane.
Freescale is phasing out the PQ3, and the MX104 uses QorIQ in the control
plane and in the 'linecard'.

The exact model for the MX80 is the 8572; for the MX104 it is the P5021

> The MX80 and MX104 being sold as edge BGP routers is pretty much only
> because they have enough memory to do it... not because it's a good idea.

The MX104 doubles the control-plane DRAM from the MX80's 2GB to 4GB.

> Regardless, I would think that the MX104 should be perfectly capable of
> scaling to at least five or six full feeds.  I would suspect either a bug
> in the software or very aggressive timers.

Agreed. JunOS is a very control-plane-demanding architecture; it requires a lot
of power. On the same CPU, IOS-XE, which is architecturally quite comparable,
will fly.

-- 
  ++ytti


Re: [j-nsp] MX104 with full BGP table problems

2014-05-16 Thread Saku Ytti
On (2014-05-16 13:00 -0500), Brad Fleming wrote:

We’ve been working with a handful of MX104s on the bench in preparation for 
putting them into a live network. We started pushing a full BGP table into 
the device and stumbled across some CPU utilization problems.

What JunOS? If 13.3R2, you're probably seeing /var/db/alarm.db recreated
every 20 seconds; it's an 80MB zero-filled file.
The file is synchronized over em0 to RE1, causing em congestion and various
other problems, such as loss of connectivity to backup RE1, loss of fan, etc.
And indeed high CPU.

Your em0 traffic should be like <100pps, so the issue is quite obvious if you
graph em0.
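
If you don't have it graphed, a couple of quick CLI checks will show it too (a
sketch, assuming em0 appears under the usual interface and file commands):

show interfaces em0 statistics
file list detail /var/db/alarm.db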

I'm using this as workaround:
set system processes alarm-management disable

I don't know if it's safe, and JTAC has not yet been able to confirm whether I
can continue using it. Regardless, I'm running it in production on some 12
boxes.
If it's the same issue, you might want to refer JTAC to 2014-0430-0067.

I did explain all this when opening the case and asked this week if there is
any progress; they have now told me they've found it's caused by the rcp
process generating high I/O load.
I attempted to explain that I don't believe that is the culprit, that it is
just a symptom of synchronizing the changed file from RE0 to RE1, and that
perhaps they should focus on figuring out why the file keeps being recreated.

-- 
  ++ytti

Re: [j-nsp] MX104 with full BGP table problems

2014-05-16 Thread Tyler Christiansen
I don't have experience with the MX104s but do with the rest of the line
(MX80 to MX2010 [excluding MX104, of course]).  MX80 isn't dual RE, but the
CPUs are the same family between MX80 and MX104 IIRC--the MX104 is just 500
or 600 MHz faster.  And the MX80 kind of chokes when receiving a full feed
(even just one at a time can easily send it up to ~40% during the initial
feed consumption).  ;)

The MX80 and MX104 being sold as edge BGP routers is pretty much only
because they have enough memory to do it... not because it's a good idea.

It's pretty odd for the backup RE to show that kind of CPU utilization (based
on experience with the other dual-RE MX devices).  Some, yes, but not the 100%
utilization you show there.  I would buy 100% utilization during initial
feed consumption on the master.  Once the network has some stability,
though, the CPU should be back down to ~5-15% (depending on what
you have going on).

How aggressive are your BGP timers?  You may want to consider BFD instead
of BGP timers for aggressive keepalives.
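
Something like this is the usual shape for BFD on a BGP group (a sketch only;
the group name and timer values here are assumptions):

set protocols bgp group ibgp bfd-liveness-detection minimum-interval 300
set protocols bgp group ibgp bfd-liveness-detection multiplier 3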

Are you doing just plain IPv4 BGP, or are you utilizing MBGP extensions?
 MBGP extensions can inflate the size of the BGP tables and make the router
do more work.

In all scenarios, you really should probably have loopback IPs in the IGP
and have the nexthop set to the loopback IPs for iBGP sessions.  I'm not
sure why you have /30 P2P links as the next-hops as they're potentially
unstable (even if they're not now, they can easily become unstable once in
production).  I assume that since you mentioned you know it's not
recommended, you're going to be changing that.
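
For reference, the usual shape of loopback-based iBGP (a sketch; the addresses
are examples, and "nhs" is an export policy applying next-hop self):

set protocols bgp group ibgp type internal
set protocols bgp group ibgp local-address 192.0.2.1
set protocols bgp group ibgp export nhs
set protocols bgp group ibgp neighbor 192.0.2.2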

In scenario #2, how many RRs does the MX104 peer with?  And are they
sending full routes or full routes + more?

Finally, in scenario #3, if you're trying to do a full mesh with 11 other
peers, the MX104 will choke if they're all trying to load full tables.
 There are about 500,000 routes in the global table, so you're trying to
load 5,500,000 routes into a box with a 1.8GHz CPU and 4GB RAM.

Regardless, I would think that the MX104 should be perfectly capable of
scaling to at least five or six full feeds.  I would suspect either a bug
in the software or very aggressive timers.


[j-nsp] MX104 with full BGP table problems

2014-05-16 Thread Brad Fleming
We’ve been working with a handful of MX104s on the bench in preparation for 
putting them into a live network. We started pushing a full BGP table into the 
device and stumbled across some CPU utilization problems.

We tried pushing a full table into the box three different ways:
1) via an eBGP session
2) via a reflected session on an iBGP session
3) via a full mesh of iBGP sessions (11 other routers)

In situation #1: RE CPU was slightly elevated but remained ~60% idle and 1min 
load averages were around 0.3.

In situation #2: RE CPU is highly elevated. We maintain actual p-t-p /30s for 
our next-hops (I know, not best practice for many networks) which results in a 
total of about 50-65 next-hops network-wide.

In situation #3: RE CPU is saturated at all times. In this case we configured 
the mesh sessions to advertise routes with “next-hop-self” so the number of 
next-hops is reduced to 11 total.
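
The next-hop-self piece is just the usual export policy, roughly (the policy 
and group names here are our own):

set policy-options policy-statement nhs then next-hop self
set protocols bgp group ibgp-mesh export nhs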

It appears that rpd is the process actually killing the CPU; it is nearly 
always running at 75+% and in a “RUN” state. If we enable task accounting, it 
shows “Resolve Tree 2” as the task consuming tons of CPU time (see below). 
There’s plenty of RAM remaining, we’re not using any swap space, and we’ve not 
exceeded the number of routes licensed for the system; we paid for the full 
1 million+ route scaling. Logs are full of lost communication with the backup 
RE; however, if we disable all the BGP sessions that issue goes away completely 
(for days on end).
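
Task accounting itself adds rpd overhead, so we only leave it on while 
sampling; from operational mode:

netadm@test-MX104> set task accounting on
netadm@test-MX104> show task accounting
netadm@test-MX104> set task accounting off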

Has anyone else tried shoving a full BGP table into one of these routers yet? 
Have you noticed anything similar?

I’ve opened a JTAC case for the issue but I’m wondering if anyone with more 
experience in multi-RE setups has seen similar. Thanks in advance for any 
thoughts, suggestions, or insights.


Incoming command output dump….

netadm@test-MX104> show chassis routing-engine
Routing Engine status:
  Slot 0:
    Current state                  Master
    Election priority              Master (default)
    Temperature                 39 degrees C / 102 degrees F
    CPU temperature             42 degrees C / 107 degrees F
    DRAM                      3968 MB (4096 MB installed)
    Memory utilization          32 percent
    CPU utilization:
      User                      87 percent
      Background                 0 percent
      Kernel                    11 percent
      Interrupt                  2 percent
      Idle                       0 percent
    Model                          RE-MX-104
    Serial ID                      CACH2444
    Start time                     2009-12-31 18:05:43 CST
    Uptime                         21 hours, 31 minutes, 32 seconds
    Last reboot reason             0x200:normal shutdown
    Load averages:                 1 minute   5 minute  15 minute
                                       1.06       1.12       1.23
Routing Engine status:
  Slot 1:
    Current state                  Backup
    Election priority              Backup (default)
    Temperature                 37 degrees C / 98 degrees F
    CPU temperature             38 degrees C / 100 degrees F
    DRAM                      3968 MB (4096 MB installed)
    Memory utilization          30 percent
    CPU utilization:
      User                      62 percent
      Background                 0 percent
      Kernel                    15 percent
      Interrupt                 24 percent
      Idle                       0 percent
    Model                          RE-MX-104
    Serial ID                      CACD1529
    Start time                     2010-03-18 05:16:34 CDT
    Uptime                         21 hours, 45 minutes, 26 seconds
    Last reboot reason             0x200:normal shutdown
    Load averages:                 1 minute   5 minute  15 minute
                                       1.22       1.19       1.20

netadm@test-MX104> show system processes extensive
last pid: 20303;  load averages:  1.18,  1.14,  1.22  up 0+21:33:35    03:03:41
127 processes: 8 running, 99 sleeping, 20 waiting
Mem: 796M Active, 96M Inact, 308M Wired, 270M Cache, 112M Buf, 2399M Free
Swap: 1025M Total, 1025M Free

  PID USERNAME THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
 3217 root       1 132    0   485M   432M RUN    120:56 72.85% rpd

netadm@test-MX104> show task accounting
Task accounting is enabled.

Task                           Started    User Time  System Time  Longest Run
Scheduler                        32294        0.924        0.148        0.000
Memory                              26        0.001        0.000        0.000
RT                                5876        0.947        0.162        0.003
hakr                                 6        0.000        0.000        0.000
OSPF I/O./var/run/ppmd_co          117        0.002        0.000        0.000
BGP rsync                          192        0.007        0.001        0.000
BGP_RT_Background                   78        0.001        0.000        0.000
BGP_Listen.0.0.0.0+179            2696        1.101        0.218