Re: fuzzy subnet aggregation

2019-10-27 Thread Joe Maimon
Not quite.

203.0.113.1
203.0.113.3
203.0.113.5
203.0.112.6
203.0.112.7

Will aggregate to 203.0.113.0/29 if you don't mind the missing 3 addresses
in the unaggregated list.

Hence, fuzzy aggregation.
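
A minimal sketch of that check in Python with netaddr (the library used in
the session quoted below), assuming the last two addresses were meant to
read 203.0.113.6 and 203.0.113.7; see the question about the third octet
downthread:

import netaddr

# Input /32s (third octet of the last two assumed to be 113).
addrs = [netaddr.IPAddress(a) for a in
         ('203.0.113.1', '203.0.113.3', '203.0.113.5',
          '203.0.113.6', '203.0.113.7')]
max_missing = 3

# Candidate /29s containing at least one input address; emit any that
# are missing no more than max_missing of their 8 addresses.
candidates = set(netaddr.IPNetwork('%s/29' % a).cidr for a in addrs)
for net in sorted(candidates):
    present = sum(1 for a in addrs if a in net)
    if net.size - present <= max_missing:
        print('%s (missing %d)' % (net, net.size - present))
# -> 203.0.113.0/29 (missing 3)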

Joe

> Is this what you are trying to accomplish?
>
> $ python
> Python 2.7.15rc1 (default, Nov 12 2018, 14:31:15)
> [GCC 7.3.0] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import netaddr
> >>> SomeList=netaddr.IPSet()
> >>> SomeList.add('203.0.113.0/25')
> >>> SomeList.add('203.0.113.128/25')
> >>> for x in list(SomeList.iter_cidrs()):
> ...   print x
> ...
> 203.0.113.0/24
> >>>
> >>> DifferentList=netaddr.IPSet()
> >>> DifferentList.add('0.0.0.0/0')
> >>> DifferentList.remove('203.0.113.1')
> >>> for x in list(DifferentList.iter_cidrs()):
> ...   print x
> ...
> 0.0.0.0/1
> 128.0.0.0/2
> 192.0.0.0/5
> 200.0.0.0/7
> 202.0.0.0/8
> 203.0.0.0/18
> 203.0.64.0/19
> 203.0.96.0/20
> 203.0.112.0/24
> 203.0.113.0/32
> 203.0.113.2/31
> 203.0.113.4/30
> 203.0.113.8/29
> 203.0.113.16/28
> 203.0.113.32/27
> 203.0.113.64/26
> 203.0.113.128/25
> 203.0.114.0/23
> 203.0.116.0/22
> 203.0.120.0/21
> 203.0.128.0/17
> 203.1.0.0/16
> 203.2.0.0/15
> 203.4.0.0/14
> 203.8.0.0/13
> 203.16.0.0/12
> 203.32.0.0/11
> 203.64.0.0/10
> 203.128.0.0/9
> 204.0.0.0/6
> 208.0.0.0/4
> 224.0.0.0/3

>
> On Sun, Oct 27, 2019 at 1:10 PM Joe Maimon  wrote:
>
>> Does anyone have or seen any such tool? I have a script that seems to
>> work, but it's terribly slow.
>>
>> Currently I can produce aggregated subnets that can be missing up to a
>> specified number of individual addresses. Which can be fed back in for
>> multiple passes.
>>
>> Doing RTBH on individual /32 does not scale well, if you are eyeing
>> collaboration with external lists. I have found likely sources that
>> could
>> produce another 100k prefixes easily.
>>
>> Joe
>>
>



Re: fuzzy subnet aggregation

2019-10-27 Thread Nick Morrison via NANOG


> On 27. Oct 2019, at 20:36, Joe Maimon  wrote:
> 
> Not quite.
> 
> 203.0.113.1
> 203.0.113.3
> 203.0.113.5
> 203.0.112.6
> 203.0.112.7
> 
> Will aggregate to 203.0.113.0/29 if you don't mind the missing 3 addresses
> in the unaggregated list.
> 
> Hence, fuzzy aggregation.

Could you describe the problem again? I’m interested, but I’m not sure that I
quite understand what you want to do :-) Were the last two addresses supposed
to have 112 in the third octet?

Nick



Re: fuzzy subnet aggregation

2019-10-27 Thread Christopher Morrow
On Sun, Oct 27, 2019 at 3:09 PM Joe Maimon  wrote:
>
> Does anyone have or seen any such tool? I have a script that seems to
> work, but it's terribly slow.
>
> Currently I can produce aggregated subnets that can be missing up to a
> specified number of individual addresses. Which can be fed back in for
> multiple passes.
>

Your aim is to get to maximum aggregation, with some overage, like
90% of a /24?
So, missing around 25 addresses in a whole /24 (for instance).

> Doing RTBH on individual /32 does not scale well, if you are eyeing
> collaboration with external lists. I have found likely sources that could
> produce another 100k prefixes easily.
>
> Joe


Re: fuzzy subnet aggregation

2019-10-27 Thread Mark Leonard
Is this what you are trying to accomplish?

$ python
Python 2.7.15rc1 (default, Nov 12 2018, 14:31:15)
[GCC 7.3.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import netaddr
>>> SomeList=netaddr.IPSet()
>>> SomeList.add('203.0.113.0/25')
>>> SomeList.add('203.0.113.128/25')
>>> for x in list(SomeList.iter_cidrs()):
...   print x
...
203.0.113.0/24
>>>
>>> DifferentList=netaddr.IPSet()
>>> DifferentList.add('0.0.0.0/0')
>>> DifferentList.remove('203.0.113.1')
>>> for x in list(DifferentList.iter_cidrs()):
...   print x
...
0.0.0.0/1
128.0.0.0/2
192.0.0.0/5
200.0.0.0/7
202.0.0.0/8
203.0.0.0/18
203.0.64.0/19
203.0.96.0/20
203.0.112.0/24
203.0.113.0/32
203.0.113.2/31
203.0.113.4/30
203.0.113.8/29
203.0.113.16/28
203.0.113.32/27
203.0.113.64/26
203.0.113.128/25
203.0.114.0/23
203.0.116.0/22
203.0.120.0/21
203.0.128.0/17
203.1.0.0/16
203.2.0.0/15
203.4.0.0/14
203.8.0.0/13
203.16.0.0/12
203.32.0.0/11
203.64.0.0/10
203.128.0.0/9
204.0.0.0/6
208.0.0.0/4
224.0.0.0/3
>>>

On Sun, Oct 27, 2019 at 1:10 PM Joe Maimon  wrote:

> Does anyone have or seen any such tool? I have a script that seems to
> work, but it's terribly slow.
>
> Currently I can produce aggregated subnets that can be missing up to a
> specified number of individual addresses. Which can be fed back in for
> multiple passes.
>
> Doing RTBH on individual /32 does not scale well, if you are eyeing
> collaboration with external lists. I have found likely sources that could
> produce another 100k prefixes easily.
>
> Joe
>


Re: fuzzy subnet aggregation

2019-10-27 Thread Joe Maimon
> On Sun, Oct 27, 2019 at 3:09 PM Joe Maimon  wrote:

>>
>
> Your aim is to get to maximum aggregation, with some overage, like
> 90% of a /24?
> So, missing around 25 addresses in a whole /24 (for instance).

I would be happy to get /29s missing 3, /28s missing 5, etc...

This is not punitive, it's about scale.
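
A quick back-of-the-envelope check of what those thresholds mean in coverage
terms (my arithmetic, not figures from the thread):

for plen, missing in ((29, 3), (28, 5)):
    size = 2 ** (32 - plen)
    print('/%d: %d of %d present (%.1f%%)' %
          (plen, size - missing, size, 100.0 * (size - missing) / size))
# /29: 5 of 8 present (62.5%)
# /28: 11 of 16 present (68.8%)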

Joe


Re: fuzzy subnet aggregation

2019-10-27 Thread Grant Taylor via NANOG

On 10/27/19 4:27 PM, Joe Maimon wrote:

I would be happy to get /29s missing 3, /28s missing 5, etc...


Are you good with rounding up to the next larger network if you have 
~62% of the members?



This is not punitive, it's about scale.


ACK



--
Grant. . . .
unix || die





Re: fuzzy subnet aggregation

2019-10-27 Thread Masataka Ohta

Joe Maimon wrote:


Does anyone have or seen any such tool? I have a script that seems to
work, but its terribly slow.


It's a logic synthesis problem and should be NP-hard.

Masataka  Ohta


Re: fuzzy subnet aggregation

2019-10-27 Thread Antonio Querubin
Are you trying to reduce the number of ACL rules that include a known set of
addresses while also minimizing the covered addresses that are not part of
the mandatory set?

Tony

> On Oct 27, 2019, at 12:29, Joe Maimon  wrote:
> 
> 
>> 
>> On Sun, Oct 27, 2019 at 3:09 PM Joe Maimon  wrote:
> 
>>> 
>> 
>> your aim is to get to maximum aggregation .. with some overage, like
>> 90% of a /24  ?
>> so missing like 25 addresses in a whole /24.. (for instance)
> 
> I would be happy to get /29s missing 3, /28s missing 5, etc...
> 
> This is not punitive, it's about scale.
> 
> Joe
> 



Re: fuzzy subnet aggregation

2019-10-27 Thread Joe Maimon
So I went back to the drawing board, and I think I have something that 
seems to work much better.


- convert input prefixes to single IPs expressed as integers
- sort -n | uniq
- into a temporary list file

begin

read sequentially until maxhosts (or minhosts) or next subnet

If enough single addresses matched, output the subnet (and the missing
hosts, without terminating the loop early)


delete all subnet addresses read

loop
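
A rough Python rendering of that sweep (my reading of the pseudocode above,
not the actual script; simplified to a single target prefix length and a
max-missing count, and without the separate missing-hosts output):

import socket, struct

def ip2int(s):
    return struct.unpack('!I', socket.inet_aton(s))[0]

def int2ip(i):
    return socket.inet_ntoa(struct.pack('!I', i))

def fuzzy_sweep(ints, prefixlen=29, max_missing=3):
    # ints: the sorted, uniqued integer list produced by the steps above.
    size = 1 << (32 - prefixlen)
    mask = 0xFFFFFFFF ^ (size - 1)
    out, i, n = [], 0, len(ints)
    while i < n:
        base = ints[i] & mask
        j = i
        while j < n and ints[j] & mask == base:  # read until next subnet
            j += 1
        if size - (j - i) <= max_missing:        # matched enough addresses
            out.append('%s/%d' % (int2ip(base), prefixlen))
        i = j  # delete all subnet addresses read, then loop
    return out

ints = sorted(set(ip2int('203.0.113.%d' % h) for h in (1, 3, 5, 6, 7)))
print(fuzzy_sweep(ints))  # ['203.0.113.0/29']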

Total process time on a VM on old hardware: less than 2 minutes for a
5,500-line input. Now to verify results, positive and negative.


Results are still raw, but anyone who wishes is welcome to it.

Joe

Joe Maimon wrote:

Does anyone have or seen any such tool? I have a script that seems to
work, but it's terribly slow.

Currently I can produce aggregated subnets that can be missing up to a
specified number of individual addresses. Which can be fed back in for
multiple passes.

Doing RTBH on individual /32 does not scale well, if you are eyeing
collaboration with external lists. I have found likely sources that could
produce another 100k prefixes easily.

Joe






Re: fuzzy subnet aggregation

2019-10-28 Thread Mark Leonard
You could modify a radix tree to include a consolidation function and
resulting confidence.  Then walk the nodes of the tree, check to see if the
subtree meets the requirements for consolidation, if so, prune and record
the confidence.  You would need to re-run the consolidation from the
original data every time an individual IP was added/removed from the list
as the consolidation function is lossy.
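
For what it's worth, a toy version of that collapse (my sketch, not Mark's
design: plain recursion over prefix lengths rather than an explicit radix
tree, top-down instead of bottom-up, and with the confidence recording
omitted, but using the same occupancy criterion):

import bisect

def collapse(addrs, base=0, plen=0, threshold=0.62):
    # addrs: sorted unique 32-bit ints falling inside base/plen.
    # A subtree collapses to its covering prefix as soon as it holds
    # at least threshold of the possible host addresses.
    if not addrs:
        return []
    size = 1 << (32 - plen)
    if len(addrs) >= threshold * size or plen == 32:
        return [(base, plen)]
    half = base + size // 2
    cut = bisect.bisect_left(addrs, half)
    return (collapse(addrs[:cut], base, plen + 1, threshold) +
            collapse(addrs[cut:], half, plen + 1, threshold))

# e.g. 203.0.113.{1,3,5,6,7} as integers:
net = (203 << 24) | (113 << 8)
print(collapse(sorted(net + h for h in (1, 3, 5, 6, 7))))
# -> [(3405803776, 29)], i.e. 203.0.113.0/29 in dotted-quad form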

Alternatively, you could do consolidation on the fly losslessly if you had
a custom tree walk algorithm.  That's probably the way I would do it.  I'm
not a programmer so I assume there are better ways out there.

Your processing time for 5k IPs should be measured in seconds (i.e. less
than one) rather than minutes on any modern core.  Based on your pseudocode
(sort -n | uniq) I get the impression that you're using BASH which isn't
ideal for performing this sort of operation at high speed.

On the flip side, I think an extra 100k routes isn't that much unless
you're suffering from hardware routing table limitations.  In my world the
cost of a false positive match would far outweigh the cost of upgrading
hardware.  YMMV.

Do you have a git repo?

On Sun, Oct 27, 2019 at 9:58 PM Joe Maimon  wrote:

> So I went back to the drawing board, and I think I have something that
> seems to work much better.
>
> - convert input prefixes to single IPs expressed as integers
> - sort -n | uniq
> - into a temporary list file
>
> begin
>
> read sequentially until maxhosts (or minhosts) or next subnet
>
> If enough single addresses matched, output the subnet (and the missing
> hosts, without terminating the loop early)
>
> delete all subnet addresses read
>
> loop
>
> Total process time on a VM on old hardware: less than 2 minutes for a
> 5,500-line input. Now to verify results, positive and negative.
>
> Results are still raw, but anyone who wishes is welcome to it.
>
> Joe
>
> Joe Maimon wrote:
> > Does anyone have or seen any such tool? I have a script that seems to
> > work, but it's terribly slow.
> >
> > Currently I can produce aggregated subnets that can be missing up to a
> > specified number of individual addresses. Which can be fed back in for
> > multiple passes.
> >
> > Doing RTBH on individual /32 does not scale well, if you are eyeing
> > collaboration with external lists. I have found likely sources that could
> > produce another 100k prefixes easily.
> >
> > Joe
> >
> >
>
>


Re: fuzzy subnet aggregation

2019-10-28 Thread Joe Maimon
At this time I am just trying to get an idea of whether the whole exercise
is worth it: whether the processing time is feasible for 5k, 50k, 100k, or
200k entries, and whether the results reduce the count measurably at
acceptable collateral levels.


Because RTBH scaling to 100k is one thing, and from there it could go
higher. It works best if it can be spread to as many edge devices as
possible, including ones with limited TCAM, such as customer-edge L3
switches, and to customers/friends who have redundant connections with
other providers, including broadband, possibly with tiny routers, even
DD-WRT.


50k routes here and there and soon you are talking about real table bloat.

I try to start every project as a script if at all possible. If that
works even somewhat, real code is promising.


Joe

Mark Leonard wrote:
You could modify a radix tree to include a consolidation function and 
resulting confidence.  Then walk the nodes of the tree, check to see 
if the subtree meets the requirements for consolidation, if so, prune 
and record the confidence.  You would need to re-run the consolidation 
from the original data every time an individual IP was added/removed 
from the list as the consolidation function is lossy.


Alternatively, you could do consolidation on the fly losslessly if you 
had a custom tree walk algorithm.  That's probably the way I would do 
it.  I'm not a programmer so I assume there are better ways out there.


Your processing time for 5k IPs should be measured in seconds (i.e.
less than one) rather than minutes on any modern core.  Based on your 
pseudocode (sort -n | uniq) I get the impression that you're using 
BASH which isn't ideal for performing this sort of operation at high 
speed.


On the flip side, I think an extra 100k routes isn't that much unless 
you're suffering from hardware routing table limitations.  In my world 
the cost of a false positive match would far outweigh the cost of 
upgrading hardware.  YMMV.


Do you have a git repo?

On Sun, Oct 27, 2019 at 9:58 PM Joe Maimon wrote:


So I went back to the drawing board, and I think I have something
that seems to work much better.

- convert input prefixes to single IPs expressed as integers
- sort -n | uniq
- into a temporary list file

begin

read sequentially until maxhosts (or minhosts) or next subnet

If enough single addresses matched, output the subnet (and the missing
hosts, without terminating the loop early)

delete all subnet addresses read

loop

Total process time on a VM on old hardware: less than 2 minutes for a
5,500-line input. Now to verify results, positive and negative.

Results are still raw, but anyone who wishes is welcome to it.

Joe

Joe Maimon wrote:
> Does anyone have or seen any such tool? I have a script that
> seems to work, but it's terribly slow.
>
> Currently I can produce aggregated subnets that can be missing up
> to a specified number of individual addresses. Which can be fed back
> in for multiple passes.
>
> Doing RTBH on individual /32 does not scale well, if you are eyeing
> collaboration with external lists. I have found likely sources
> that could produce another 100k prefixes easily.
>
> Joe
>





RE: fuzzy subnet aggregation

2019-10-28 Thread Michel Py
> Mark Leonard wrote :
> Your processing time for 5k IPs should be measured in seconds (i.e. less than
> one) rather than minutes on any modern core.

I agree. I am surprised by the minutes thing as well.

>  Based on your pseudocode (sort -n | uniq) I get the impression that you're 
> using BASH which isn't ideal for performing this sort of operation at high 
> speed.

Even with bash it does not take minutes. I used a bash sort in the first
version of the CBBC and it was quite fast.
It was not very elegant, but it worked fine: fetch all sources into the same
file, then bash-sort removing duplicates.
My current limitation now is the injection into ExaBGP, partly because I
still display the prefixes in the console. Kinda; it's in debug mode.

> On the flip side, I think an extra 100k routes isn't that much unless you're 
> suffering from hardware routing table limitations.  In my world the cost of a 
> false positive match would far outweigh the cost of upgrading hardware.  YMMV.

The problem here is that there are still plenty of routers out there with a
1 million route limit, and the growth of the routing table is predictable
enough that the year to upgrade is 2023.
By adding 100K prefixes, you advance the upgrade year to 2021. It does not
look good to request the capex 2 years earlier than previously predicted.

I am curious to see how many prefixes aggregation saves.

Michel.



Re: fuzzy subnet aggregation

2019-10-28 Thread Joe Maimon

Good news, bad news.

With an inefficient bash script on an inefficient platform, a 120k-line
input processes in less than 15 minutes.


Thus far, the best I have is less than a 10% reduction with barely
acceptable aggressiveness.

The distribution is too varied, or the level of aggressiveness has to be
beyond any subtlety.


Putting this one back on the shelf.

Joe

Michel Py wrote:


I am curious to see how many prefixes aggregation saves.

Michel.





RE: fuzzy subnet aggregation

2019-10-30 Thread Jakob Heitz (jheitz) via NANOG
Another thing to consider is how long it takes to download into forwarding 
hardware.
Forwarding hardware is optimized for forwarding, not programming.
The programming has to wait for time slots when forwarding is not using the 
memory.

When you do smart aggregation, a single changed route could cause a massive
change in your aggregates. Then the resulting download could be both long
and cause glitches. Glitches, because you remove some aggregates while adding
others within a finite time. During this finite time, you may have incorrect
routing.
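
One common mitigation, sketched below (the general make-before-break idea,
not anything Jakob prescribed; announce/withdraw are hypothetical stand-ins
for the real route updates): diff the old and new aggregate sets and install
additions before withdrawals, so covering routes exist throughout the
transition.

def announce(p):  # hypothetical stand-in for the real route add
    print('add    ' + p)

def withdraw(p):  # hypothetical stand-in for the real route remove
    print('remove ' + p)

old = {'203.0.113.0/29', '203.0.113.8/29'}
new = {'203.0.113.0/28'}  # the new aggregate covers both old /29s

for p in sorted(new - old):   # add new aggregates first...
    announce(p)
for p in sorted(old - new):   # ...withdraw old ones only afterwards
    withdraw(p)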

Regards,
Jakob.