On 2020-03-11 12:41, Anders Andersson wrote:
On Tue, Mar 10, 2020 at 10:53 PM Jordan Geoghegan <jor...@geoghegan.ca> wrote:
pf-badhost and unbound-adblock are both now at version 0.3, released
earlier today.

Links to the scripts can be found here:

www.geoghegan.ca/pfbadhost.html
www.geoghegan.ca/unbound-adblock.html
Thanks, this looks very interesting! But maybe you can help answer
a question that popped up when I read your page about pf-badhost.

You mention that "Subnet aggregation is used to take the address list
and "aggregate" the addresses into the smallest possible
representation using CIDR blocks.", but I was under the assumption
that pf already did this for its tables to speed up lookups.

Is there anything preventing the aggregation code from running on every pf
table modification? Assuming an already sorted list, it shouldn't take
long to merge a new entry. Perhaps I've missed some use of pf tables
that makes this impossible or not applicable in the general case.


Hi Anders,

I am by no means an expert on the nuts and bolts of pf, but I do know that pf stores table data in a radix tree / radix table. By their nature, radix trees ignore exact duplicates, but I'm not exactly sure how they handle partially overlapping ranges. This article gives an easy-to-follow, cursory overview of radix trees if you're interested:
https://blog.sqreen.com/demystifying-radix-trees/

As far as I understand, pf makes no modifications to the contents of your tables, all it does is parse the list to confirm the addresses and/or CIDR blocks are valid. When it's looking for matches within ranges, it will look for the most specific match available. For example, if you have a list containing an overlap:
...
192.168.0.0/16
192.168.0.0/22
...
When a packet from 192.168.1.5 arrives and is processed by a rule referencing this table, it will match 192.168.0.0/22. Even though both entries are valid and both contain the packet's address, the /22 is more specific, and thus the closest match.
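
To make the "most specific match" idea concrete, here is a minimal Python sketch. It is not how pf does it (pf walks a radix tree in the kernel); it just scans a small list with the standard `ipaddress` module and picks the entry with the longest prefix. The function name `most_specific_match` is made up for illustration.

```python
import ipaddress

def most_specific_match(table, address):
    """Return the entry in `table` with the longest prefix that contains
    `address`, or None if no entry matches. Illustrative linear scan only."""
    addr = ipaddress.ip_address(address)
    matches = [net for net in table if addr in net]
    return max(matches, key=lambda net: net.prefixlen, default=None)

# Two overlapping entries. ipaddress requires the host bits to be zero,
# so the /22 is written with its network address, 192.168.0.0.
table = [
    ipaddress.ip_network("192.168.0.0/16"),
    ipaddress.ip_network("192.168.0.0/22"),
]

print(most_specific_match(table, "192.168.1.5"))  # 192.168.0.0/22
print(most_specific_match(table, "192.168.9.1"))  # 192.168.0.0/16
print(most_specific_match(table, "10.0.0.1"))     # None
```

Both entries contain 192.168.1.5, but the /22 wins because its prefix is longer. An address outside the /22 but inside the /16 still matches the broader entry.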

pf may do some magic optimizations under the hood that I'm unaware of, but at the end of the day, it does not modify the actual contents of your table.

The use I've found for the subnet aggregation function has been mostly for the purpose of keeping the list clean and tidy. I have a few installations where I have all the lists enabled, including the GeoIP country blacklisting function. On these installations, subnet aggregation can reduce the /etc/pf-badhost.txt file from ~60,000 lines down to ~40,000 lines. For example, when blocking China's netblocks (which pulls an aggregated list of all addresses assigned to China by APNIC, and thus uses massive CIDR blocks such as /10s), if any addresses from any of the other blocklists come from China, they will be removed from the list as they are already covered by the CIDR block info from APNIC. I run pf-badhost on a bunch of Edgerouter Lites, and I've found them to run better when the lists are tidy.
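
As a rough illustration of what aggregation does to a list, here is a small Python sketch using `ipaddress.collapse_addresses` from the standard library. This is not the aggregator pf-badhost uses, and the addresses are made-up placeholders; it only shows the effect described above, where entries already covered by a larger block disappear and adjacent blocks merge. The helper name `aggregate` is hypothetical.

```python
import ipaddress

def aggregate(entries):
    """Collapse a list of IPv4 addresses/CIDR strings into the smallest
    equivalent set of CIDR blocks (illustrative helper, IPv4 only)."""
    nets = [ipaddress.ip_network(e) for e in entries]
    return [str(n) for n in ipaddress.collapse_addresses(nets)]

entries = [
    "10.0.0.0/10",        # a large block, standing in for a country netblock
    "10.1.2.3",           # single host already covered by the /10
    "10.1.2.3",           # exact duplicate
    "198.51.100.0/25",    # two adjacent /25s that merge into one /24
    "198.51.100.128/25",
    "203.0.113.7",        # unrelated host, kept as a /32
]

for block in aggregate(entries):
    print(block)
# 10.0.0.0/10
# 198.51.100.0/24
# 203.0.113.7/32
```

Six input lines become three, without changing which addresses the list covers.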

With regard to pf performing aggregation on all tables automatically, it wouldn't make sense to run the full subnet aggregation calculations for every table load or insertion/removal, as it can be quite CPU intensive. On a $5 Vultr VPS, it takes less than a second to load the table, but 20-70 seconds to run the subnet aggregation (depending on which lists are enabled). On my Edgerouter Pro with all the lists enabled, it takes ~6 minutes. On my Edgerouter Lite it takes ~15 minutes to run (over 2 hours when using the built-in Perl-based aggregator). I just run the aggregation function with nice and let it do its thing. It's called by cron in the wee hours, so I'm fine just letting it chug along.

Regards,

Jordan
