Re: Let's launch our own blocklists...

Michael Tremer Tue, 06 Jan 2026 02:20:47 -0800

Good Morning Adolf,

I had a look at this problem yesterday and it seems that parsing the format is 
becoming a little bit difficult this way. Since this is only affecting very few 
domains, I have simply whitelisted them all manually and duckduckgo.com 
and others should now be easily reachable again.
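
A minimal sketch of the parsing problem described in the quoted message below. The sample line is hypothetical (the real entries in the source list may look different), and `parse_strict` is only one possible way to be more careful; it is not the fix that was applied to the DNSBL tool.

```python
from typing import Optional


def strip_comment(line: str) -> str:
    """Naive approach: treat everything after the first '#' as a comment."""
    return line.split("#", 1)[0].strip()


def parse_strict(line: str) -> Optional[str]:
    """Stricter variant (an assumption, for illustration only).

    A '#' only starts a comment at the beginning of the line or after
    whitespace. If it is glued to the preceding text, the line is using
    '#' as a special character and is skipped entirely.
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    head, sep, _ = line.partition("#")
    if sep and not head[-1].isspace():
        return None
    return head.strip() or None


# Hypothetical line: a domain followed directly by '#' and more text
sample = "duckduckgo.com#hypothetical-suffix"
naive = strip_comment(sample)    # the bare domain is left over and gets listed
strict = parse_strict(sample)    # the ambiguous line is rejected
```

With the naive variant the bare domain survives and ends up on the list, which matches the behaviour described below.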


Please let me know if you have any more findings.

All the best,
-Michael

> On 5 Jan 2026, at 11:48, Michael Tremer <[email protected]> wrote:
> 
> Hello Adolf,
> 
> This is a good find.
> 
> But if duckduckgo.com is blocked, we will have to have a source somewhere 
> that blocks that domain, not only a sub-domain of it. Otherwise we have a 
> bug somewhere.
> 
> This is most likely as the domain is listed here, but with some stuff 
> afterwards:
> 
>  
> https://raw.githubusercontent.com/mtxadmin/ublock/refs/heads/master/hosts/_malware_typo
> 
> We strip everything after a # away because we consider it a comment. However, 
> that leaves a line with only the domain on it, which causes the domain to be 
> listed.
> 
> The # sign is used as some special character but at the same time it is being 
> used for comments.
> 
> I will fix this and then refresh the list.
> 
> -Michael
> 
>> On 5 Jan 2026, at 11:31, Adolf Belka <[email protected]> wrote:
>> 
>> Hi Michael,
>> 
>> 
>> On 05/01/2026 12:11, Adolf Belka wrote:
>>> Hi Michael,
>>> 
>>> I have found that the malware list includes duckduckgo.com
>>> 
>> I have checked through the various sources used for the malware list.
>> 
>> The ShadowWhisperer (Tracking) list has improving.duckduckgo.com in its 
>> list. I suspect that this one is the one causing the problem.
>> 
>> The mtxadmin (_malware_typo) list has duckduckgo.com mentioned 3 times but 
>> not directly as a domain name - looks more like a reference.
>> 
>> Regards,
>> 
>> Adolf.
>> 
>> 
>>> Regards,
>>> Adolf.
>>> 
>>> 
>>> On 02/01/2026 14:02, Adolf Belka wrote:
>>>> Hi,
>>>> 
>>>> On 02/01/2026 12:09, Michael Tremer wrote:
>>>>> Hello,
>>>>> 
>>>>>> On 30 Dec 2025, at 14:05, Adolf Belka <[email protected]> wrote:
>>>>>> 
>>>>>> Hi Michael,
>>>>>> 
>>>>>> On 29/12/2025 13:05, Michael Tremer wrote:
>>>>>>> Hello everyone,
>>>>>>> 
>>>>>>> I hope everyone had a great Christmas and a couple of quiet days to 
>>>>>>> relax from all the stress that was the year 2025.
>>>>>> Still relaxing.
>>>>> 
>>>>> Very good, so let’s have a strong start into 2026 now!
>>>> 
>>>> Starting next week, yes.
>>>> 
>>>>> 
>>>>>>> Having a couple of quieter days, I have been working on a new, little 
>>>>>>> (hopefully) side project that has probably been high up on our radar 
>>>>>>> since the Shalla list has shut down in 2020, or maybe even earlier. The 
>>>>>>> goal of the project is to provide good lists with categories of domain 
>>>>>>> names which are usually used to block access to these domains.
>>>>>>> 
>>>>>>> I simply call this IPFire DNSBL which is short for IPFire DNS 
>>>>>>> Blocklists.
>>>>>>> 
>>>>>>> How did we get here?
>>>>>>> 
>>>>>>> As stated before, the URL filter feature in IPFire has the problem that 
>>>>>>> there are not many good blocklists available any more. There used to be 
>>>>>>> a couple more - most famously the Shalla list - but we are now down to 
>>>>>>> a single list from the University of Toulouse. It is a great list, but 
>>>>>>> it is not always the best fit for all users.
>>>>>>> 
>>>>>>> Then there has been talk about whether we could implement more blocking 
>>>>>>> features into IPFire that don’t involve the proxy. Most famously 
>>>>>>> blocking over DNS. The problem here remains that the blocking feature is 
>>>>>>> only as good as the data that is fed into it. Some people have been 
>>>>>>> putting forward a number of lists that were suitable for them, but they 
>>>>>>> would not have replaced the blocking functionality as we know it. Their 
>>>>>>> aim is to provide “one list for everything” but that is not what people 
>>>>>>> usually want. It is targeted at a classic home user and the only 
>>>>>>> separation that is being made is any adult/porn/NSFW content which 
>>>>>>> usually is put into a separate list.
>>>>>>> 
>>>>>>> It would have been technically possible to include these lists and let 
>>>>>>> the users decide, but that is not the aim of IPFire. We want to do the 
>>>>>>> job for the user so that their job is getting easier. Including obscure 
>>>>>>> lists that don’t have a clear outline of what they actually want to 
>>>>>>> block (“bad content” is not a category) and passing the burden of 
>>>>>>> figuring out whether they need the “Light”, “Normal”, “Pro”, “Pro++”, 
>>>>>>> “Ultimate” or even a “Venti” list with cream on top is really not going 
>>>>>>> to work. It is all confusing and will lead to a bad user experience.
>>>>>>> 
>>>>>>> An even bigger problem that is however completely impossible to solve 
>>>>>>> is bad licensing of these lists. A user has asked the publisher of the 
>>>>>>> HaGeZi list whether they could be included in IPFire and under what 
>>>>>>> terms. The response was that the list is available under the terms of 
>>>>>>> the GNU General Public License v3, but that does not seem to be true. 
>>>>>>> The list contains data from various sources. Many of them are licensed 
>>>>>>> under incompatible licenses (CC BY-SA 4.0, MPL, Apache2, …) and unless 
>>>>>>> there is a non-public agreement that this data may be redistributed, 
>>>>>>> there is a huge legal issue here. We would expose our users to 
>>>>>>> potential copyright infringement which we cannot do under any 
>>>>>>> circumstances. Furthermore many lists are available under a 
>>>>>>> non-commercial license which excludes them from being used in any kind 
>>>>>>> of business. Plenty of IPFire systems are running in businesses, if not 
>>>>>>> even the vast majority.
>>>>>>> 
>>>>>>> In short, these lists are completely unusable for us. Apart from 
>>>>>>> HaGeZi, I consider OISD to have the same problem.
>>>>>>> 
>>>>>>> Enough about all the things that are bad. Let’s talk about the new, 
>>>>>>> good things:
>>>>>>> 
>>>>>>> Many blacklists on the internet are an amalgamation of other lists. 
>>>>>>> These lists vary in quality with some of them being not that good and 
>>>>>>> without a clear focus and others being excellent data. Since we don’t 
>>>>>>> have the manpower to start from scratch, I felt that we can copy the 
>>>>>>> concept that HaGeZi and OISD have started and simply create a new list 
>>>>>>> that is based on other lists at the beginning to have a good starting 
>>>>>>> point. That way, we have much better control over what is going onto 
>>>>>>> these lists and we can shape and mould them as we need them. Most 
>>>>>>> importantly, we don’t create a single list, but many lists that have a 
>>>>>>> clear focus and allow users to choose what they want to block and what 
>>>>>>> not.
>>>>>>> 
>>>>>>> So the current experimental stage that I am in has these lists:
>>>>>>> 
>>>>>>>   * Ads
>>>>>>>   * Dating
>>>>>>>   * DoH
>>>>>>>   * Gambling
>>>>>>>   * Malware
>>>>>>>   * Porn
>>>>>>>   * Social
>>>>>>>   * Violence
>>>>>>> 
>>>>>>> The categories have been determined by what source lists we have 
>>>>>>> available with good data and are compatible with our chosen license CC 
>>>>>>> BY-SA 4.0. This is the same license that we are using for the IPFire 
>>>>>>> Location database, too.
>>>>>>> 
>>>>>>> The main use-cases for any kind of blocking are to comply with legal 
>>>>>>> requirements in networks with children (i.e. schools) to remove any 
>>>>>>> kind of pornographic content, sometimes block social media as well. 
>>>>>>> Gambling and violence are commonly blocked, too. Even more common would 
>>>>>>> be filtering advertising and any malicious content.
>>>>>>> 
>>>>>>> The latter is especially difficult because so many source lists throw 
>>>>>>> phishing, spyware, malvertising, tracking and other things into the 
>>>>>>> same bucket. Here this is currently all in the malware list which has 
>>>>>>> therefore become quite large. I am not sure whether this will stay like 
>>>>>>> this in the future or if we will have to make some adjustments, but 
>>>>>>> that is exactly why this is now entering some larger testing.
>>>>>>> 
>>>>>>> What has been built so far? In order to put these lists together 
>>>>>>> properly, track any data about where it is coming from, I have built a 
>>>>>>> tool in Python available here:
>>>>>>> 
>>>>>>>   https://git.ipfire.org/?p=dnsbl.git;a=summary
>>>>>>> 
>>>>>>> This tool will automatically update all lists once an hour if there 
>>>>>>> have been any changes and export them in various formats. The exported 
>>>>>>> lists are available for download here:
>>>>>>> 
>>>>>>>   https://dnsbl.ipfire.org/lists/
>>>>>> The download using dnsbl.ipfire.org/lists/squidguard.tar.gz as the 
>>>>>> custom url works fine.
>>>>>> 
>>>>>> However you need to remember not to put the https:// at the front of the 
>>>>>> url otherwise the WUI page completes without any error messages but 
>>>>>> leaves an error message in the system logs saying
>>>>>> 
>>>>>> URL filter blacklist - ERROR: Not a valid URL filter blacklist
>>>>>> 
>>>>>> I found this out the hard way.
>>>>> 
>>>>> Oh yes, I forgot that there is a field on the web UI. If that does not 
>>>>> accept https:// as a prefix, please file a bug and we will fix it.
>>>> 
>>>> I will confirm it and raise a bug.
>>>> 
>>>>> 
>>>>>> The other thing I noticed is that if you already have the Toulouse 
>>>>>> University list downloaded and you then change to the ipfire custom url 
>>>>>> then all the existing Toulouse blocklists stay in the directory on 
>>>>>> IPFire and so you end up with a huge number of category tick boxes, most 
>>>>>> of which are the old Toulouse ones, which are still available to select 
>>>>>> and it is not clear which ones are from Toulouse and which ones from 
>>>>>> IPFire.
>>>>> 
>>>>> Yes, I got the same thing, too. I think this is a bug, too, because 
>>>>> otherwise you would have a lot of unused categories lying around that 
>>>>> will never be updated. You cannot even tell which ones are from the 
>>>>> current list and which ones from the old list.
>>>>> 
>>>>> Long-term we could even consider removing the Univ. Toulouse list 
>>>>> entirely and only have our own lists available which would make the 
>>>>> problem go away.
>>>>> 
>>>>>> I think if the blocklist URL source is changed or a custom url is 
>>>>>> provided the first step should be to remove the old ones already 
>>>>>> existing.
>>>>>> That might be a problem because users can also create their own 
>>>>>> blocklists and I believe those go into the same directory.
>>>>> 
>>>>> Good thought. We of course cannot delete the custom lists.
>>>>> 
>>>>>> Without clearing out the old blocklists you end up with a huge number of 
>>>>>> checkboxes for lists but it is not clear what happens if there is a 
>>>>>> category that has the same name for the Toulouse list and the IPFire 
>>>>>> list such as gambling. I will have a look at that and see what happens.
>>>>>> 
>>>>>> Not sure what the best approach to this is.
>>>>> 
>>>>> I believe it is removing all old content.
>>>>> 
>>>>>> Manually deleting all contents of the urlfilter/blacklists/ directory 
>>>>>> and then selecting the IPFire blocklist url for the custom url I end up 
>>>>>> with only the 8 categories from the IPFire list.
>>>>>> 
>>>>>> I have tested some gambling sites from the IPFire list and the block 
>>>>>> worked on some. On others the site no longer exists so there is nothing 
>>>>>> to block or has been changed to an https site and in that case it went 
>>>>>> straight through. Also if I chose the http version of the link, it was 
>>>>>> automatically changed to https and went through without being blocked.
>>>>> 
>>>>> The entire IPFire infrastructure always requires HTTPS. If you start 
>>>>> using HTTP, you will be automatically redirected. It is 2026 and we don’t 
>>>>> need to talk HTTP any more :)
>>>> 
>>>> Some of the domains in the gambling list (maybe quite a lot) seem to only 
>>>> have an http access. If I tried https it came back with the fact that it 
>>>> couldn't find it.
>>>> 
>>>>> 
>>>>> I am glad to hear that the list is actually blocking. It would have been 
>>>>> bad if it didn’t. Now we have the big task to check out the “quality” - 
>>>>> however that can be determined. I think this is what needs some time…
>>>>> 
>>>>> In the meantime I have set up a small page on our website:
>>>>> 
>>>>>   https://www.ipfire.org/dnsbl
>>>>> 
>>>>> I would like to run this as a first-class project inside IPFire like we 
>>>>> are doing with IPFire Location. That means that we need to tell people 
>>>>> about what we are doing. Hopefully this page is a little start.
>>>>> 
>>>>> Initially it has a couple of high-level bullet points about what we are 
>>>>> trying to achieve. I don’t think the text is very good, yet, but it is 
>>>>> the best I had in that moment. There is then also a list of the lists 
>>>>> that we currently offer. For each list, a detailed page will tell you 
>>>>> about the license, how many domains are listed, when the last update has 
>>>>> been, the sources, and there is even a history page that shows all the 
>>>>> changes whenever they have happened.
>>>>> 
>>>>> Finally there is a section that explains “How To Use?” the list which I 
>>>>> would love to extend to include AdGuard Plus and things like that as well 
>>>>> as Pi-Hole and whatever else could use the list. In a later step we 
>>>>> should go ahead and talk to any projects to include our list(s) into 
>>>>> their dropdown so that people can enable them nice and easy.
>>>>> 
>>>>> Behind the web page there is an API service that is running on the host 
>>>>> that is running the DNSBL. The frontend web app that is running 
>>>>> www.ipfire.org is connecting to that API service 
>>>>> to fetch the current lists, any details and so on. That way, we can split 
>>>>> the logic and avoid creating a huge monolith of a web app. This also 
>>>>> means that the page could be down a little as I am still working on the 
>>>>> entire thing and will frequently restart it.
>>>>> 
>>>>> The API documentation is available here and the API is publicly 
>>>>> available: https://api.dnsbl.ipfire.org/docs
>>>>> 
>>>>> The website/API allows filing reports for anything that does not seem to 
>>>>> be right on any of the lists. I would like to keep it as an open process, 
>>>>> however, long-term, this cannot cost us any time. In the current stage, 
>>>>> the reports are getting filed and that is about it. I still need to build 
>>>>> out some way for admins or moderators (I am not sure what kind of roles I 
>>>>> want to have here) to accept or reject those reports.
>>>>> 
>>>>> In case of us receiving a domain from a source list, I would rather like 
>>>>> to submit a report to upstream for them to de-list. That way, we don’t 
>>>>> have any admin to do and we are contributing back to other lists. That 
>>>>> would be a very good thing to do. We cannot however throw tons of emails 
>>>>> at some random upstream projects without co-ordinating this first. By not 
>>>>> reporting upstream, we will probably over time create large whitelists 
>>>>> and I am not sure if that is a good thing to do.
>>>>> 
>>>>> Finally, there is a search box that can be used to find out if a domain 
>>>>> is listed on any of the lists.
>>>>> 
>>>>>>> If you download and open any of the files, you will see a large header 
>>>>>>> that includes copyright information and lists all sources that have 
>>>>>>> been used to create the individual lists. This way we ensure maximum 
>>>>>>> transparency, comply with the terms of the individual licenses of the 
>>>>>>> source lists and give credit to the people who help us to put together 
>>>>>>> the most perfect list for our users.
>>>>>>> 
>>>>>>> I would like this to become a project that is not only being used in 
>>>>>>> IPFire. We can and will be compatible with other solutions like 
>>>>>>> AdGuard, PiHole so that people can use our lists if they would like to 
>>>>>>> even though they are not using IPFire. Hopefully, these users will also 
>>>>>>> feed back to us so that we can improve our lists over time and make 
>>>>>>> them one of the best options out there.
>>>>>>> 
>>>>>>> All lists are available as a simple text file that lists the domains. 
>>>>>>> Then there is a hosts file available as well as a DNS zone file and an 
>>>>>>> RPZ file. Each list is individually available to be used in squidGuard 
>>>>>>> and there is a larger tarball available with all lists that can be used 
>>>>>>> in IPFire’s URL Filter. I am planning to add Suricata/Snort signatures 
>>>>>>> whenever I have time to do so. Even though it is not a good idea to 
>>>>>>> filter pornographic content this way, I suppose that catching malware 
>>>>>>> and blocking DoH are good use-cases for an IPS. Time will tell…
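
As an illustration of the export formats mentioned above, here is a minimal sketch that turns a plain list of domains into hosts-file lines and RPZ records. The exact layout of the files published by the project is not shown in this thread, so the function names and the output details (for example blocking via `0.0.0.0` and the wildcard record) are assumptions based on common practice.

```python
from typing import Iterable, List


def to_hosts(domains: Iterable[str]) -> List[str]:
    """Render domains as hosts-file lines pointing at 0.0.0.0."""
    return [f"0.0.0.0 {domain}" for domain in sorted(set(domains))]


def to_rpz(domains: Iterable[str]) -> List[str]:
    """Render domains as RPZ records.

    'CNAME .' is the RPZ convention for answering NXDOMAIN. The wildcard
    record also covers all sub-domains of the listed domain.
    """
    lines = []
    for domain in sorted(set(domains)):
        lines.append(f"{domain} CNAME .")
        lines.append(f"*.{domain} CNAME .")
    return lines


# Hypothetical input; real lists contain many thousands of entries
example = ["bad.example", "ads.example"]
hosts_lines = to_hosts(example)
rpz_lines = to_rpz(example)
```

A real RPZ file also needs a zone header with SOA and NS records, which is left out here.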
>>>>>>> 
>>>>>>> As a start, we will make these lists available in IPFire’s URL Filter 
>>>>>>> and collect some feedback about how we are doing. Afterwards, we can 
>>>>>>> see where else we can take this project.
>>>>>>> 
>>>>>>> If you want to enable this on your system, simply add the URL to your 
>>>>>>> autoupdate.urls file like here:
>>>>>>> 
>>>>>>> https://git.ipfire.org/?p=people/ms/ipfire-2.x.git;a=commitdiff;h=bf675bb937faa7617474b3cc84435af3b1f7f45f
>>>>>> I also tested out adding the IPFire url to autoupdate.urls and that also 
>>>>>> worked fine for me.
>>>>> 
>>>>> Very good. Should we include this already with Core Update 200? I don’t 
>>>>> think we would break anything, but we might already gain a couple more 
>>>>> people who are helping us to test this all?
>>>> 
>>>> I think that would be a good idea.
>>>> 
>>>>> 
>>>>> The next step would be to build and test our DNS infrastructure. In the 
>>>>> “How To Use?” Section on the pages of the individual lists, you can 
>>>>> already see some instructions on how to use the lists as an RPZ. In 
>>>>> comparison to other “providers”, I would prefer if people would be using 
>>>>> DNS to fetch the lists. This is simply to push out updates in a cheap way 
>>>>> for us and also do it very regularly.
>>>>> 
>>>>> Initially, clients will pull the entire list using AXFR. There is no way 
>>>>> around this as they need to have the data in the first place. After that, 
>>>>> clients will only need the changes. As you can see in the history, the 
>>>>> lists don’t actually change that often. Sometimes only once a day and 
>>>>> therefore downloading the entire list again would be a huge waste of 
>>>>> data, both on the client side, but also for us hosting them.
>>>>> 
>>>>> Some other providers update their lists “every 10 minutes”, and there 
>>>>> won't be any changes whatsoever. We don’t do that. We will only export 
>>>>> the lists again when they have actually changed. The timestamps on the 
>>>>> files that we offer using HTTPS can be checked by clients so that they 
>>>>> won’t re-download the list again if it has not been changed. But using 
>>>>> HTTPS still means that we would have to re-download the entire list and 
>>>>> not only the changes.
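
The timestamp check described above can be sketched as a conditional HTTP request using only the Python standard library. The function names are hypothetical, and whether the server answers `If-Modified-Since` with a 304 is an assumption; the thread only says that clients can check the timestamps.

```python
import urllib.error
import urllib.request
from email.utils import formatdate
from typing import Optional


def build_request(url: str, last_mtime: float) -> urllib.request.Request:
    """Build a GET request that asks only for content newer than last_mtime."""
    stamp = formatdate(last_mtime, usegmt=True)  # RFC 7231 HTTP date
    return urllib.request.Request(url, headers={"If-Modified-Since": stamp})


def fetch_if_changed(url: str, last_mtime: float) -> Optional[bytes]:
    """Return the new file content, or None if the server reports 304."""
    request = build_request(url, last_mtime)
    try:
        with urllib.request.urlopen(request) as response:
            return response.read()
    except urllib.error.HTTPError as error:
        if error.code == 304:  # Not Modified: keep the local copy
            return None
        raise
```

Even when this works, a change on the server means the complete file is downloaded again, which is the limitation noted above.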
>>>>> 
>>>>> Using DNS and IXFR will update the lists by only transferring a few 
>>>>> kilobytes and therefore we can have clients check once an hour if a list 
>>>>> has actually changed and only send out the raw changes. That way, we will 
>>>>> be able to serve millions of clients at very cheap cost and they will 
>>>>> always have a very up to date list.
>>>>> 
>>>>> As far as I can see any DNS software that supports RPZs supports 
>>>>> AXFR/IXFR with exception of Knot Resolver which expects the zone to be 
>>>>> downloaded externally. There is a ticket for AXFR/IXFR support 
>>>>> (https://gitlab.nic.cz/knot/knot-resolver/-/issues/195).
>>>>> 
>>>>> Initially, some of the lists have been *huge* which is why a simple HTTP 
>>>>> download is not feasible. The porn list was over 100 MiB. We could have 
>>>>> spent thousands on just traffic alone which I don’t have for this kind of 
>>>>> project. It would also be unnecessary money being spent. There are simply 
>>>>> better solutions out there. But then I built something that basically 
>>>>> tests the data that we are receiving from upstream by simply checking if 
>>>>> a listed domain still exists. The result was very astonishing to me.
>>>>> 
>>>>> So whenever someone adds a domain to the list, we will (eventually, but 
>>>>> not immediately) check if we can resolve the domain’s SOA record. If not, 
>>>>> we mark the domain as non-active and will no longer include them in the 
>>>>> exported data. This brought down the porn list from just under 5 million 
>>>>> domains to just 421k. On the sources page 
>>>>> (https://www.ipfire.org/dnsbl/lists/porn/sources) I am listing the 
>>>>> percentage of dead domains from each of them and the UT1 list has 94% 
>>>>> dead domains. Wow.
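
A rough sketch of the dead-domain check described above. It is not taken from the DNSBL tool: `resolve_soa` assumes the third-party dnspython package, and all function names are hypothetical. The lookup is passed in as a parameter so the filtering logic can be tried without any network access.

```python
from typing import Callable, Iterable, List


def resolve_soa(domain: str) -> bool:
    """Return True if an SOA lookup for the domain succeeds.

    Assumption: uses dnspython, imported lazily so the rest of this
    sketch works without it. Any DNS error counts as "not active".
    """
    import dns.exception
    import dns.resolver

    try:
        dns.resolver.resolve(domain, "SOA")
        return True
    except dns.exception.DNSException:
        return False


def filter_active(
    domains: Iterable[str],
    is_alive: Callable[[str], bool] = resolve_soa,
) -> List[str]:
    """Keep only the domains for which the lookup succeeds."""
    return [domain for domain in domains if is_alive(domain)]


def dead_ratio(total: int, active: int) -> float:
    """Fraction of listed domains that did not resolve."""
    return 1.0 - active / total
```

Note that a plain SOA query for a sub-domain often returns no direct answer, so this simple version would wrongly count such names as dead. With the figures quoted above, and taking 5 million as an upper bound for the original size, `dead_ratio(5_000_000, 421_000)` comes out at roughly 0.92.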
>>>>> 
>>>>> If we cannot resolve the domain, neither can our users. So we would 
>>>>> otherwise fill the lists with tons of domains that simply could never be 
>>>>> reached. And if they cannot be reached, why would we block them? We would 
>>>>> waste bandwidth and a lot of memory on each single client.
>>>>> 
>>>>> The other sources have similarly high ratios of dead domains. Most of 
>>>>> them are in the 50-80% range. Therefore I am happy that we are doing some 
>>>>> extra work here to give our users much better data for their filtering.
>>>> 
>>>> Removing all dead entries sounds like an excellent step.
>>>> 
>>>> Regards,
>>>> 
>>>> Adolf.
>>>> 
>>>>> 
>>>>> So, if you like, please go and check out the RPZ blocking with Unbound. 
>>>>> Instructions are on the page. I would be happy to hear how this is 
>>>>> turning out.
>>>>> 
>>>>> Please let me know if there are any more questions, and I would be glad 
>>>>> to answer them.
>>>>> 
>>>>> Happy New Year,
>>>>> -Michael
>>>>> 
>>>>>> 
>>>>>> Regards,
>>>>>> Adolf.
>>>>>>> This email is just a brain dump from me to this list. I would be happy 
>>>>>>> to answer any questions about implementation details, etc. if people 
>>>>>>> are interested. Right now, this email is long enough already…
>>>>>>> 
>>>>>>> All the best,
>>>>>>> -Michael
>>>>>> 
>>>>>> -- 
>>>>>> Sent from my laptop
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>> 
>> 
>> -- 
>> Sent from my laptop
>> 
>> 
>
