Hello Adolf,

This is a good find.

But if duckduckgo.com is blocked, there must be a source somewhere that 
lists that exact domain, not only a sub-domain of it. Otherwise we have a 
bug somewhere.

This is most likely because the domain is listed here, but with some extra 
data after it:

  https://raw.githubusercontent.com/mtxadmin/ublock/refs/heads/master/hosts/_malware_typo

We strip everything after a # away because we consider it a comment. 
However, that leaves a line with only the bare domain, which then gets 
listed.

The # sign is used as a special character in that file, but at the same 
time it is used for comments.

I will fix this and then refresh the list.

-Michael

> On 5 Jan 2026, at 11:31, Adolf Belka <[email protected]> wrote:
> 
> Hi Michael,
> 
> 
> On 05/01/2026 12:11, Adolf Belka wrote:
>> Hi Michael,
>> 
>> I have found that the malware list includes duckduckgo.com
>> 
> I have checked through the various sources used for the malware list.
> 
> The ShadowWhisperer (Tracking) list has improving.duckduckgo.com in its list. 
> I suspect that this one is the one causing the problem.
> 
> The mtxadmin (_malware_typo) list has duckduckgo.com mentioned 3 times but 
> not directly as a domain name - looks more like a reference.
> 
> Regards,
> 
> Adolf.
> 
> 
>> Regards,
>> Adolf.
>> 
>> 
>> On 02/01/2026 14:02, Adolf Belka wrote:
>>> Hi,
>>> 
>>> On 02/01/2026 12:09, Michael Tremer wrote:
>>>> Hello,
>>>> 
>>>>> On 30 Dec 2025, at 14:05, Adolf Belka <[email protected]> wrote:
>>>>> 
>>>>> Hi Michael,
>>>>> 
>>>>> On 29/12/2025 13:05, Michael Tremer wrote:
>>>>>> Hello everyone,
>>>>>> 
>>>>>> I hope everyone had a great Christmas and a couple of quiet days to 
>>>>>> relax from all the stress that was the year 2025.
>>>>> Still relaxing.
>>>> 
>>>> Very good, so let’s have a strong start into 2026 now!
>>> 
>>> Starting next week, yes.
>>> 
>>>> 
>>>>>> Having a couple of quieter days, I have been working on a new, little 
>>>>>> (hopefully) side project that has probably been high up on our radar 
>>>>>> since the Shalla list has shut down in 2020, or maybe even earlier. The 
>>>>>> goal of the project is to provide good lists with categories of domain 
>>>>>> names which are usually used to block access to these domains.
>>>>>> 
>>>>>> I simply call this IPFire DNSBL which is short for IPFire DNS Blocklists.
>>>>>> 
>>>>>> How did we get here?
>>>>>> 
>>>>>> As stated before, the URL filter feature in IPFire has the problem that 
>>>>>> there are not many good blocklists available any more. There used to be 
>>>>>> a couple more - most famously the Shalla list - but we are now down to a 
>>>>>> single list from the University of Toulouse. It is a great list, but it 
>>>>>> is not always the best fit for all users.
>>>>>> 
>>>>>> Then there has been talk about whether we could implement more blocking 
>>>>>> features into IPFire that don’t involve the proxy. Most famously 
>>>>>> blocking over DNS. The problem here remains that a blocking feature is 
>>>>>> only as good as the data that is fed into it. Some people have been 
>>>>>> putting forward a number of lists that were suitable for them, but they 
>>>>>> would not have replaced the blocking functionality as we know it. Their 
>>>>>> aim is to provide “one list for everything” but that is not what people 
>>>>>> usually want. It is targeted at a classic home user and the only 
>>>>>> separation that is being made is any adult/porn/NSFW content which 
>>>>>> usually is put into a separate list.
>>>>>> 
>>>>>> It would have been technically possible to include these lists and let 
>>>>>> the users decide, but that is not the aim of IPFire. We want to do the 
>>>>>> job for the user so that their job is getting easier. Including obscure 
>>>>>> lists that don’t have a clear outline of what they actually want to 
>>>>>> block (“bad content” is not a category) and passing the burden of 
>>>>>> figuring out whether they need the “Light”, “Normal”, “Pro”, “Pro++”, 
>>>>>> “Ultimate” or even a “Venti” list with cream on top is really not going 
>>>>>> to work. It is all confusing and will lead to a bad user experience.
>>>>>> 
>>>>>> An even bigger problem, which is however completely impossible to solve, is 
>>>>>> bad licensing of these lists. A user has asked the publisher of the 
>>>>>> HaGeZi list whether they could be included in IPFire and under what 
>>>>>> terms. The response was that the list is available under the terms of 
>>>>>> the GNU General Public License v3, but that does not seem to be true. 
>>>>>> The list contains data from various sources. Many of them are licensed 
>>>>>> under incompatible licenses (CC BY-SA 4.0, MPL, Apache2, …) and unless 
>>>>>> there is a non-public agreement that this data may be redistributed, 
>>>>>> there is a huge legal issue here. We would expose our users to potential 
>>>>>> copyright infringement which we cannot do under any circumstances. 
>>>>>> Furthermore many lists are available under a non-commercial license 
>>>>>> which excludes them from being used in any kind of business. Plenty of 
>>>>>> IPFire systems are running in businesses, if not even the vast majority.
>>>>>> 
>>>>>> In short, these lists are completely unusable for us. Apart from HaGeZi, 
>>>>>> I consider OISD to have the same problem.
>>>>>> 
>>>>>> Enough about all the things that are bad. Let’s talk about the new, good 
>>>>>> things:
>>>>>> 
>>>>>> Many blacklists on the internet are an amalgamation of other lists. 
>>>>>> These lists vary in quality, with some of them not very good and 
>>>>>> lacking a clear focus, and others being excellent data. Since we don’t 
>>>>>> have the manpower to start from scratch, I felt that we can copy the 
>>>>>> concept that HaGeZi and OISD have started and simply create a new list 
>>>>>> that is based on other lists at the beginning to have a good starting 
>>>>>> point. That way, we have much better control over what is going on these 
>>>>>> lists and we can shape and mould them as we need them. Most importantly, 
>>>>>> we don’t create a single list, but many lists that have a clear focus 
>>>>>> and allow users to choose what they want to block and what not.
>>>>>> 
>>>>>> So the current experimental stage that I am in has these lists:
>>>>>> 
>>>>>>    * Ads
>>>>>>    * Dating
>>>>>>    * DoH
>>>>>>    * Gambling
>>>>>>    * Malware
>>>>>>    * Porn
>>>>>>    * Social
>>>>>>    * Violence
>>>>>> 
>>>>>> The categories have been determined by what source lists we have 
>>>>>> available with good data and are compatible with our chosen license CC 
>>>>>> BY-SA 4.0. This is the same license that we are using for the IPFire 
>>>>>> Location database, too.
>>>>>> 
>>>>>> The main use-cases for any kind of blocking are to comply with legal 
>>>>>> requirements in networks with children (i.e. schools) to remove any kind 
>>>>>> of pornographic content, sometimes block social media as well. Gambling 
>>>>>> and violence are commonly blocked, too. Even more common would be 
>>>>>> filtering advertising and any malicious content.
>>>>>> 
>>>>>> The latter is especially difficult because so many source lists throw 
>>>>>> phishing, spyware, malvertising, tracking and other things into the same 
>>>>>> bucket. Here, this all currently sits in the malware list, which has 
>>>>>> therefore become quite large. I am not sure whether this will stay like 
>>>>>> this in the future or if we will have to make some adjustments, but that 
>>>>>> is exactly why this is now entering some larger testing.
>>>>>> 
>>>>>> What has been built so far? In order to put these lists together 
>>>>>> properly and track where the data is coming from, I have built a 
>>>>>> tool in Python available here:
>>>>>> 
>>>>>>    https://git.ipfire.org/?p=dnsbl.git;a=summary
>>>>>> 
>>>>>> This tool will automatically update all lists once an hour if there have 
>>>>>> been any changes and export them in various formats. The exported lists 
>>>>>> are available for download here:
>>>>>> 
>>>>>>    https://dnsbl.ipfire.org/lists/
>>>>> The download using dnsbl.ipfire.org/lists/squidguard.tar.gz as the custom 
>>>>> url works fine.
>>>>> 
>>>>> However you need to remember not to put the https:// at the front of the 
>>>>> url otherwise the WUI page completes without any error messages but 
>>>>> leaves an error message in the system logs saying
>>>>> 
>>>>> URL filter blacklist - ERROR: Not a valid URL filter blacklist
>>>>> 
>>>>> I found this out the hard way.
>>>> 
>>>> Oh yes, I forgot that there is a field on the web UI. If that does not 
>>>> accept https:// as a prefix, please file a bug and we will fix it.
>>> 
>>> I will confirm it and raise a bug.
>>> 
>>>> 
>>>>> The other thing I noticed is that if you already have the Toulouse 
>>>>> University list downloaded and then change to the IPFire custom URL, 
>>>>> all the existing Toulouse blocklists stay in the directory on IPFire. 
>>>>> You then end up with a huge number of category tick boxes, most of 
>>>>> which are the old Toulouse ones, still available to select, and it is 
>>>>> not clear which ones are from Toulouse and which ones from IPFire.
>>>> 
>>>> Yes, I got the same thing, too. I think this is a bug, too, because 
>>>> otherwise you would have a lot of unused categories lying around that will 
>>>> never be updated. You cannot even tell which ones are from the current 
>>>> list and which ones from the old list.
>>>> 
>>>> Long-term we could even consider removing the Univ. Toulouse list 
>>>> entirely and only have our own lists available which would make the 
>>>> problem go away.
>>>> 
>>>>> I think if the blocklist URL source is changed or a custom url is 
>>>>> provided, the first step should be to remove the existing old ones.
>>>>> That might be a problem because users can also create their own 
>>>>> blocklists and I believe those go into the same directory.
>>>> 
>>>> Good thought. We of course cannot delete the custom lists.
>>>> 
>>>>> Without clearing out the old blocklists you end up with a huge number of 
>>>>> checkboxes for lists but it is not clear what happens if there is a 
>>>>> category that has the same name for the Toulouse list and the IPFire list 
>>>>> such as gambling. I will have a look at that and see what happens.
>>>>> 
>>>>> Not sure what the best approach to this is.
>>>> 
>>>> I believe it is removing all old content.
>>>> 
>>>>> Manually deleting all contents of the urlfilter/blacklists/ directory and 
>>>>> then selecting the IPFire blocklist url for the custom url I end up with 
>>>>> only the 8 categories from the IPFire list.
>>>>> 
>>>>> I have tested some gambling sites from the IPFire list and the block 
>>>>> worked on some. On others the site no longer exists so there is nothing 
>>>>> to block or has been changed to an https site and in that case it went 
>>>>> straight through. Also if I chose the http version of the link, it was 
>>>>> automatically changed to https and went through without being blocked.
>>>> 
>>>> The entire IPFire infrastructure always requires HTTPS. If you start using 
>>>> HTTP, you will be automatically redirected. It is 2026 and we don’t need 
>>>> to talk HTTP any more :)
>>> 
>>> Some of the domains in the gambling list (maybe quite a lot) seem to only 
>>> have an http access. If I tried https it came back with the fact that it 
>>> couldn't find it.
>>> 
>>>> 
>>>> I am glad to hear that the list is actually blocking. It would have been 
>>>> bad if it didn’t. Now we have the big task to check out the “quality” - 
>>>> however that can be determined. I think this is what needs some time…
>>>> 
>>>> In the meantime I have set up a small page on our website:
>>>> 
>>>>    https://www.ipfire.org/dnsbl
>>>> 
>>>> I would like to run this as a first-class project inside IPFire like we 
>>>> are doing with IPFire Location. That means that we need to tell people 
>>>> about what we are doing. Hopefully this page is a little start.
>>>> 
>>>> Initially it has a couple of high-level bullet points about what we are 
>>>> trying to achieve. I don’t think the text is very good, yet, but it is the 
>>>> best I had in that moment. There is then also a list of the lists that we 
>>>> currently offer. For each list, a detailed page will tell you about the 
>>>> license, how many domains are listed, when the last update has been, the 
>>>> sources and even there is a history page that shows all the changes 
>>>> whenever they have happened.
>>>> 
>>>> Finally there is a section that explains “How To Use?” the list which I 
>>>> would love to extend to include AdGuard Plus and things like that as well 
>>>> as Pi-Hole and whatever else could use the list. In a later step we should 
>>>> go ahead and talk to any projects to include our list(s) into their 
>>>> dropdown so that people can enable them nice and easy.
>>>> 
>>>> Behind the web page there is an API service that is running on the host 
>>>> that is running the DNSBL. The frontend web app that is running 
>>>> www.ipfire.org is connecting to that API service 
>>>> to fetch the current lists, any details and so on. That way, we can split 
>>>> the logic and avoid creating a huge monolith of a web app. This also means 
>>>> that page could be down a little as I am still working on the entire thing 
>>>> and will frequently restart it.
>>>> 
>>>> The API documentation is available here and the API is publicly available: 
>>>> https://api.dnsbl.ipfire.org/docs
>>>> 
>>>> The website/API allows filing reports for anything that does not seem to 
>>>> be right on any of the lists. I would like to keep it as an open process, 
>>>> however, long-term, this cannot cost us any time. In the current stage, 
>>>> the reports are getting filed and that is about it. I still need to build 
>>>> out some way for admins or moderators (I am not sure what kind of roles I 
>>>> want to have here) to accept or reject those reports.
>>>> 
>>>> In case a reported domain came from a source list, I would rather 
>>>> submit a report upstream for them to de-list it. That way, we don’t 
>>>> have any admin work to do and we are contributing back to the other 
>>>> lists. That 
>>>> would be a very good thing to do. We cannot however throw tons of emails 
>>>> at some random upstream projects without co-ordinating this first. By not 
>>>> reporting upstream, we will probably over time create large whitelists and 
>>>> I am not sure if that is a good thing to do.
>>>> 
>>>> Finally, there is a search box that can be used to find out if a domain is 
>>>> listed on any of the lists.
>>>> 
>>>>>> If you download and open any of the files, you will see a large header 
>>>>>> that includes copyright information and lists all sources that have been 
>>>>>> used to create the individual lists. This way we ensure maximum 
>>>>>> transparency, comply with the terms of the individual licenses of the 
>>>>>> source lists and give credit to the people who help us to put together 
>>>>>> the most perfect list for our users.
>>>>>> 
>>>>>> I would like this to become a project that is not only being used in 
>>>>>> IPFire. We can and will be compatible with other solutions like AdGuard, 
>>>>>> PiHole so that people can use our lists if they would like to even 
>>>>>> though they are not using IPFire. Hopefully, these users will also feed 
>>>>>> back to us so that we can improve our lists over time and make them one 
>>>>>> of the best options out there.
>>>>>> 
>>>>>> All lists are available as a simple text file that lists the domains. 
>>>>>> Then there is a hosts file available as well as a DNS zone file and an 
>>>>>> RPZ file. Each list is individually available to be used in squidGuard 
>>>>>> and there is a larger tarball available with all lists that can be used 
>>>>>> in IPFire’s URL Filter. I am planning to add Suricata/Snort signatures 
>>>>>> whenever I have time to do so. Even though it is not a good idea to 
>>>>>> filter pornographic content this way, I suppose that catching malware 
>>>>>> and blocking DoH are good use-cases for an IPS. Time will tell…
>>>>>> 
>>>>>> As a start, we will make these lists available in IPFire’s URL Filter 
>>>>>> and collect some feedback about how we are doing. Afterwards, we can see 
>>>>>> where else we can take this project.
>>>>>> 
>>>>>> If you want to enable this on your system, simply add the URL to your 
>>>>>> autoupdate.urls file like here:
>>>>>> 
>>>>>> https://git.ipfire.org/?p=people/ms/ipfire-2.x.git;a=commitdiff;h=bf675bb937faa7617474b3cc84435af3b1f7f45f
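In shell terms, enabling this boils down to something like the sketch below.
The file location is an assumption based on IPFire's usual configuration
tree; the linked commit shows the real change:

```shell
# Sketch only: on IPFire the file lives under the URL Filter's
# configuration directory; here we use a local file for illustration.
autoupdate_urls="autoupdate.urls"

# Add the IPFire DNSBL squidGuard tarball as a blocklist source
echo "https://dnsbl.ipfire.org/lists/squidguard.tar.gz" >> "$autoupdate_urls"
```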
>>>>> I also tested out adding the IPFire url to autoupdate.urls and that also 
>>>>> worked fine for me.
>>>> 
>>>> Very good. Should we include this already with Core Update 200? I don’t 
>>>> think we would break anything, but we might already gain a couple more 
>>>> people who are helping us to test this all?
>>> 
>>> I think that would be a good idea.
>>> 
>>>> 
>>>> The next step would be to build and test our DNS infrastructure. In the 
>>>> “How To Use?” section on the pages of the individual lists, you can 
>>>> already see some instructions on how to use the lists as an RPZ. In 
>>>> comparison to other “providers”, I would prefer if people would be using 
>>>> DNS to fetch the lists. This is simply to push out updates in a cheap way 
>>>> for us and also do it very regularly.
>>>> 
>>>> Initially, clients will pull the entire list using AXFR. There is no way 
>>>> around this as they need to have the data in the first place. After that, 
>>>> clients will only need the changes. As you can see in the history, the 
>>>> lists don’t actually change that often. Sometimes only once a day and 
>>>> therefore downloading the entire list again would be a huge waste of data, 
>>>> both on the client side, but also for us hosting them.
>>>> 
>>>> Some other providers update their lists “every 10 minutes”, even when 
>>>> there won't be any changes whatsoever. We don’t do that. We will only export the 
>>>> lists again when they have actually changed. The timestamps on the files 
>>>> that we offer using HTTPS can be checked by clients so that they won’t 
>>>> re-download the list again if it has not been changed. But using HTTPS 
>>>> still means that we would have to re-download the entire list and not only 
>>>> the changes.
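The timestamp check described here could look roughly like this on the
client side (a sketch using only the Python standard library; the real
clients are not shown in this thread):

```python
from email.utils import parsedate_to_datetime


def needs_download(last_modified: str, cached_mtime: float) -> bool:
    """Re-download only when the server's Last-Modified header is newer
    than the local copy (given as a Unix timestamp). This avoids fetching
    an unchanged list, but a changed list is still fetched in full."""
    return parsedate_to_datetime(last_modified).timestamp() > cached_mtime
```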
>>>> 
>>>> Using DNS and IXFR will update the lists by only transferring a few 
>>>> kilobytes and therefore we can have clients check once an hour if a list 
>>>> has actually changed and only send out the raw changes. That way, we will 
>>>> be able to serve millions of clients at very cheap cost and they will 
>>>> always have a very up to date list.
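The idea behind the incremental update can be shown in miniature. Real IXFR
works on zone serials and DNS records; this sketch only illustrates the
data-volume argument, that shipping the delta is far smaller than shipping
the whole list:

```python
def list_delta(old: set[str], new: set[str]) -> tuple[set[str], set[str]]:
    """Return (additions, removals) between two versions of a list --
    conceptually what an incremental transfer ships instead of the
    full list."""
    return new - old, old - new
```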
>>>> 
>>>> As far as I can see any DNS software that supports RPZs supports AXFR/IXFR 
>>>> with exception of Knot Resolver which expects the zone to be downloaded 
>>>> externally. There is a ticket for AXFR/IXFR support 
>>>> (https://gitlab.nic.cz/knot/knot-resolver/-/issues/195).
>>>> 
>>>> Initially, some of the lists have been *huge* which is why a simple HTTP 
>>>> download is not feasible. The porn list was over 100 MiB. We could have 
>>>> spent thousands on just traffic alone which I don’t have for this kind of 
>>>> project. It would also be unnecessary money being spent. There are simply 
>>>> better solutions out there. But then I built something that basically 
>>>> tests the data that we are receiving from upstream by simply checking if 
>>>> a listed domain still exists. The result was very astonishing to me.
>>>> 
>>>> So whenever someone adds a domain to the list, we will (eventually, but 
>>>> not immediately) check if we can resolve the domain’s SOA record. If not, 
>>>> we mark the domain as non-active and will no longer include it in the 
>>>> exported data. This brought down the porn list from just under 5 million 
>>>> domains to just 421k. On the sources page 
>>>> (https://www.ipfire.org/dnsbl/lists/porn/sources) I am listing the 
>>>> percentage of dead domains from each of them and the UT1 list has 94% dead 
>>>> domains. Wow.
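The filtering step described here amounts to something like the sketch
below. The real tool checks the SOA record over DNS; the resolver is
injected here as a callback so the sketch works without the network, and
the names are illustrative rather than taken from the dnsbl code:

```python
from typing import Callable, Iterable


def active_domains(domains: Iterable[str],
                   has_soa: Callable[[str], bool]) -> list[str]:
    """Keep only domains whose SOA record still resolves; everything
    else is treated as dead and excluded from the exported data."""
    return [domain for domain in domains if has_soa(domain)]
```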
>>>> 
>>>> If we cannot resolve the domain, neither can our users. So we would 
>>>> otherwise fill the lists with tons of domains that simply could never be 
>>>> reached. And if they cannot be reached, why would we block them? We would 
>>>> waste bandwidth and a lot of memory on each single client.
>>>> 
>>>> The other sources have similarly high ratios of dead domains. Most of 
>>>> them are in the 50-80% range. Therefore I am happy that we are doing some 
>>>> extra work here to give our users much better data for their filtering.
>>> 
>>> Removing all dead entries sounds like an excellent step.
>>> 
>>> Regards,
>>> 
>>> Adolf.
>>> 
>>>> 
>>>> So, if you like, please go and check out the RPZ blocking with Unbound. 
>>>> Instructions are on the page. I would be happy to hear how this is turning 
>>>> out.
>>>> 
>>>> Please let me know if there are any more questions, and I would be glad to 
>>>> answer them.
>>>> 
>>>> Happy New Year,
>>>> -Michael
>>>> 
>>>>> 
>>>>> Regards,
>>>>> Adolf.
>>>>>> This email is just a brain dump from me to this list. I would be happy 
>>>>>> to answer any questions about implementation details, etc. if people are 
>>>>>> interested. Right now, this email is long enough already…
>>>>>> 
>>>>>> All the best,
>>>>>> -Michael
>>>>> 
>>>>> -- 
>>>>> Sent from my laptop
> 
> -- 
> Sent from my laptop