On 19/09/18 08:59, Kristian Evensen wrote:
> Hi Simon,
> 
> Thanks for a quick reply.
> 
> On Wed, Sep 19, 2018 at 12:23 AM Simon Kelley <si...@thekelleys.org.uk> wrote:
>> Thanks for the report. The obvious explanation is that whine_malloc() is
>> returning NULL, and the code should handle that. whine_malloc only
>> returns NULL if the system cannot allocate any more memory, which is
>> possible, but unlikely. Is your router very short on memory?
> 
> No, the router has plenty of memory (2GB) and I don't see the "failed
> to allocate"-message, so I guess whine_malloc() can't be the culprit.
> Since I am using OpenWRT, there could be some defines affecting the
> line numbers. I tried to read up on how ifdefs affects line numbers in
> gdb backtraces to see if the error could be somewhere else than the
> "default" line 1437, but I unfortunately couldn't find anything.
> Probably my google-foo is a bit rusty.
> 
> When looking over my notes, I see that I have made the following
> observations related to this bug:
> 
> * Crash happens quite rarely.
> * I have only seen the bug right after boot.
> * When the bug strikes, dnsmasq will enter a crash loop and never
> recover. I.e., I can restart dnsmasq as many times as I like, crash
> always happens.
> * If I start dnsmasq manually and run it in the foreground after a
> crash, I also see the error.
> 
> So there seems to be something in the system causing this error, but I
> can't figure out what.
> 
>> I think the best solution is to wrap all of
>>
>>       *crecp = *source;
>>       crecp->flags &= ~(F_IPV4 | F_IPV6 | F_CNAME | F_DNSKEY | F_DS |
>> F_REVERSE);
>>       crecp->flags |= F_NAMEP;
>>       crecp->name.namep = name;
>>
>>       cache_hash(crecp);
>>
>> with
>>
>> if (crecp)
>> {
>> }
> 
> Thanks, this is basically the same as my current fix, so I can already
> report that it is good :)
> 

This all makes me slightly uneasy. I think the "out of memory"
explanation for the crashes you are seeing is not a good one.

The patch is definitely needed. The philosophy for memory allocation
failures is that the code should be not terminate: it logs an error and
continues, but not necessarily behaving correctly. That's about the best
you can do.

Unfortunately, with the patch, if crecp is NULL at this point for some
other reason, it will now fail silently, and behave wrongly (the
non-terminal cache entry will not be created)

It would be really good to find out what actually causes this, and
preferably a way to reproduce it.

Cheers,

Simon.




_______________________________________________
Dnsmasq-discuss mailing list
Dnsmasq-discuss@lists.thekelleys.org.uk
http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss

Reply via email to