Re: mod_md 1.1.0 repeating on error

Stefan Eissing Tue, 12 Dec 2017 05:50:18 -0800


> On 12.12.2017 at 14:37, Steffen <i...@apachelounge.com> wrote:
> 
> The curl error was just to show you the debug log entries you asked for.
> 
> We already discussed this curl error by mail at the very beginning (mod_md
> does not work with curl/OpenSSL on Windows).
> 
> 1.1.0 is working fine so far.
> 
> I am only testing rare cases (as you asked me to).


I see. I did not understand you before. I suspected you had a real error
with v1.1.0.

As to your design proposals:
- No, I will not change the endless retry behaviour.
- Yes, I expect people to check their server logs somehow, either manually or
  with some tools. There is more than mod_md that can have trouble, and every
  module implementing its own solution is not a good service for users.
- Yes, I think some log levels need adjustment, e.g. when LE cannot be reached
  at all.
- Yes, I think there should be a high-level NOTICE log entry when a certificate
  could not be renewed and the existing one only lasts a couple more days.

All that said: I do not want to make more changes to mod_md before a release.
If we find a serious error, sure. But otherwise I'd rather enhance the
documentation for now.

If you want to add some Windows-specific advice to the mod_md XML, please do so.

Cheers,

Stefan

> Steffen
>  
> On Tuesday 12/12/2017 at 14:21, Stefan Eissing wrote:
>> *without* introducing new ones, I meant. Please provide a log.
>> 
>>> On 12.12.2017 at 14:21, Stefan Eissing <stefan.eiss...@greenbytes.de> wrote:
>>> 
>>> 
>>> 
>>>> On 12.12.2017 at 14:17, Steffen <i...@apachelounge.com> wrote:
>>>> 
>>>> To be clear: as I said, I introduced the curl error myself, so I know
>>>> exactly what is wrong.
>>> 
>>> Ah, that was not clear to me.
>>> 
>>> So, what is the error happening with you introducing new ones? Is there 
>>> nothing to see in the logs or did I miss it?
>>> 
>>>> Your reply shows me that you want to keep the endless retry loop. In the
>>>> worst case, a user can end up with non-working SSL because a certificate
>>>> is not renewed.
>>>> 
>>>> Why is it retried again and again? They all look like hard errors, except
>>>> when LE is temporarily down.
>>>> 
>>>> I think it should be fixed. Not everyone is constantly looking at the
>>>> error.log.
>>>> 
>>>> 
>>>> What I like:
>>>> 
>>>> Use MDNotifyCmd for the first AH10057 error.
>>>> Currently MDNotifyCmd is only triggered when everything is OK; it seems
>>>> logical to also notify when something is wrong.
>>>> 
>>>> 
>>>> On Tuesday 12/12/2017 at 13:58, Stefan Eissing wrote: 
>>>>> 
>>>>> 
>>>>>> On 12.12.2017 at 13:47, Steffen <i...@apachelounge.com> wrote:
>>>>>> 
>>>>>> It was happening before 1.1.0, but I did not pay attention to it. I have
>>>>>> seen it in several situations, which unfortunately I cannot all recall
>>>>>> (see the retries in https://github.com/icing/mod_md/issues/52 and
>>>>>> https://github.com/icing/mod_md/issues/62 as examples).
>>>>>> 
>>>>>> It is a more serious issue than I thought before.
>>>>>> 
>>>>>> I think we must fix this first, otherwise it is a bad introduction for
>>>>>> our users. First-time users in the Windows community ran into all kinds
>>>>>> of errors while trying it out, and most of them stopped using it. As I
>>>>>> said in another post, mod_md is not that easy to start with.
>>>>>> 
>>>>>> Also, when the log level is at the default (warn), users can hardly see
>>>>>> what is happening. I advise our users to use:
>>>>>> LogLevel info md:trace2 ssl:notice
>>>>>> 
>>>>>> I have now tested the endless retry loop in the following situations,
>>>>>> all during renewal: no new certificate is generated, and httpd keeps
>>>>>> running fine with the old certificate, which is still valid.
>>>>>> 
>>>>>> 1 - Misconfiguration, like below.
>>>>>> 2 - ACME CA service down (cause: Let's Encrypt down)
>>>>>> 3 - ACME CA service not reachable (cause: local network, or OS
>>>>>>     failure/misconfiguration)
>>>>>> 4 - Error response (GET/POST errors) when accessing Let's Encrypt,
>>>>>>     dependency issues like curl, mod_ssl.
>>>>>> 5 - mod_md/mod_ssl faults
>>>>>> 6 - Probably more
>>>>>> 
>>>>>> 
>>>>>> 2) and 3): both could mean Let's Encrypt is temporarily down, so a retry
>>>>>> may make sense there, but it is hard to tell whether the cause is LE
>>>>>> being temporarily down, a local issue, or an OS misconfiguration.
>>>>>> 
>>>>>> 4) is a good example: an error response from LE, which happens in quite
>>>>>> a few situations: curl issues, rate limits, mod_md faults, etc.
>>>>>> 
>>>>>> Below, I introduced a curl issue:
>>>>>> 
>>>>>> ...
>>>>>> [md:debug] [pid 7508:tid 1052] mod_md.c(762): AH10055: md watchdog run, auto drive 2 mds
>>>>>> [md:debug] [pid 7508:tid 1052] mod_md.c(691): AH10052: md(apachelounge.nl): state=2, driving
>>>>>> [md:debug] [pid 7508:tid 1052] md_reg.c(884): apachelounge.nl: run staging
>>>>>> [md:debug] [pid 7508:tid 1052] md_acme_drive.c(690): apachelounge.nl: staging started, state=2, can_http=0, can_https=1, challenges='tls-sni-01'
>>>>>> [md:debug] [pid 7508:tid 1052] md_store_fs.c(690): purge staging/apachelounge.nl (D:/servers/apacheS/md/staging/apachelounge.nl)
>>>>>> [md:debug] [pid 7508:tid 1052] md_acme.c(144): get directory from https://acme-v01.api.letsencrypt.org/directory
>>>>>> [md:debug] [pid 7508:tid 1052] md_acme.c(407): req: POST https://acme-v01.api.letsencrypt.org/directory
>>>>>> [md:debug] [pid 7508:tid 1052] md_curl.c(258): (20014)Internal error (specific information not available): request 10 failed(60): Peer certificate cannot be authenticated with given CA certificates
>>>>> 
>>>>> OK, this needs to be logged at ERROR level, so users do not have to mess
>>>>> with LogLevel to see what is going on.
>>>>> 
>>>>> As for the reason, this seems to indicate that the curl client finds no
>>>>> way to verify the Let's Encrypt server certificate. Can you verify that
>>>>> "curl.exe" can connect to "https://acme-v01.api.letsencrypt.org/directory"
>>>>> and retrieve the JSON there *without* you giving it the '-k' or
>>>>> '--insecure' option? And where does your curl.exe/libcurl come from? Did
>>>>> you build it yourself?
>>>>> 
>>>>>> [md:debug] [pid 7508:tid 1052] md_acme.c(425): (20014)Internal error (specific information not available): req sent
>>>>>> [md:error] [pid 7508:tid 1052] (20014)Internal error (specific information not available): apachelounge.nl: setup ACME(https://acme-v01.api.letsencrypt.org/directory)
>>>>>> [md:debug] [pid 7508:tid 1052] md_acme_drive.c(912): (20014)Internal error (specific information not available): apachelounge.nl: ACME, ACME staging
>>>>>> [md:debug] [pid 7508:tid 1052] md_reg.c(891): (20014)Internal error (specific information not available): apachelounge.nl: staging done
>>>>>> [md:error] [pid 7508:tid 1052] (20014)Internal error (specific information not available): AH10056: processing apachelounge.nl
>>>>>> [md:info] [pid 7508:tid 1052] AH10057: apachelounge.nl: encountered error for the 6. time, next run in 0:02:40 hours
>>>>>> ...
>>>>>> 
>>>>>> Maybe a small solution: when httpd starts, mod_md checks whether LE is
>>>>>> reachable without error.
>>>>> 
>>>>> No, I think checking external servers on every httpd restart is not a
>>>>> good idea.
>>>>> 
>>>>>> And a solution for the case below could be: check whether port 443
>>>>>> and/or 80 is used.
>>>>>> 
>>>>>> Still my questions:
>>>>>> 
>>>>>> Does the retry stop?
>>>>> 
>>>>> The retry does not stop, but it uses longer and longer retry intervals.
>>>>> This is exactly to recover from errors with the ACME server that are
>>>>> recoverable, e.g. server/internet down. A local certificate store that
>>>>> cannot verify the LE server will not fix itself, however.
>>>>> 
>>>>>> When does it happen, on what errors?
>>>>> 
>>>>> On any error where signup/renewal is necessary and could not be completed.
>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Steffen
>>>>>> 
>>>>>> 
>>>>>> On Tuesday 12/12/2017 at 10:18, Stefan Eissing wrote:
>>>>>>> Can you switch to "LogLevel md:debug" for a while and send me the
>>>>>>> details? Did this start with v1.1.0 or before that?
>>>>>>> 
>>>>>>>> On 11.12.2017 at 16:09, Steffen <i...@apachelounge.com> wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Running 1.1.0 with the new naming.
>>>>>>>> 
>>>>>>>> When mod_md encounters an error, it looks like it goes into an endless
>>>>>>>> loop:
>>>>>>>> 
>>>>>>>> 
>>>>>>>> [md:info] [pid 10372:tid 1964] AH10057: apachelounge.nl: encountered error for the 1. time, next run in 0:00:05 hours
>>>>>>>> [md:info] [pid 10372:tid 1964] AH10057: apachelounge.nl: encountered error for the 2. time, next run in 0:00:10 hours
>>>>>>>> [md:info] [pid 10372:tid 1964] AH10057: apachelounge.nl: encountered error for the 3. time, next run in 0:00:20 hours
>>>>>>>> [md:info] [pid 10372:tid 1964] AH10057: apachelounge.nl: encountered error for the 4. time, next run in 0:00:40 hours
>>>>>>>> [md:info] [pid 10372:tid 1964] AH10057: apachelounge.nl: encountered error for the 5. time, next run in 0:01:20 hours
>>>>>>>> [md:info] [pid 10372:tid 1964] AH10057: apachelounge.nl: encountered error for the 6. time, next run in 0:02:40 hours
>>>>>>>> [md:info] [pid 10372:tid 1964] AH10057: apachelounge.nl: encountered error for the 7. time, next run in 0:05:20 hours
>>>>>>>> [md:info] [pid 10372:tid 1964] AH10057: apachelounge.nl: encountered error for the 8. time, next run in 0:10:40 hours
>>>>>>>> ...
>>>>>>>> ...
>>>>>>>> ...
>>>>>>>> 
>>>>>>>> The above happened during renewal, using port 444.
>>>>>>>> 
>>>>>>>> Apache is running fine because the certificate is still valid.
>>>>>>>> 
>>>>>>>> Does it stop?
>>>>>>>> 
>>>>>>>> When does it happen, on what errors? The above happens on:
>>>>>>>> (20014)Internal error (specific information not available): AH10056:
>>>>>>>> processing apachelounge.nl.
>>>>>>>> 
>>>>>>>> What to do? Stopping these retries can be tricky, because when the
>>>>>>>> ACME CA service is temporarily down or unreachable, we probably do
>>>>>>>> want a retry. A reachability/down error is different from a
>>>>>>>> configuration error like the one in the case above.