> On 12.12.2017 at 14:37, Steffen <i...@apachelounge.com> wrote:
>
> The curl error was just to show you the debug log entries which you asked for.
>
> We already discussed this curl error by mail in the very beginning (mod_md
> does not work with curl openssl on Windows).
>
> 1.1.0 is working fine so far.
>
> I am only testing rare cases (you asked to test).

I see. I did not understand you before. I suspected you had a real error with v1.1.0.

As to your design proposals:
- No, I will not change the endless retry behaviour.
- Yes, I expect people to check their server logs, somehow, either manually or with some tools. More than mod_md can have trouble, and every module implementing its own solution is not a good service for users.
- Yes, I think some log levels need adjustment, e.g. when LE cannot be reached at all.
- Yes, I think there should be a high-level NOTICE log entry when a certificate could not be renewed and the existing one only lasts a couple more days.

All that said: I do not want to make more changes to mod_md before a release. If we find a serious error, sure. But otherwise I'd rather enhance the documentation for now. If you want to add some Windows-specific advice to the mod_md XML, please do so.

Cheers,

Stefan

> Steffen
>
> On Tuesday 12/12/2017 at 14:21, Stefan Eissing wrote:
>> *without* introducing new ones, I meant. Please provide a log.
>>
>>> On 12.12.2017 at 14:21, Stefan Eissing
>>> <stefan.eiss...@greenbytes.de> wrote:
>>>
>>>
>>>
>>>> On 12.12.2017 at 14:17, Steffen <i...@apachelounge.com> wrote:
>>>>
>>>> To be clear: as I said, I introduced the curl error myself, so
>>>> I know exactly what is wrong.
>>>
>>> Ah, that was not clear to me.
>>>
>>> So, what is the error happening with you introducing new ones? Is there
>>> nothing to see in the logs or did I miss it?
>>>
>>>> Your reply shows me that you want to keep the endless retry loop. In the
>>>> worst case a user can end up with non-working SSL because a certificate is
>>>> not renewed.
>>>>
>>>> Why is it retried again and again? These all look like hard errors, except
>>>> when LE is temporarily down.
>>>>
>>>> I think it should be fixed. Not everyone is constantly looking at the
>>>> error.log.
>>>>
>>>>
>>>> What I would like:
>>>>
>>>> Use MDNotifyCmd for the first error AH10057.
>>>> Now the MDNotifyCmd is only triggered when it is ok; it seems logical to also
>>>> notify when there is something wrong.
>>>>
>>>>
>>>> On Tuesday 12/12/2017 at 13:58, Stefan Eissing wrote:
>>>>>
>>>>>
>>>>>> On 12.12.2017 at 13:47, Steffen <i...@apachelounge.com> wrote:
>>>>>>
>>>>>> It was happening before 1.1.0, but I did not give it attention. I have seen it
>>>>>> in several situations, which unfortunately I cannot all recall (see the
>>>>>> retries as example in https://github.com/icing/mod_md/issues/52 and
>>>>>> https://github.com/icing/mod_md/issues/62 ).
>>>>>>
>>>>>> It is a more serious issue than I thought before.
>>>>>>
>>>>>> I think we must first fix this, otherwise it is a bad introduction to
>>>>>> our users. This is because first-time users in the Windows community
>>>>>> are dealing with it and with all kinds of (try) errors;
>>>>>> most users stopped using it. As said in another post, mod_md is not that
>>>>>> easy to start with.
>>>>>>
>>>>>> Also, when the loglevel is on the default Warn, users hardly see what is
>>>>>> happening. I advise our users to use LogLevel info md:trace2 ssl:notice
>>>>>>
>>>>>> The endless retry loop: tested now in the following situations, tested
>>>>>> during renew when no new certificate is generated, with httpd running fine
>>>>>> with the old certificate, which was still valid.
>>>>>>
>>>>>> 1 - Mis-configuration like below.
>>>>>> 2 - ACME CA service down (cause: Letsencrypt down)
>>>>>> 3 - ACME CA service not reachable (cause: local network, or OS
>>>>>> failure/misconfig)
>>>>>> 4 - Error response (Get/Post errors) when accessing Letsencrypt,
>>>>>> dependency issue like curl, mod_ssl.
>>>>>> 5 - mod_md/mod_ssl faults
>>>>>> 6 - Should be more
>>>>>>
>>>>>>
>>>>>> 2) 3) Both can be that Letsencrypt is temporarily down, so maybe retry there, but
>>>>>> it is hard to tell if the cause is temporary LE downtime, a local issue or OS misconfig.
>>>>>>
>>>>>> 4) Is a good example: error response from LE, which happens in quite some
>>>>>> situations: Curl issues, Rate-Limits, mod_md faults etc.
>>>>>>
>>>>>> Below I introduced a Curl issue:
>>>>>>
>>>>>> ...
>>>>>> [md:debug] [pid 7508:tid 1052] mod_md.c(762): AH10055: md watchdog run,
>>>>>> auto drive 2 mds
>>>>>> [md:debug] [pid 7508:tid 1052] mod_md.c(691): AH10052:
>>>>>> md(apachelounge.nl): state=2, driving
>>>>>> [md:debug] [pid 7508:tid 1052] md_reg.c(884): apachelounge.nl: run
>>>>>> staging
>>>>>> [md:debug] [pid 7508:tid 1052] md_acme_drive.c(690): apachelounge.nl:
>>>>>> staging started, state=2, can_http=0, can_https=1,
>>>>>> challenges='tls-sni-01'
>>>>>> [md:debug] [pid 7508:tid 1052] md_store_fs.c(690): purge
>>>>>> staging/apachelounge.nl (D:/servers/apacheS/md/staging/apachelounge.nl)
>>>>>> [md:debug] [pid 7508:tid 1052] md_acme.c(144): get directory from
>>>>>> https://acme-v01.api.letsencrypt.org/directory
>>>>>> [md:debug] [pid 7508:tid 1052] md_acme.c(407): req: POST
>>>>>> https://acme-v01.api.letsencrypt.org/directory
>>>>>> [md:debug] [pid 7508:tid 1052] md_curl.c(258): (20014)Internal error
>>>>>> (specific information not available): request 10 failed(60): Peer
>>>>>> certificate cannot be authenticated with given CA certificates
>>>>>
>>>>> Ok, this needs to be logged at ERROR level, so users do not have to mess
>>>>> with LogLevel to see what is going on.
>>>>>
>>>>> As for the reason, this seems to indicate that the curl client finds no
>>>>> way to verify the Let's Encrypt server certificate. Can you verify that
>>>>> the "curl.exe" can connect to
>>>>> "https://acme-v01.api.letsencrypt.org/directory" and retrieve the JSON
>>>>> there *without* you giving it the '-k' or '--insecure' option? And where
>>>>> does your curl.exe/libcurl come from? Did you build it yourself?
>>>>>
>>>>>> [md:debug] [pid 7508:tid 1052] md_acme.c(425): (20014)Internal error
>>>>>> (specific information not available): req sent
>>>>>> [md:error] [pid 7508:tid 1052] (20014)Internal error (specific
>>>>>> information not available): apachelounge.nl: setup
>>>>>> ACME(https://acme-v01.api.letsencrypt.org/directory)
>>>>>> [md:debug] [pid 7508:tid 1052] md_acme_drive.c(912): (20014)Internal
>>>>>> error (specific information not available): apachelounge.nl: ACME, ACME
>>>>>> staging
>>>>>> [md:debug] [pid 7508:tid 1052] md_reg.c(891): (20014)Internal error
>>>>>> (specific information not available): apachelounge.nl: staging done
>>>>>> [md:error] [pid 7508:tid 1052] (20014)Internal error (specific
>>>>>> information not available): AH10056: processing apachelounge.nl
>>>>>> [md:info] [pid 7508:tid 1052] AH10057: apachelounge.nl: encountered
>>>>>> error for the 6. time, next run in 0:02:40 hours
>>>>>> ...
>>>>>>
>>>>>> Maybe a little solution: when starting httpd, mod_md checks if LE is
>>>>>> reachable without error.
>>>>>
>>>>> No, I do not think checking external servers on every httpd restart is a good
>>>>> idea.
>>>>>
>>>>>> And a solution for the one below can be: make a check that 443 and/or 80
>>>>>> is used.
>>>>>>
>>>>>> Still my questions:
>>>>>>
>>>>>> Does the retry stop?
>>>>>
>>>>> The retry does not stop, but it uses longer and longer retry intervals,
>>>>> exactly to recover from errors with the ACME server that are recoverable,
>>>>> e.g. server/internet down. Your local certificate store not being able to
>>>>> verify the LE server will not recover by itself, however.
>>>>>
>>>>>> When does it happen, on what errors?
>>>>>
>>>>> On any error where signup/renew is necessary and could not complete.
>>>>>
>>>>>>
>>>>>>
>>>>>> Steffen
>>>>>>
>>>>>>
>>>>>> On Tuesday 12/12/2017 at 10:18, Stefan Eissing wrote:
>>>>>>> Can you switch to "LogLevel md:debug" for a while and send me the
>>>>>>> details? Did this start on the v1.1.0 or before that?
>>>>>>>
>>>>>>>> On 11.12.2017 at 16:09, Steffen <i...@apachelounge.com> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> Running 1.1.0 with the new naming.
>>>>>>>>
>>>>>>>> When mod_md encounters an error it looks like it is going into an endless
>>>>>>>> loop:
>>>>>>>>
>>>>>>>>
>>>>>>>> [md:info] [pid 10372:tid 1964] AH10057: apachelounge.nl: encountered
>>>>>>>> error for the 1. time, next run in 0:00:05 hours
>>>>>>>> [md:info] [pid 10372:tid 1964] AH10057: apachelounge.nl: encountered
>>>>>>>> error for the 2. time, next run in 0:00:10 hours
>>>>>>>> [md:info] [pid 10372:tid 1964] AH10057: apachelounge.nl: encountered
>>>>>>>> error for the 3. time, next run in 0:00:20 hours
>>>>>>>> [md:info] [pid 10372:tid 1964] AH10057: apachelounge.nl: encountered
>>>>>>>> error for the 4. time, next run in 0:00:40 hours
>>>>>>>> [md:info] [pid 10372:tid 1964] AH10057: apachelounge.nl: encountered
>>>>>>>> error for the 5. time, next run in 0:01:20 hours
>>>>>>>> [md:info] [pid 10372:tid 1964] AH10057: apachelounge.nl: encountered
>>>>>>>> error for the 6. time, next run in 0:02:40 hours
>>>>>>>> [md:info] [pid 10372:tid 1964] AH10057: apachelounge.nl: encountered
>>>>>>>> error for the 7. time, next run in 0:05:20 hours
>>>>>>>> [md:info] [pid 10372:tid 1964] AH10057: apachelounge.nl: encountered
>>>>>>>> error for the 8. time, next run in 0:10:40 hours
>>>>>>>> ...
>>>>>>>> ...
>>>>>>>> ...
>>>>>>>>
>>>>>>>> The above is during renew and using port 444.
>>>>>>>>
>>>>>>>> Apache is running fine because the certificate is still valid.
>>>>>>>>
>>>>>>>> Does it stop?
>>>>>>>>
>>>>>>>> When does it happen, on what errors? The above happens on:
>>>>>>>> (20014)Internal error (specific information not available): AH10056:
>>>>>>>> processing apachelounge.nl.
>>>>>>>>
>>>>>>>> What to do? Stopping on the above retries can be tricky, because when the
>>>>>>>> ACME CA service is temporarily down or not reachable we maybe do want a
>>>>>>>> retry. A not-reachable/down error is different than a configuration
>>>>>>>> error causing it, like in the above case.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>
>
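
For anyone who wants to watch for this with a tool rather than by reading error.log: the AH10057 entries quoted in this thread show the delay doubling on each failure, from 0:00:05 up to 0:10:40 at the 8th error. Below is a minimal Python sketch that parses such a line and computes the delay that pattern would predict. It assumes the message text is exactly as in the quoted log lines. The 5 second base and the plain doubling are inferred only from the eight entries shown, not from the mod_md source, and whether mod_md caps the interval is not covered. The function names are hypothetical.

```python
import re
from datetime import timedelta

# Matches the AH10057 message as it appears in the logs quoted in this thread.
# Assumption: in the real error.log the message is on a single line.
AH10057 = re.compile(
    r"AH10057: (?P<md>\S+): encountered error for the (?P<count>\d+)\. time, "
    r"next run in (?P<h>\d+):(?P<m>\d{2}):(?P<s>\d{2}) hours"
)


def parse_ah10057(line):
    """Return (domain, error_count, next_delay), or None if the line does not match."""
    m = AH10057.search(line)
    if m is None:
        return None
    delay = timedelta(hours=int(m["h"]), minutes=int(m["m"]), seconds=int(m["s"]))
    return m["md"], int(m["count"]), delay


def expected_delay(count, base_seconds=5):
    """Delay predicted by plain doubling, as inferred from the quoted log lines."""
    return timedelta(seconds=base_seconds * 2 ** (count - 1))


if __name__ == "__main__":
    sample = ("[md:info] [pid 10372:tid 1964] AH10057: apachelounge.nl: "
              "encountered error for the 6. time, next run in 0:02:40 hours")
    print(parse_ah10057(sample))
    print(expected_delay(6))
```

A monitoring script could call parse_ah10057 on each new log line and raise an alert once the error count for a domain passes some threshold of your choosing.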