Ryan, Wayne and I have been discussing making various improvements to 1.5.2 mandatory for all CAs. I've made a few improvements to DigiCert's CPSs in this area, but things probably still could be better. There will probably be a CA/B ballot in this area soon.
DigiCert's 1.5.2 has our support email address and our Certificate Problem Report email (which I recently added). That doesn't really cover everything (yet). It looks like GTS 1.5.2 splits things into security (including CPRs) and non-security requests. I didn't chase down any other CAs' 1.5.2 sections yet, but it'd be interesting to hear what other CAs have here. I suspect most only have one address for everything. Something to keep in mind once the CA/B thread shows up.

-Tim

> -----Original Message-----
> From: dev-security-policy [mailto:dev-security-policy-bounces+tim.hollebeek=digicert....@lists.mozilla.org] On Behalf Of Ryan Hurst via dev-security-policy
> Sent: Wednesday, February 21, 2018 9:53 PM
> To: mozilla-dev-security-pol...@lists.mozilla.org
> Subject: Re: Google OCSP service down
>
> I wanted to follow up with our findings and a summary of this issue for the
> community.
>
> Below you will see details on what happened and how we resolved the issue.
> Hopefully this will help explain what happened and help others avoid a
> similar issue.
>
> Summary
> -------
> On January 19th at 08:40 UTC, a code push to improve OCSP generation for a
> subset of the Google operated Certificate Authorities was initiated. The
> change was related to the packaging of generated OCSP responses. The first
> time this change was invoked in production was January 19th at 16:40 UTC.
>
> NOTE: The publication of new revocation information to all geographies can
> take up to 6 hours to propagate. Additionally, clients and middle-boxes
> commonly implement caching behavior. This results in a large window where
> clients may have begun to observe the outage.
>
> NOTE: Most modern web browsers “soft-fail” in response to OCSP server
> availability issues, masking outages. Firefox, however, supports an advanced
> option that allows users to opt in to “hard-fail” behavior for revocation
> checking. An unknown percentage of Firefox users enable this setting.
> We believe most users who were impacted by the outage were these Firefox
> users.
>
> About 9 hours after the deployment of the change began (2018-01-20 01:36
> UTC), a user on Twitter mentioned that they were having problems with their
> hard-fail OCSP checking configuration in Firefox when visiting Google
> properties. This tweet and the few that followed during the outage period
> were not noticed by any Google employees until after the incident’s
> post-mortem investigation had begun.
>
> About 1 day and 22 hours after the push was initiated (2018-01-21 15:07
> UTC), a user posted a message to the mozilla.dev.security.policy mailing
> list mentioning that they too were having problems with their hard-fail
> configuration in Firefox when visiting Google properties.
>
> About two days after the push was initiated, a Google employee discovered
> the post and opened a ticket (2018-01-21 16:10 UTC). This triggered the
> remediation procedures, which began in under an hour.
>
> The issue was resolved about 2 days and 6 hours from the time it was
> introduced (2018-01-21 22:56 UTC). Once Google became aware of the issue, it
> took 1 hour and 55 minutes to resolve the issue, and an additional 4 hours
> and 51 minutes for the fix to be completely deployed.
>
> No customer reports regarding this issue were sent to the notification
> addresses listed in Google's CPSs or on the repository websites for the
> duration of the outage. This extended the duration of the outage.
>
> Background
> ----------
> Google's OCSP infrastructure works by generating OCSP responses in batches,
> with each batch being made up of the certificates issued by an individual
> CA.
>
> In the case of GIAG2, this batch is produced in chunks of certificates
> issued in the last 370 days. For each chunk, the GIAG2 CA is asked to
> produce the corresponding OCSP responses, the results of which are placed
> into a separate .tar file.
>
> The issuer of GIAG2 has chosen to issue new certificates to GIAG2
> periodically; as a result, GIAG2 has multiple certificates. Two of these
> certificates no longer have unexpired certificates associated with them. As
> a result, and as expected, the CA does not produce responses for the
> corresponding periods.
>
> All .tar files produced during this process are then concatenated with the
> --concatenate command in GNU tar. This produces a single .tar file
> containing all of the OCSP responses for the given Certificate Authority,
> which is then distributed to our global CDN infrastructure for serving.
>
> A change was made in how we batch these responses: instead of outputting
> many .tar files within a batch, a concatenation of all tar files was
> produced.
>
> The change in question triggered an unexpected behaviour in GNU tar which
> then manifested as an empty tarball. These "empty" updates ended up being
> distributed to our global CDN, effectively dropping some responses, while
> continuing to serve responses for other CAs.
>
> During testing of the change, this behaviour was not detected, as the tests
> did not cover the scenario in which some chunks did not contain unexpired
> certificates.
>
> Findings
> --------
> - The outage only impacted sites with TLS certificates issued by the GIAG2
>   CA, as it was the only CA that met the required pre-conditions of the bug.
> - The bug that introduced this failure manifested itself as an empty
>   container of OCSP responses. The root cause of the issue was an unexpected
>   behavior of GNU tar relating to concatenating tar files.
> - The outage was observed by revocation service monitoring as “unknown
>   certificate” (HTTP 404) errors. HTTP 404 errors are expected in OCSP
>   responder operations; they typically are the result of poorly configured
>   clients. These events are monitored and a threshold does exist for an
>   on-call escalation.
> - Due to a configuration error, the designated Google team did not receive
>   an escalation message.
> - External users did not use the contact details Google provided in the CPS.
>
> Remediation Plan
> ----------------
> - A bug fix has been applied to prevent the same issue from happening again.
> - Test cases looking for a minimum number of OCSP responses in each tar were
>   added to the test automation suites to catch similar issues in the future.
> - The monitoring system that was misconfigured was updated to use the
>   correct address for escalations.
> - Both the Google Trust Services CPS (found on pki.goog) and the Google CPS
>   (found on pki.google.com) have been updated to make it clear what email
>   address is the most expedient path to reach the PKI team for non-security
>   incidents.
> - The Google PKI repository page was updated to show contact details in the
>   same way the Google Trust Services repository page already did, in the
>   hope of helping users find a path of escalation.
> - The wizard that is returned for mails to the security email address has
>   been updated to also include an explicit option for issues related to the
>   “Google Certificate Authority”, in the hope of helping users who choose
>   this path of escalation.
> - Existing procedures that are relied upon for periodic verification of
>   effective escalation have been updated to include unknown certificate
>   checking.
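
For anyone wanting to add a similar safeguard, the "minimum number of OCSP responses in each tar" test mentioned in the remediation plan could look roughly like the sketch below. This is only an illustration using Python's standard tarfile module, not Google's actual test; the function names, the member names, and the thresholds are all made up for the example.

```python
import io
import tarfile


def count_responses(tar_bytes: bytes) -> int:
    """Count regular-file members in an in-memory tar archive.

    A zero-length or otherwise unreadable archive is treated as holding
    zero responses, so it fails any minimum-count check.
    """
    try:
        with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:") as archive:
            return sum(1 for member in archive if member.isfile())
    except tarfile.ReadError:
        return 0


def check_minimum_responses(tar_bytes: bytes, minimum: int) -> None:
    """Raise ValueError if the archive holds fewer than `minimum` responses."""
    found = count_responses(tar_bytes)
    if found < minimum:
        raise ValueError(
            f"only {found} OCSP responses in archive, expected at least {minimum}"
        )


def build_tar(entries: dict) -> bytes:
    """Example helper (hypothetical): pack name -> bytes pairs into a tar."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, payload in entries.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


if __name__ == "__main__":
    # Placeholder payloads stand in for DER-encoded OCSP responses.
    batch = build_tar({"serial-01.ocsp": b"resp-1", "serial-02.ocsp": b"resp-2"})
    check_minimum_responses(batch, minimum=2)  # passes silently

    empty_batch = build_tar({})
    try:
        check_minimum_responses(empty_batch, minimum=1)
    except ValueError as err:
        print("blocked publication:", err)
```

Running a check like this on the final concatenated archive, just before it is pushed to the CDN, would catch the empty-tarball case regardless of what caused it upstream.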
_______________________________________________
dev-security-policy mailing list
dev-security-policy@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-security-policy