Re: [Anima] Desaster recover... (was: Re: [homenet] write up of time without clocks)

Max Pritikin (pritikin) Fri, 04 Nov 2016 14:23:54 -0700

There is a lot here. Attempting to comment on it all. 

It would really help if we could relate these discussions to specific text 
sections that could be improved. Otherwise we’re just blowing into the wind. 
(Where it seems obvious I’ve added such notes).

> On Nov 3, 2016, at 2:40 PM, Toerless Eckert <tte+i...@cs.fau.de> wrote:
> 
> IMHO, gear replacement after large outages is quite relevant:
> 
> I have seen equipment become unusable after bad
> power situations (spike, brownout, electrical storms,..) because of
> el-cheapo power supply/circuitry as well as wired interface
> circuitry/protection. Certifications in Telco/IoT make that type of gear
> often better, but those certifications are a zoo.
> 
> - Large organizations will have spares. With ANIMA, those spares should
>  be stored pre-enrolled to avoid MASA/registrar dependency during
>  recovery/replacements: Receive gear on any site (no central provisioning
>  site needed), but unpack, plug in for short period, then stash away.

Devices that have been brought up and are in the network post bootstrap are not 
relevant to the BRSKI document. 

>  This is imo an easy requirement also useful without disaster requirements
>  - aka: enrolled spares would likely also have better theft/abuse protection
>  than non-enrolled. And yes, you will have to power up these spares
>  once a year. Which IMHO is ok. And of course certs should therefore
>  be somewhat longer than 1 year).

This could be captured as a discussion: what does a device do when its 
credentials are out of date? Does it fall back on full bootstrapping? If so 
this becomes a potential attack vector wherein the attacker can convince the 
device it is out of date and then force it to restart bootstrapping. That would 
be bad. 

So, should there be an injection point into the bootstrapping state machine 
that says something like:

If a device fails to join the ANI after [some number] of attempts it attempts 
to repeat bootstrapping using its original SUDI credentials. During this 
bootstrapping attempt it MUST only bootstrap on a Registrar that provides both 
a valid voucher and an identity recognizably within the same PKI hierarchy the 
device was in previously. This is detected by the device by either (a) the 
domainCAcert is an exact match for the current domain trust anchor known to the 
device or (b) the EST /cacerts response includes a chain that terminates in the 
current domain trust anchor known to the device.  
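
A minimal sketch of the acceptance check proposed above, in Python. Every name 
here (RegistrarOffer, may_rebootstrap, MAX_JOIN_ATTEMPTS and its value of 5) is 
hypothetical and not taken from the BRSKI text. Certificates are compared as 
raw DER bytes, and the voucher and chain signature validation a real device 
would need is assumed to happen elsewhere.

```python
from dataclasses import dataclass, field
from typing import List

# Stand-in for "[some number] of attempts"; the value 5 is an assumption.
MAX_JOIN_ATTEMPTS = 5


@dataclass
class RegistrarOffer:
    """What a Registrar presents during a repeated bootstrap (hypothetical)."""
    voucher_valid: bool                 # outcome of voucher validation done elsewhere
    domain_ca_cert: bytes               # DER bytes of the domainCAcert
    cacerts_chain: List[bytes] = field(default_factory=list)  # EST /cacerts, leaf to root


def same_pki_hierarchy(offer: RegistrarOffer, trust_anchor: bytes) -> bool:
    """Check conditions (a) and (b) against the trust anchor the device already knows."""
    # (a) the domainCAcert is an exact match for the known trust anchor
    if offer.domain_ca_cert == trust_anchor:
        return True
    # (b) the /cacerts chain terminates in the known trust anchor
    # (byte comparison only; signature checks along the chain are out of scope here)
    return bool(offer.cacerts_chain) and offer.cacerts_chain[-1] == trust_anchor


def may_rebootstrap(failed_joins: int, offer: RegistrarOffer, trust_anchor: bytes) -> bool:
    """Allow re-bootstrapping only after repeated ANI join failures, and only
    onto a Registrar with a valid voucher inside the previous PKI hierarchy."""
    if failed_joins < MAX_JOIN_ATTEMPTS:
        return False
    return offer.voucher_valid and same_pki_hierarchy(offer, trust_anchor)
```

The point of the sketch is that a Registrar outside the old hierarchy is 
rejected even with a valid voucher, which is what limits the "convince the 
device it is out of date" attack.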

This extends the lifetime of such a stored device beyond its own certificate 
(maybe 1 year) *and* beyond the current CA certificate (maybe a 2-year cert) and 
all the way into the next CA certificate (another 2 years). That is a pretty 
solid lifetime (up to 4 years) for recovery.

> - If sparing would be cross-locations (eg: spares sent from some site to
>  other sites during recovery), then it's important that the ANIMA certs do
>  not include any location specific attributes that would prohibit that
>  movement. So far BRSKI does not include any such attributes, but in
>  discussions, there has been interest to lock into the cert some device
>  role specific attributes, and for a spare piece of equipment, those
>  attributes may not be predetermined.
> 
>  Aka: Disaster sparing means domain certs need to be simple, devices
>  reusable across domain.

I agree that certs should include the absolute minimum information about the 
device. The actual ID is sufficient in my book. Attribute Certs or backend 
database lookups or short term tokens obtained by the device in place are all 
methods of distributing additional information. 

> - If spares are done by a shared facility: vendor, FEMO (for fed networks)
>  or the like, one could think of that entity offering (emergency)
>  enrolment service.
> 
>  Such an entity would have disaster-proof MASA connectivity on site,
>  and the customers would provide it with (emergency) registrar certs.

Ok. So like a Cisco NERV truck that provides connectivity sufficient to rebuild 
a network in place. 

Issuing a new Registrar ID would mean being able to bring up the PKI 
infrastructure during the recovery process. But this is exactly the time period 
when having as much of the critical keying infrastructure offline and secured 
could be beneficial for longterm security. One could argue that extra attempts 
should be made to ensure the PKI is offline. 

>  (emergency) meaning that the entity is only permitted to enrol during
>  emergencies, and the customer could be able to verify the emergency
>  entity's behavior by pulling from the MASA an audit log.

Section 5.6 doesn’t include the Registrar identity, only the domainID, in the 
log. Do folks think the Registrar ID should be included as well?

If this was done then does it matter if the Registrar cert has role information 
in it? As per the above what would happen if all devices in the ANI could act 
as a registrar? There would be a higher chance that one of them would mess up 
and actually perform registrar activities incorrectly — but it would be logged 
so the admin could watch for that.

Thoughts? I’m worried but see the conversation going that way.

>  If the customer would give the emergency registrar credentials
>  cert/private/public-key to the emergency entities, then the customer
>  itself also has the cert/private-key and could ask the MASA.
>  This option would work with current BRSKI text. I am just not sure
>  about security BCP (having two copies of a public key pair…).

Sorry - I can’t write text around copying a private key. If we’re going that 
route let’s dump all the PKI stuff and just use bearer tokens and be up front 
about the security we’re providing. See above for different ideas. 

> - If the shared spare storage facility is Best Buy, and the Jones
>  family wants to replace some electric storm victims @home while
>  their ISP takes its days (my ISP C*...*T took almost 48 hours in one case),
>  then the gear should probably come with a nonceless voucher
>  printed on a scratcher piece of paper inside the box.

Even a nonceless voucher includes the domainCAcert so this idea doesn’t fly 
with the current text. What you’re looking for is some form of “bearer voucher” 
that contains a symmetric key. Anybody who can scan it off the box can deploy 
the device. 

I’d normally rail against this but am actually pleased to see it fit so well 
into the model. A bearer voucher sorta sucks but creating one didn’t change any 
of our messaging flow or anything. Interesting positive that… 
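
To make the "bearer" property concrete, here is a rough Python sketch. It is 
an assumption-laden illustration, not a format from the BRSKI text: the 
function names, the 128-bit hex token, and the choice to store only a SHA-256 
digest on the device are all hypothetical. The device accepts whoever presents 
the printed token and checks nothing about the presenter's identity.

```python
import hashlib
import hmac
import secrets
from typing import Tuple


def make_bearer_voucher() -> Tuple[str, bytes]:
    """Factory side (hypothetical): create the token printed inside the box
    and the digest that is stored on the device."""
    token = secrets.token_hex(16)  # 128-bit random secret as 32 hex characters
    stored_digest = hashlib.sha256(token.encode("ascii")).digest()
    return token, stored_digest


def device_accepts(presented_token: str, stored_digest: bytes) -> bool:
    """Device side (hypothetical): accept any presenter holding the token.
    No domainCAcert or Registrar identity is involved in the decision."""
    candidate = hashlib.sha256(presented_token.encode("utf-8")).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(candidate, stored_digest)
```

Anybody who reads the token off the paper passes device_accepts(), which is 
exactly the weakness and the convenience being discussed.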

I’m thinking that it is time for a section 3.7 “Voucher types” that digs into 
what we’re talking about for these different voucher types. Currently the text 
is scattered in the other sections. 

- max

> Beyond these BRSKI considerations, the next cool ANIMA piece to consider
> for these use-cases is of course the ASA driven "re-configuration" of the
> spare device, aka: In larger outages, you should not expect the whole
> network provisioning backend to be up and running. And it's IMHO
> illusionary to expect devices to ONLY be driven from intent for many
> years (in complex networks... in a clean homenet they are driven by
> intent today..). 
> 
> Aka: for pre-intent-only disaster proof networks, we'd need:
>  - ASA to cache neighbors configs.
>  - reapply neighbor stored config when replacement gear is inserted.
>  - IMHO: An anima definition for a "device-role-id" which is not
>    the ACP IPv6 address (which will change upon gear replacement
>    without enrolment support).
> 
> Cheers
>    toerless

_______________________________________________
Anima mailing list
Anima@ietf.org
https://www.ietf.org/mailman/listinfo/anima