Re: [OFFTOPIC] Re: Trusting trust [was: PARTIAL DIAGNOSIS of Installation problems]

2021-03-05 Thread Gene Heskett
On Friday 05 March 2021 03:14:51 to...@tuxteam.de wrote:

> On Fri, Mar 05, 2021 at 01:33:14AM -0500, Stefan Monnier wrote:
> > > AIUI compilers have been studied so extensively that their
> > > production is largely automated.
> >
> > Oh, no.  There are some parts we know how to automate, but by and
> > large it's all hand written code.
> >
> :-)
>
> https://m.xkcd.com/224/
>
Thanks for my morning chuckle, Tomas. I needed that, badly.

> Cheers
>  - t


Cheers, Gene Heskett
-- 
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
If we desire respect for the law, we must first make the law respectable.
 - Louis D. Brandeis
Genes Web page 



Re: [OFFTOPIC] Re: Trusting trust [was: PARTIAL DIAGNOSIS of Installation problems]

2021-03-05 Thread tomas
On Fri, Mar 05, 2021 at 01:33:14AM -0500, Stefan Monnier wrote:
> > AIUI compilers have been studied so extensively that their production is
> > largely automated.
> 
> Oh, no.  There are some parts we know how to automate, but by and large
> it's all hand written code.

:-)

https://m.xkcd.com/224/

Cheers
 - t


signature.asc
Description: Digital signature


Re: [OFFTOPIC] Re: Trusting trust [was: PARTIAL DIAGNOSIS of Installation problems]

2021-03-04 Thread Stefan Monnier
> AIUI compilers have been studied so extensively that their production is
> largely automated.

Oh, no.  There are some parts we know how to automate, but by and large
it's all hand written code.

> Create an EBNF specification, feed it through a tool
> chain (lex, yacc, cc, as, ld, etc.), and you end up with a compiler.

The EBNF specification only gives you the syntax of the source language.
It's barely sufficient for a pretty printer, but lacks all the
information about typing rules and dynamic semantics of the source
language, as well as any information about the syntax and semantics of
the target language, and doesn't say anything about to optimize the
code either.

The part you can automate with lex/yacc and friends is a tiny fraction
of a compiler, except for very naive toy compilers.

> The process is known and the results are predictable; especially with
> standards-based languages such as C.  So, a skilled attacker will know
> what you're doing, how you are doing it, and may be able to produce a
> 'cA' that infects both 'A' and 'T'.

That is a risk, indeed.

> If you are going to produce source code for a trusted compiler 'T', then you
> should also produce an executable 'cT'.

That could be significantly harder.

> AIUI this can be done by writing a simplified compiler in some other
> language 'a',

Indeed, actually your trusted compiler `T` doesn't need to be compiled
with `cA` (nor written in the same source language), it just needs to be
used somehow to compile `A` to `cA2` so it can be compared to `cA` to
see if there's a backdoor.


Stefan




Re: [OFFTOPIC] Re: Trusting trust [was: PARTIAL DIAGNOSIS of Installation problems]

2021-03-04 Thread David Christensen

On 3/4/21 9:28 PM, David Christensen wrote:

(One more step of 'cT = cT(a)' may be required


Correction:  cT = cT(T)


David



Re: [OFFTOPIC] Re: Trusting trust [was: PARTIAL DIAGNOSIS of Installation problems]

2021-03-04 Thread David Christensen

On 3/4/21 6:50 PM, Stefan Monnier wrote:

The abstract states:

"In the DDC technique, source code is compiled twice: once with a 
second (trusted) compiler (using the source code of the compiler’s 
parent), and then the compiler source code is compiled using the 
result of the first compilation. If the result is bit-for-bit 
identical with the untrusted executable, then the source code 
accurately represents the executable."



I find the above to be unclear:


Of course, it's an abstract.  You can read the paper for more
details. Basically:

Take an existing untrusted compiler whose source code is A and binary
is cA.  You check that:

cA == cA(A)

if it's not the case (or if you don't have access to the source code
A), the DDC technique can't be used.  If it is the case, you have
just checked that `A` is indeed the source code for `cA`.



You have checked that 'cA' can reproduce itself given 'A' as input. 
This is the key feature of the TT attack -- the machine code executable 
'cA' contains a backdoor, but the source code 'A' does not.



The backdoor source code was removed from 'A' once the backdoor machine 
code injection was working in 'cA'.  The backdoor machine code injection 
in 'cA' is what perpetuates the TT attack whenever 'cA' is recompiled 
from 'A'.




Then take a trusted compiler whose source code is T. Now compile it
with `cA`:

cT = cA(T)

and then use this new compiler binary `cT` to compile `A` a second
time:

cA2 = cT(A)

finally compare `cA` and `cA2`. If they're bit-for-bit identical,
then you're good: `cA` doesn't seem to have any hidden trojan horse.


You need to not only trust that [T] does what you think it does, but 
also that any attacker who may have infected `cA` hasn't seen [T]

and can't have guessed enough of its content to be able to properly
infect `cT`.


Thanks for the explanation.


AIUI compilers have been studied so extensively that their production is
largely automated.  Create an EBNF specification, feed it through a tool
chain (lex, yacc, cc, as, ld, etc.), and you end up with a compiler.
The process is known and the results are predictable; especially with
standards-based languages such as C.  So, a skilled attacker will know
what you're doing, how you are doing it, and may be able to produce a
'cA' that infects both 'A' and 'T'.


If you are going to produce source code for a trusted compiler 'T', then 
you should also produce an executable 'cT'.  AIUI this can be done by 
writing a simplified compiler in some other language 'a', using that to 
create an intermediate compiler 'aT = a(T)', and using that to produce 
the compiler 'cT = aT(T)'.  (One more step of 'cT = cT(a)' may be 
required to obtain a fixed point binary.)  Now you can compare 'cA' to 
'cT(A)' and see the back door directly.



David



[OFFTOPIC] Re: Trusting trust [was: PARTIAL DIAGNOSIS of Installation problems]

2021-03-04 Thread Stefan Monnier
> The abstract states:
>
> "In the DDC technique, source code is compiled twice: once with a
> second (trusted) compiler (using the source code of the compiler’s
> parent), and then the compiler source code is compiled using the
> result of the first compilation. If the result is bit-for-bit
> identical with the untrusted executable, then the source code
> accurately represents the executable."
>
>
> I find the above to be unclear:

Of course, it's an abstract.  You can read the paper for more details.
Basically:

Take an existing untrusted compiler whose source code is A and binary is
cA.  You check that:

cA == cA(A)

if it's not the case (or if you don't have access to the source code A),
the DDC technique can't be used.  If it is the case, you have just
checked that `A` is indeed the source code for `cA`.

Then take a trusted compiler whose source code is T.
Now compile it with `cA`:

cT = cA(T)

and then use this new compiler binary `cT` to compile `A` a second time:

cA2 = cT(A)

finally compare `cA` and `cA2`.
If they're bit-for-bit identical, then you're good: `cA` doesn't seem to
have any hidden trojan horse.

If they're not, then either cA is compromised, or maybe it's simply that
the compilers A and T don't agree sufficiently on the source language
in which they're both written.

> 1.  What source code is compiled twice?

The source code `A` of the untrusted compiler.

> 2.  Where do I get the second (trusted) compiler?

You write it yourself by hand.  You also have to make sure that it
matches the semantics of `A` sufficiently to avoid false negatives.
You need to not only trust that it does what you think it does, but also
that any attacker who may have infected `cA` hasn't seen that code and
can't have guessed enough of its content to be able to properly infect
`cT`.

> 7.  What if the compiler, by design, does not produce identical output for
>identical input?

Then you can't use that technique and you're left wondering if it may
have a hidden self-perpetuating backdoor.


Stefan