On Sun, Jun 03, 2018 at 05:03:44PM -0400, Theodore Y. Ts'o wrote:
> If you don't think a potential 2x -- 10x performance hit isn't a
> blocking factor --- sure, go ahead and try implementing it.  And good
> luck to you.  And this is not a guarantee that it won't get rejected.
> I certainly don't have the power to make that guarantee.

I do not want or expect a guarantee, or even a probability, of course. 
I am just trying to avoid "STRONG REJECT. We could have told you before 
you even started implementing. Why didn't you discuss this beforehand?"

One would simply change something like

author A U Thor <aut...@example.com> 1465982009 +0000

into something like

author 21bbba8e9ce9734022d2c23df247a2704c0320ad7d43c02e8bdecdfae27e23b4 A U Thor <aut...@example.com>
author-hash 469bb107e38f8e59dddb3bbd6f8646e052bf73d48427865563c7358a64467f2c
authordate c444f739ca317e09dbd3dae1207065585ae2c2e18cd0fc434b5bde08df1e0569 1465982009 +0000
authordate-hash 199875e5aedb6cb164a2b40c16209dc5bb37f34c059a56c6d96766440fb0fe68

and then compute the commit id without the "author" and the 
"authordate" lines.

The *-hash values were obtained as follows:

echo -n '21bbba8e9ce9734022d2c23df247a2704c0320ad7d43c02e8bdecdfae27e23b4 A U Thor <aut...@example.com>' | sha3sum -a 256
echo -n 'c444f739ca317e09dbd3dae1207065585ae2c2e18cd0fc434b5bde08df1e0569 1465982009 +0000' | sha3sum -a 256

The hex values at the start of the author and authordate lines are 
simply the $huge_random_numbers.
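
To make the structure concrete, here is a rough Python-style sketch of 
the idea. The helper names and object layout are made up for 
illustration only; a real implementation would of course have to follow 
git's actual object encoding and object hash.

import hashlib, os

def salt_field(value):
    # Prepend a $huge_random_number (the salt) to the field value and
    # hash the salted value with sha3-256.
    salt = os.urandom(32).hex()
    salted = salt + " " + value
    return salted, hashlib.sha3_256(salted.encode()).hexdigest()

def commit_id(header_lines, message):
    # Compute the commit id *without* the salted "author"/"authordate"
    # lines; only the "*-hash" lines and the rest of the object go in.
    # (hashlib.sha256 is just a stand-in for whatever object hash git uses.)
    kept = [l for l in header_lines
            if not l.startswith(("author ", "authordate "))]
    body = "\n".join(kept) + "\n\n" + message
    return hashlib.sha256(body.encode()).hexdigest()

salted_author, author_hash = salt_field("A U Thor <aut...@example.com>")
salted_date, date_hash = salt_field("1465982009 +0000")
header = [
    "author " + salted_author,
    "author-hash " + author_hash,
    "authordate " + salted_date,
    "authordate-hash " + date_hash,
    # ... tree, parent, committer, ... handled analogously
]
print(commit_id(header, "commit message"))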

Verifying the commit ID by itself wouldn't be any less efficient than 
before. Admittedly, it wouldn't verify the author and authordate 
integrity anymore without additional work. That would be some overhead, 
sure, but it could be done on demand and would mostly affect clones. I 
don't think it would be that much of a problem. It can be parallelized 
easily: the hashes for each field are independent of each other, so 
they can all be verified in parallel in different threads running on 
different cores, as the sketch below shows.
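
A rough sketch of such a parallel check (purely illustrative; the 
function name and data layout are made up):

import hashlib
from concurrent.futures import ProcessPoolExecutor

def verify_field(salted_value, expected_hash):
    # Each check depends only on one salted field value and its recorded
    # hash, so all checks are independent of one another.
    return hashlib.sha3_256(salted_value.encode()).hexdigest() == expected_hash

def verify_all(fields):
    # fields: (salted_value, expected_hash) pairs collected from many commits
    values, hashes = zip(*fields)
    with ProcessPoolExecutor() as pool:
        return all(pool.map(verify_field, values, hashes, chunksize=1024))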

On djb's typical 2015 Skylake machine, the SUPERCOP benchmark tells us 
that sha3-256 (~= keccakc512) runs at about 20 cycles/byte for blocks 
of 64 bytes of data; see 
https://bench.cr.yp.to/results-sha3.html#amd64-skylake

Let's say we have 128 bytes of data on average for the author field, so 
conservatively speaking it takes about 3000 cycles (> 128*20) to hash 
and compare the hash.

At 3000 MHz, we can thus do roughly a million field verifications per 
second per core.

Let's assume we have 10 anonymizable fields of this kind per commit.

Then the overhead would be about one second per 100,000 x ncores commits.
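
Spelled out, with the assumptions above (20 cycles/byte, ~128 bytes per 
field, 10 fields per commit, 3000 MHz; estimates, not measurements):

cycles_per_byte   = 20               # sha3-256 on Skylake, per SUPERCOP
bytes_per_field   = 128              # assumed average field size
cycles_per_field  = 3000             # > 128 * 20, rounded up
clock_hz          = 3_000_000_000    # 3000 MHz
fields_per_commit = 10               # assumed anonymizable fields

fields_per_second  = clock_hz / cycles_per_field             # ~1,000,000 per core
commits_per_second = fields_per_second / fields_per_commit   # ~100,000 per core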

How many commits are we talking about in a huge repository? And how 
long does a clone of such a huge repository take at the moment? Do you 
have any numbers?

> If you don't have time to implement, why do you think it's fair to
> inflict on everyone else the request for time to do a design review
> for something for which the need hasn't even been established?

I am not asking anyone to even reply to my messages. I just see a lot 
of time being wasted on discussing aspects of my proposal that are 
technically irrelevant. If that time were put into reviewing the 
design instead, it would be better spent.

Please don't devalue the proposal as such. It is not true that the only 
value is in actual code and that proposals are "bullshit".

I was not the first to raise the issue, as I clearly showed in my 
initial email.

The demand is in fact high, very high. At present, that demand is being 
met by lawyers, who write snake-oil disclaimers and the like for 
enormous sums of money, in a lot of companies, to "solve" a technical 
issue by pseudo-legal means: by finding excuses for why the "right to 
be forgotten" doesn't have to be implemented in specific cases such as 
git. What if all that lawyer money were put into actually solving the 
technical issues as technical issues? Engineers are apparently bad at 
marketing; the lawyers seem more successful in that respect.

Best wishes
Peter

-- 
Peter Backes, r...@helen.plasma.xg8.de
