Re: publish from certain commit onward, keeping earlier history private, but provable

2015-12-09 Thread Jeff King
On Wed, Dec 09, 2015 at 02:29:12PM -0800, Stefan Beller wrote:

> On Wed, Dec 9, 2015 at 2:24 PM, Jeff King  wrote:
> > On Wed, Dec 09, 2015 at 05:20:41PM -0500, Jeff King wrote:
> >
> >> Of course that is a bitter pill to swallow if you have reasons for
> >> wanting to use the old sha1s. E.g., you have internal development
> >> proceeding against the old tree and want to share a truncated version
> >> with the public.
> >
> > After re-reading your email, it looks like your use case is just to be
> > able to later prove the existence of the original history. You could
> > do that by mentioning the original "C" in your truncated "D", but in a
> > way that git does not follow when traversing reachability. For
> > instance, amend D's commit
> > message to say:
> >
> >   This is based on earlier, unpublished work going up to commit C.
> >
> > Then retain C for yourself, and show it only to those you want to prove
> > its contents to.
> 
> I'd rather keep D for yourself and create a D' which is D without a
> parent and with the note above, such that the tree of D and parts of the
> commit message are obvious by looking at D'. All that stays secret is
> D's parent and commit information such as the exact date (the committer
> could be guessed easily).

I think the point is that all of this is happening at time t (let's say
2015), and the proof may be needed at time t+N (let's say 2020).

Showing the original D (or C, or whatever) at that point proves nothing,
as you could have created a fake history in 2020 that "ends up" at the
D' tree. You need to publish _something_ in 2015 that says "I know this
thing, but I am not willing to show it to you yet".

The classic way of doing this is to take out a small ad in the
classified section of a print newspaper with a hash of your data.
Libraries keep archives of the paper, so later you can prove that you
have the data that matches the hash, and its timestamp is certified by
the library archives.

Here we're abusing Git as the notary. If everyone spends the years from
2015-2020 building on top of D', then they can all reasonably agree that
the content of D' was written in 2015, and any commit hash it mentions
had to have existed then. Revealing C (or the original D, or whatever
hash you want to mention) proves the data.
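For the "revealing C proves the data" step: a commit's name is the SHA-1
of its raw bytes behind a small "commit <size>" header, so a verifier
holding only the 2015 hash can check a revealed commit without trusting
the revealer's repository. A minimal sketch, assuming bash, GNU
`sha1sum`, and a default SHA-1 repository; the throwaway repository,
identity and tags are made up for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail
# Throwaway identity so the sketch runs without any user config.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
git init -q "$work/private"
cd "$work/private"
# Linear history A -> B -> C -> D -> E, each commit tagged with its letter.
for c in A B C D E; do
  echo "$c" > "$c.txt"
  git add "$c.txt"
  git commit -q -m "$c"
  git tag "$c"
done

# 2015: only C's object name is published (e.g. quoted in D's message).
published=$(git rev-parse C)

# 2020: the owner reveals the raw commit object for C ...
git cat-file commit C > "$work/C.raw"

# ... and a verifier recomputes its name from the bytes alone:
# SHA-1 over the header "commit <size>\0" followed by the object body.
size=$(wc -c < "$work/C.raw" | tr -d ' ')
recomputed=$({ printf 'commit %s\0' "$size"; cat "$work/C.raw"; } | sha1sum | cut -d' ' -f1)

echo "published:  $published"
echo "recomputed: $recomputed"
```

Revealing C alone pins C's tree, parent and message; applying the same
check to the tree, blobs and parents it names covers the rest of A..C.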

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: publish from certain commit onward, keeping earlier history private, but provable

2015-12-09 Thread Stefan Beller
On Wed, Dec 9, 2015 at 2:24 PM, Jeff King  wrote:
> On Wed, Dec 09, 2015 at 05:20:41PM -0500, Jeff King wrote:
>
>> Of course that is a bitter pill to swallow if you have reasons for
>> wanting to use the old sha1s. E.g., you have internal development
>> proceeding against the old tree and want to share a truncated version
>> with the public.
>
> After re-reading your email, it looks like your use case is just to be
> able to later prove the existence of the original history. You could
> do that by mentioning the original "C" in your truncated "D", but in a
> way that git does not follow when traversing reachability. For
> instance, amend D's commit
> message to say:
>
>   This is based on earlier, unpublished work going up to commit C.
>
> Then retain C for yourself, and show it only to those you want to prove
> its contents to.

I'd rather keep D for yourself and create a D' which is D without a
parent and with the note above, such that the tree of D and parts of the
commit message are obvious by looking at D'. All that stays secret is
D's parent and commit information such as the exact date (the committer
could be guessed easily).
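The parentless D' above (D's tree plus a note naming D) can be sketched
with `git commit-tree`, which writes a commit for an existing tree with
exactly the parents and message you give it. A minimal sketch under
assumptions: the repository, identity, tags and message wording are
hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail
# Throwaway identity so the sketch runs without any user config.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
git init -q "$work/private"
cd "$work/private"
# Linear history A -> B -> C -> D -> E, each commit tagged with its letter.
for c in A B C D E; do
  echo "$c" > "$c.txt"
  git add "$c.txt"
  git commit -q -m "$c"
  git tag "$c"
done

d=$(git rev-parse D)

# D' reuses D's tree but has no parent; its message names the private D.
# Only D' (and its tree) would be published; D itself stays local.
d_pub=$(git commit-tree "D^{tree}" -m "Initial public release

This is based on earlier, unpublished work going up to commit $d.")

git cat-file commit "$d_pub"
```

D' exposes the same tree as D, so the published content is identical,
while D's parent, dates and full message stay private.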

>
> -Peff


Re: publish from certain commit onward, keeping earlier history private, but provable

2015-12-09 Thread Jeff King
On Wed, Dec 09, 2015 at 05:20:41PM -0500, Jeff King wrote:

> Of course that is a bitter pill to swallow if you have reasons for
> wanting to use the old sha1s. E.g., you have internal development
> proceeding against the old tree and want to share a truncated version
> with the public.

After re-reading your email, it looks like your use case is just to be
able to later prove the existence of the original history. You could do
that by mentioning the original "C" in your truncated "D", but in a way
that git does not follow when traversing reachability. For instance,
amend D's commit
message to say:

  This is based on earlier, unpublished work going up to commit C.

Then retain C for yourself, and show it only to those you want to prove
its contents to.
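A rough way to check that a hash mentioned only in a commit message is
not followed: rebuild D and E without their old parents, with D's message
naming C, and list every object a transfer of the new E would need. A
minimal sketch, assuming bash; the repository, identity and tags are
illustrative, and the rewrite is done directly with `git commit-tree`.

```shell
#!/usr/bin/env bash
set -euo pipefail
# Throwaway identity so the sketch runs without any user config.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
git init -q "$work/private"
cd "$work/private"
# Linear history A -> B -> C -> D -> E, each commit tagged with its letter.
for c in A B C D E; do
  echo "$c" > "$c.txt"
  git add "$c.txt"
  git commit -q -m "$c"
  git tag "$c"
done

c=$(git rev-parse C)

# Rewritten D: same tree, no parent, and a message that merely mentions C.
d_new=$(git commit-tree "D^{tree}" -m "D

This is based on earlier, unpublished work going up to commit $c.")
# Rewritten E on top of it, again with E's original tree.
e_new=$(git commit-tree "E^{tree}" -p "$d_new" -m "E")

# Every object reachable from e_new, i.e. what an empty remote would need.
# C is not among them: git follows tree and parent headers, not messages.
git rev-list --objects "$e_new" > "$work/published-objects"
cat "$work/published-objects"
```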

-Peff


Re: publish from certain commit onward, keeping earlier history private, but provable

2015-12-09 Thread Jeff King
On Wed, Dec 09, 2015 at 02:45:44PM +0100, Jörn Hees wrote:

> I've been hacking away on a library for quite some time and have a lot of 
> commits in my private repository:
> 
> A -> B -> C -> D -> E
> 
> Finally, I'm nearing completion of a first version, and want to
> publish it to a remote called public from D onward keeping A..C to
> myself, so public should afterwards look like this:
> 
> D -> E

The short answer is that you cannot do this without changing the names
(i.e., sha1 commit ids) of D and E.

One of the fundamental assumptions git makes is that if a repository has
an object X, it also has all of the objects reachable from it (past
commits, their trees, subtrees, and blobs). This is what makes the
push/fetch object transfer efficient (one side says only "I have X" and
the other side knows "Ah, that is a whole chunk of objects I do not have
to bother sending", without the names of those objects going over the
wire).
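The reachability assumption can be seen with `git rev-list --objects`,
which lists every object reachable from a commit, i.e. what the other
side assumes you have once you say "I have E". A minimal sketch, assuming
bash; the throwaway repository and tags are illustrative.

```shell
#!/usr/bin/env bash
set -euo pipefail
# Throwaway identity so the sketch runs without any user config.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
git init -q "$work/private"
cd "$work/private"
# Linear history A -> B -> C -> D -> E, each commit tagged with its letter.
for c in A B C D E; do
  echo "$c" > "$c.txt"
  git add "$c.txt"
  git commit -q -m "$c"
  git tag "$c"
done

# What "I have E" implies: every commit back to A, plus trees and blobs.
git rev-list --objects E > "$work/reachable"
echo "commits reachable from E: $(git rev-list --count E)"
echo "objects reachable from E: $(wc -l < "$work/reachable")"

# D's "parent" header names C; that one line pulls A..C into any
# transfer of E to a repository that lacks them.
git cat-file commit D
```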

The exception, of course, is shallow clones, where one side tells the
other "I am shallow at cutoff point Y; don't assume I have anything
below there". This does work, but there are some downsides (for
instance, we cannot apply some of the same reachability optimizations
for serving fetches).
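The shallow case is what a "--depth 2" clone produces: it keeps the
original names of D and E and records D as the cutoff point in
.git/shallow. A minimal sketch, assuming bash and Git 2.15 or later (for
`--is-shallow-repository`); the paths are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail
# Throwaway identity so the sketch runs without any user config.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
git init -q "$work/private"
cd "$work/private"
# Linear history A -> B -> C -> D -> E, each commit tagged with its letter.
for c in A B C D E; do
  echo "$c" > "$c.txt"
  git add "$c.txt"
  git commit -q -m "$c"
  git tag "$c"
done

cd "$work"
# --depth is ignored for plain local paths, so use a file:// URL.
git clone -q --depth 2 "file://$work/private" local_public
cd local_public

git rev-parse --is-shallow-repository    # true
cat .git/shallow                         # the cutoff point: D's name
git log --pretty=raw                     # D -> E with the original names
```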

>   I can verify that local_public only contains D -> E and that the
>   commit, tree and parent hashes are the same, which is exactly what I
>   want.
>
>   The problem is that when I try to push to an added public remote
>   from local_public I get an error like this:
>
>   ! [remote rejected] master -> master (shallow update not allowed)

Right. The receiver must be explicitly configured to accept a shallow
push (I do not recall offhand whether clients fetching from you would
also need an explicit config to accept a shallow history).
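On the receiving side the opt-in is the `receive.shallowUpdate` setting.
A minimal sketch, assuming bash and throwaway local repositories
(`local_public` and `public.git` are illustrative names): the first push
from the shallow clone is refused, and the second succeeds once the bare
repository opts in.

```shell
#!/usr/bin/env bash
set -euo pipefail
# Throwaway identity so the sketch runs without any user config.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
git init -q "$work/private"
cd "$work/private"
# Linear history A -> B -> C -> D -> E, each commit tagged with its letter.
for c in A B C D E; do
  echo "$c" > "$c.txt"
  git add "$c.txt"
  git commit -q -m "$c"
  git tag "$c"
done

cd "$work"
git clone -q --depth 2 "file://$work/private" local_public
git init -q --bare public.git
cd local_public

# By default the receiver refuses refs that would need new shallow roots.
if git push -q "$work/public.git" HEAD:refs/heads/master 2> "$work/push.err"; then
  echo "unexpected: shallow push accepted" >&2
  exit 1
fi
cat "$work/push.err"    # show why the receiver refused

# Opt in on the receiving repository, then push again.
git -C "$work/public.git" config receive.shallowUpdate true
git push -q "$work/public.git" HEAD:refs/heads/master
```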

So the usual path here is to rewrite D and E (with the same trees, but
they will get new commit ids). If you want to retain the older history
(commits A-C), you can distribute it separately and use git-replace to
"graft" it onto the newer history at run-time.

You can do that with:

  # set up a run-time replacement view so that D appears to have
  # no parents; this doesn't impact the objects themselves, but
  # rather git will use our parent-less "replacement" D anytime
  # somebody mentions the original
  git replace --graft D

  # verify that the history is what you want; if you have a non-linear
  # history you may have to make several such "cuts" in the graph
  git log

  # now cement it into place by rewriting
  git filter-branch
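Put together on a throwaway copy of the A -> B -> C -> D -> E history, the
recipe looks roughly like this. A minimal sketch, assuming bash and a Git
recent enough to have `git replace --graft`; HEAD is passed to
filter-branch explicitly, `FILTER_BRANCH_SQUELCH_WARNING` only silences
the warning newer Git versions print before running filter-branch, and
the repository and tags are illustrative.

```shell
#!/usr/bin/env bash
set -euo pipefail
# Throwaway identity so the sketch runs without any user config.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
git init -q "$work/private"
cd "$work/private"
# Linear history A -> B -> C -> D -> E, each commit tagged with its letter.
for c in A B C D E; do
  echo "$c" > "$c.txt"
  git add "$c.txt"
  git commit -q -m "$c"
  git tag "$c"
done
orig_d=$(git rev-parse D)

# Make D appear parentless (a run-time view; the objects are unchanged).
git replace --graft D
git log --oneline

# Cement it: rewrite the current branch, honouring the replacement.
git filter-branch HEAD

# The replace ref is no longer needed once the rewrite is done.
git replace -d "$orig_d"
```

The trees of the old and new E are identical; only the commit names
change, which is the cost described above.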

Of course that is a bitter pill to swallow if you have reasons for
wanting to use the old sha1s. E.g., you have internal development
proceeding against the old tree and want to share a truncated version
with the public. In that case I still think the least painful thing is
to rewrite the truncated history, have _everyone_, internal and public,
work against that, and let internal folks graft the old history on for
their own use. They can do that with:

  git replace --graft the-rewritten-D the-original-C
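On the internal side the graft is just a local replace ref, so only those
who create it see the full history. A minimal sketch, assuming bash; the
rewritten D and E are built directly with `git commit-tree` as a stand-in
for the filter-branch result, and the repository, branch and tag names
are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail
# Throwaway identity so the sketch runs without any user config.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
git init -q "$work/private"
cd "$work/private"
# Linear history A -> B -> C -> D -> E, each commit tagged with its letter.
for c in A B C D E; do
  echo "$c" > "$c.txt"
  git add "$c.txt"
  git commit -q -m "$c"
  git tag "$c"
done
c=$(git rev-parse C)

# Stand-in for the rewritten public history: same trees, D' has no parent.
d_new=$(git commit-tree "D^{tree}" -m "D")
e_new=$(git commit-tree "E^{tree}" -p "$d_new" -m "E")
git branch public "$e_new"

# Internal developers graft the private history back on, locally only.
git replace --graft "$d_new" "$c"

git log --oneline public                         # E, D, C, B, A
git --no-replace-objects log --oneline public    # E, D only
```

Replace refs live under refs/replace/ and are not fetched or pushed by
default, so everyone else keeps seeing the two-commit public history.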

-Peff


Re: publish from certain commit onward, keeping earlier history private, but provable

2015-12-09 Thread Johannes Löthberg

On 09/12, Jörn Hees wrote:
> Hi,
>
> I've been hacking away on a library for quite some time and have a lot
> of commits in my private repository:
>
> A -> B -> C -> D -> E
>
> Finally, I'm nearing completion of a first version, and want to publish
> it to a remote called public from D onward keeping A..C to myself, so
> public should afterwards look like this:
>
> D -> E
>
> My main motivation is that I don't really want to put my ridiculous
> first trials online, but (on demand) I'd still like to be able to prove
> how I arrived at D (think of copyright claims, etc).
>
> As it is (at the moment) practically impossible to reverse-engineer the
> hashes of the commits in the chain, with their times and changesets, I
> thought that simply keeping D's parent pointer to C would be one of the
> genius advantages of git. Sadly, I can't find a way to actually make
> this work.
>
> Can I somehow push D -> E to public, making it a fully functional public
> repository with all the objects necessary to check out D or E, and with
> D still pointing to C as its parent? If not, why is that?

Take a look at git-replace[0][1].

[0]: https://git-scm.com/2010/03/17/replace.html
[1]: https://www.kernel.org/pub/software/scm/git/docs/git-replace.html

--
Sincerely,
 Johannes Löthberg
 PGP Key ID: 0x50FB9B273A9D0BB5
 https://theos.kyriasis.com/~kyrias/




publish from certain commit onward, keeping earlier history private, but provable

2015-12-09 Thread Jörn Hees
Hi,

I've been hacking away on a library for quite some time and have a lot of 
commits in my private repository:

A -> B -> C -> D -> E

Finally, I'm nearing completion of a first version, and want to publish it to a 
remote called public from D onward keeping A..C to myself, so public should 
afterwards look like this:

D -> E

My main motivation is that I don't really want to put my ridiculous first
trials online, but (on demand) I'd still like to be able to prove how I
arrived at D (think of copyright claims, etc).

As it is (at the moment) practically impossible to reverse-engineer the
hashes of the commits in the chain, with their times and changesets, I
thought that simply keeping D's parent pointer to C would be one of the
genius advantages of git. Sadly, I can't find a way to actually make this
work.
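The provability this relies on comes from each commit's name covering its
parent's name, and so the whole chain behind it. A minimal sketch,
assuming bash and a fixed timestamp so that identical content gives
identical names: two histories end in a D with the same tree, one going
through a B that C reverts and one coming straight from A, and the two D
commits still get different names. File names and the timestamp are
illustrative.

```shell
#!/usr/bin/env bash
set -euo pipefail
# Throwaway identity and fixed dates, so equal content means equal names.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com
export GIT_AUTHOR_DATE='1449662400 +0000' GIT_COMMITTER_DATE='1449662400 +0000'

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# History 1: A, B adds a first trial, C reverts it, then D.
git init -q "$work/full"
cd "$work/full"
echo base > lib.txt
git add lib.txt && git commit -q -m A
echo trial > trial.txt
git add trial.txt && git commit -q -m B
git rm -q trial.txt && git commit -q -m C
echo done >> lib.txt && git commit -q -am D
full_a=$(git rev-parse HEAD~3)
full_d=$(git rev-parse HEAD)

# History 2: the same A, then D directly (what a squash would leave).
git init -q "$work/squashed"
cd "$work/squashed"
echo base > lib.txt
git add lib.txt && git commit -q -m A
echo done >> lib.txt && git commit -q -am D
squashed_a=$(git rev-parse HEAD~1)
squashed_d=$(git rev-parse HEAD)

echo "A: $full_a vs $squashed_a"
echo "D: $full_d vs $squashed_d"
```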

Can I somehow push D -> E to public, making it a fully functional public
repository with all the objects necessary to check out D or E, and with D
still pointing to C as its parent? If not, why is that?

What doesn't seem to work:

- push with range
  
  git push public D..E:master
  error: src refspec D..E does not match any.
  error: failed to push some refs to ''

- any form of squashing / history rewriting

  As far as I know, squashing A..D would introduce a new commit, removing
  the parent pointer to C and thereby removing provability of the
  existence of A..C. (Simple example: say C reverted B; then you'd never
  be able to prove B was in there at some point.)

  I could obviously note the hash of C manually in the description of the
  squash commit, but there already is a parent pointer field, so why not
  use it?

  Also, in order to contribute further changes to public, I'd have to
  rebase my private branches on top of this new squashed commit, which
  just seems wrong...

- push from local clone with limited depth

  I thought I had found a solution to this by first creating a local clone
  local_public with the desired depth before pushing that clone to public,
  like this:

  git clone --depth 2 file:/// local_public

  With

  git log --pretty=raw

  I can verify that local_public only contains D -> E and that the commit,
  tree and parent hashes are the same, which is exactly what I want.

  The problem is that when I try to push to an added public remote from
  local_public, I get an error like this:

  ! [remote rejected] master -> master (shallow update not allowed)


Any ideas how to make this work?

Cheers,
Jörn
