Re: publish from certain commit onward, keeping earlier history private, but provable

2015-12-09 Thread Stefan Beller
On Wed, Dec 9, 2015 at 2:24 PM, Jeff King wrote:
> On Wed, Dec 09, 2015 at 05:20:41PM -0500, Jeff King wrote:
>
>> Of course that is a bitter pill to swallow if you have reasons for
>> wanting to use the old sha1s. E.g., you have internal development
>> proceeding against the old tree and want to share a truncated version
>> with the public.
>
> After re-reading your email, it looks like your use case is just to be
> able to later prove the existence of the original history. You could
> do that by mentioning the original "C" in your truncated "D", but in a
> way that git does not follow when computing reachability. For instance,
> amend D's commit message to say:
>
>   This is based on earlier, unpublished work going up to commit C.
>
> Then retain C for yourself, and show it only to those you want to prove
> its contents to.

I'd rather keep D to yourself and create a D', which is D without its
parent but with the note above, so that the tree of D and parts of its
commit message are obvious from looking at D'. All that stays secret is
D's parent and commit information such as the exact date (the committer
could be guessed easily).

>
> -Peff
> --
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: publish from certain commit onward, keeping earlier history private, but provable

2015-12-09 Thread Jeff King
On Wed, Dec 09, 2015 at 05:20:41PM -0500, Jeff King wrote:

> Of course that is a bitter pill to swallow if you have reasons for
> wanting to use the old sha1s. E.g., you have internal development
> proceeding against the old tree and want to share a truncated version
> with the public.

After re-reading your email, it looks like your use case is just to be
able to later prove the existence of the original history. You could
do that by mentioning the original "C" in your truncated "D", but in a
way that git does not follow when computing reachability. For instance,
amend D's commit message to say:

  This is based on earlier, unpublished work going up to commit C.

Then retain C for yourself, and show it only to those you want to prove
its contents to.
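
A minimal sketch of that idea, run in a throwaway repository. The tag
names A..D and the variables c_id and new_d are illustrative assumptions,
and git commit-tree is used here to build the truncated commit directly
instead of amending:

```shell
set -e
# Throwaway demo repository; every name and path here is illustrative.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

# Build the private history A -> B -> C -> D and tag each commit.
for name in A B C D; do
    echo "$name" > file.txt
    git add file.txt
    git commit -q -m "$name"
    git tag "$name"
done

c_id=$(git rev-parse C)

# Create a parentless commit that reuses D's tree and names C in its
# message. The id appears only as text, so git never treats C as
# reachable from the new commit.
new_d=$(git commit-tree -m "D

This is based on earlier, unpublished work going up to commit $c_id." "D^{tree}")

# Shows a single commit with no ancestors.
git log --format='%H %s' "$new_d"
```

Only new_d (and whatever is built on top of it) would be pushed; C and
its ancestors stay in the private repository.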

-Peff


Re: publish from certain commit onward, keeping earlier history private, but provable

2015-12-09 Thread Jeff King
On Wed, Dec 09, 2015 at 02:45:44PM +0100, Jörn Hees wrote:

> I've been hacking away on a library for quite some time and have a lot of 
> commits in my private repository:
> 
> A -> B -> C -> D -> E
> 
> Finally, I'm nearing completion of a first version, and want to
> publish it to a remote called public from D onward keeping A..C to
> myself, so public should afterwards look like this:
> 
> D -> E

The short answer is that you cannot do this without changing the names
(i.e., sha1 commit ids) of D and E.

One of the fundamental assumptions git makes is that if a repository has
an object X, it also has all of the objects reachable from it (past
commits, their trees, subtrees, and blobs). This is what makes the
push/fetch object transfer efficient (one side says only "I have X" and
the other side knows "Ah, that is a whole chunk of objects I do not have
to bother sending", without the names of those objects going over the
wire).
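
To make "all of the objects reachable from it" concrete, here is a small
sketch in a throwaway repository (the file name and the object_count
variable are made up for illustration). git rev-list --objects walks
exactly that set of commits, trees, and blobs:

```shell
set -e
# Throwaway demo repository; names are illustrative.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

# Two commits, each changing the single file.
for name in A B; do
    echo "$name" > file.txt
    git add file.txt
    git commit -q -m "$name"
done

# List every object reachable from the tip: both commits, the root tree
# of each commit, and the blob each tree points to.
git rev-list --objects HEAD

object_count=$(git rev-list --objects HEAD | wc -l | tr -d ' ')
echo "reachable objects: $object_count"
```

A side that announces "I have HEAD" is implicitly claiming to have every
object in that listing.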

The exception, of course, is shallow clones, where one side tells the
other "I am shallow at cutoff point Y; don't assume I have anything
below there". This does work, but there are some downsides (for
instance, we cannot apply some of the same reachability optimizations
for serving fetches).

>   I can verify that local_public only contains D -> E and that the
>   commit, tree and parent hashes are the same, which is exactly what I
>   want.
>   
>   The problem is that when I try to push to an added public remote
>   from local_public I get an error like this:
>   
>   ! [remote rejected] master -> master (shallow update not allowed)

Right. The receiver must be explicitly configured to accept a shallow
push (I do not recall offhand whether clients fetching from you would
also need an explicit config to accept a shallow history).
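
For reference, the receiving-side setting is receive.shallowUpdate. The
following is a sketch using throwaway local repositories; the directory
names private, local_public and public.git are illustrative, and
public.git stands in for a remote whose configuration you control:

```shell
set -e
work=$(mktemp -d)
cd "$work"

# Private repository with the history A -> B -> C -> D -> E.
git init -q private
cd private
git config user.name "Demo"
git config user.email "demo@example.com"
for name in A B C D E; do
    echo "$name" > file.txt
    git add file.txt
    git commit -q -m "$name"
done
cd ..

# Shallow copy holding only D -> E. A file:// URL is used because
# --depth is ignored for plain local-path clones.
git clone -q --depth 2 "file://$work/private" local_public

# Receiving side: a bare repository that opts in to shallow pushes.
git init -q --bare public.git
git -C public.git config receive.shallowUpdate true

git -C local_public push -q "$work/public.git" HEAD:refs/heads/master

# The commit ids are unchanged, and the receiver records its own cutoff.
private_tip=$(git -C private rev-parse HEAD)
public_tip=$(git -C public.git rev-parse refs/heads/master)
echo "private tip: $private_tip"
echo "public tip:  $public_tip"
cat public.git/shallow
```

This only helps when you can set configuration on the receiving
repository, and the result is itself a shallow repository with the
downsides mentioned above.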

So the usual path here is to rewrite D and E (with the same trees, but
they will get new commit ids). If you want to retain the older history
(commits A-C), you can distribute it separately and use git-replace to
"graft" it onto the newer history at run-time.

You can do that with:

  # set up a run-time replacement view so that D appears to have
  # no parents; this doesn't impact the objects themselves, but
  # rather git will use our parent-less "replacement" D anytime
  # somebody mentions the original
  git replace --graft D

  # verify that the history is what you want; if you have a non-linear
  # history you may have to make several such "cuts" in the graph
  git log

  # now cement it into place by rewriting
  git filter-branch

Of course that is a bitter pill to swallow if you have reasons for
wanting to use the old sha1s. E.g., you have internal development
proceeding against the old tree and want to share a truncated version
with the public.  In that case I still think the least painful thing is
to rewrite the truncated history, have _everyone_, internal and public,
work against that, and let internal folks graft the old history on for
their own use. They can do that with:

  git replace --graft the-rewritten-D the-original-C
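
As a sketch of what that graft does, in a throwaway repository: the
tags A..D and the variable rewritten_d are illustrative, and
git commit-tree stands in here for the filter-branch rewrite purely to
keep the example short:

```shell
set -e
# Throwaway demo repository; every name here is illustrative.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

# Original private history A -> B -> C -> D.
for name in A B C D; do
    echo "$name" > file.txt
    git add file.txt
    git commit -q -m "$name"
    git tag "$name"
done

# Stand-in for "the-rewritten-D": a parentless commit with D's tree.
rewritten_d=$(git commit-tree -m "D" "D^{tree}")

# Without a graft, the rewritten commit has no ancestors.
before=$(git rev-list --count "$rewritten_d")

# Internal users attach the private history as a run-time view.
git replace --graft "$rewritten_d" C

# With the replacement in effect, history continues into C, B and A.
after=$(git rev-list --count "$rewritten_d")

# The underlying object is untouched; ignoring replacements shows that.
raw=$(git --no-replace-objects rev-list --count "$rewritten_d")

echo "before=$before after=$after raw=$raw"
```

The replacement lives under refs/replace/ in the local repository, so it
is not shared unless those refs are pushed or fetched explicitly.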

-Peff


Re: publish from certain commit onward, keeping earlier history private, but provable

2015-12-09 Thread Jeff King
On Wed, Dec 09, 2015 at 02:29:12PM -0800, Stefan Beller wrote:

> On Wed, Dec 9, 2015 at 2:24 PM, Jeff King wrote:
> > On Wed, Dec 09, 2015 at 05:20:41PM -0500, Jeff King wrote:
> >
> >> Of course that is a bitter pill to swallow if you have reasons for
> >> wanting to use the old sha1s. E.g., you have internal development
> >> proceeding against the old tree and want to share a truncated version
> >> with the public.
> >
> > After re-reading your email, it looks like your use case is just to be
> > able to later prove the existence of the original history. You could
> > do that by mentioning the original "C" in your truncated "D", but in a
> > way that git does not follow when computing reachability. For instance,
> > amend D's commit message to say:
> >
> >   This is based on earlier, unpublished work going up to commit C.
> >
> > Then retain C for yourself, and show it only to those you want to prove
> > its contents to.
> 
> I'd rather keep D to yourself and create a D', which is D without its
> parent but with the note above, so that the tree of D and parts of its
> commit message are obvious from looking at D'. All that stays secret is
> D's parent and commit information such as the exact date (the committer
> could be guessed easily).

I think the point is that all of this is happening at time t (let's say
2015), and the proof may be needed at time t+N (let's say 2020).

Showing the original D (or C, or whatever) at that point proves nothing,
as you could have created a fake history in 2020 that "ends up" at the
D' tree. You need to publish _something_ in 2015 that says "I know this
thing, but I am not willing to show it to you yet".

The classic way of doing this is to take out a small ad in the
classified section of a print newspaper with a hash of your data.
Libraries keep archives of the paper, so later you can prove that you
have the data that matches the hash, and its timestamp is certified by
the library archives.

Here we're abusing Git as the notary. If everyone spends the years from
2015-2020 building on top of D', then they can all reasonably agree that
the content of D' was written in 2015, and any commit hash it mentions
had to have existed then. Revealing C (or the original D, or whatever
hash you want to mention) proves the data.
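
A sketch of the later verification step, in a throwaway repository. The
file name revealed_commit.txt and the variables published_id and
recomputed_id are made up for illustration, and published_id stands in
for the id that was mentioned in D' back in 2015:

```shell
set -e
# Throwaway demo repository; every name here is illustrative.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

echo "secret early work" > file.txt
git add file.txt
git commit -q -m "C"

# The id that was published earlier (here simply read from the repo).
published_id=$(git rev-parse HEAD)

# At reveal time, hand over the raw commit object...
git cat-file commit HEAD > revealed_commit.txt

# ...and let the verifier recompute its id from those bytes alone.
recomputed_id=$(git hash-object -t commit revealed_commit.txt)

echo "published:  $published_id"
echo "recomputed: $recomputed_id"
```

The commit object names its tree and parent by hash, so revealing those
objects as well lets the same check be repeated down through the file
contents and the earlier history.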

-Peff


Re: publish from certain commit onward, keeping earlier history private, but provable

2015-12-09 Thread Johannes Löthberg

On 09/12, Jörn Hees wrote:

> Hi,
>
> I've been hacking away on a library for quite some time and have a lot
> of commits in my private repository:
>
> A -> B -> C -> D -> E
>
> Finally, I'm nearing completion of a first version, and want to publish
> it to a remote called public from D onward keeping A..C to myself, so
> public should afterwards look like this:
>
> D -> E
>
> My main motivation is that I don't really want to put ridiculous
> first trials online, but still (on demand) I'd like to be able to prove
> how I arrived at D (think of copyright claims, etc).
>
> As (at the moment) it's pretty much impossible to reverse-engineer the
> hashes of commits in the chain with times and changesets, I thought
> just keeping D's parent pointer to C would be one of the genius
> advantages of git. Sadly I can't find a way to actually make this work.
>
> Can I somehow push D -> E to public making it a fully functional public
> repository with all the necessary objects included to checkout D or E
> and D still pointing to C as parent? If not, why is that?

Take a look at git-replace[0][1].

[0]: https://git-scm.com/2010/03/17/replace.html
[1]: https://www.kernel.org/pub/software/scm/git/docs/git-replace.html

--
Sincerely,
 Johannes Löthberg
 PGP Key ID: 0x50FB9B273A9D0BB5
 https://theos.kyriasis.com/~kyrias/

