Re: RFC: reverse history tree, for faster background clones

2015-06-14 Thread Andres G. Aragoneses

On 12/06/15 14:33, Dennis Kaarsemaker wrote:

On vr, 2015-06-12 at 13:39 +0200, Andres G. Aragoneses wrote:

On 12/06/15 13:33, Dennis Kaarsemaker wrote:

On vr, 2015-06-12 at 13:26 +0200, Andres G. Aragoneses wrote:


AFAIU git stores the contents of a repo as a sequence of patches in the
.git metadata folder.


It does not, it stores full snapshots of files.


In bare repos too?


Yes. A bare repo is nothing more than the .git dir of a non-bare repo
with the core.bare variable set to True :)
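Dennis's point is easy to check hands-on. A throwaway sketch (the repo names and commit content here are made up for illustration; assumes `git` is installed):

```shell
#!/bin/sh
# Sketch: a bare repo's top level looks like a non-bare repo's .git dir.
set -e
tmp=$(mktemp -d)
cd "$tmp"

# Create a normal repo with one commit.
git init -q normal
cd normal
echo hello > file.txt
git add file.txt
git -c user.name=t -c user.email=t@example.com commit -qm "first"
cd ..

# Clone it as a bare repo.
git clone -q --bare normal bare.git

# The bare repo has HEAD, objects/, refs/ etc. at its top level,
# just like normal/.git -- and core.bare is set:
ls bare.git
git --git-dir=bare.git config core.bare   # prints: true
```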


1. `git clone --depth 1` would be way faster, without the need for
on-demand compression of packfiles on the server side; correct me if I'm
wrong?


You're wrong due to a misunderstanding of how git works :)


Thanks for pointing this out, do you mind giving me a link of some docs
where I can correct my knowledge about this?


http://git-scm.com/book/en/v2/Git-Internals-Git-Objects should help.
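The linked chapter can be verified directly with `git cat-file`. A minimal sketch (throwaway repo, contents made up, mirroring the example from the original mail) showing that each commit references a complete blob, not a patch — loose objects are full snapshots; packfiles later delta-compress them, but the object model stays snapshot-based:

```shell
#!/bin/sh
set -e
tmp=$(mktemp -d); cd "$tmp"
git init -q demo; cd demo

printf 'First Line\nSecond Line\nThird Line\n' > file.txt
git add file.txt
git -c user.name=t -c user.email=t@example.com commit -qm "first"

printf 'First Line\nSecond Line\nLast Line\n' > file.txt
git add file.txt
git -c user.name=t -c user.email=t@example.com commit -qm "second"

# Each commit's tree points at a complete blob for file.txt:
git cat-file -p HEAD:file.txt      # full current contents, not a diff
git cat-file -p HEAD~1:file.txt    # full previous contents, not a diff
git cat-file -t HEAD:file.txt      # prints: blob
```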


Wow, now I wonder if I should also propose a change to make git 
optionally not store the full snapshots, to save disk space. Thanks for 
pointing this out to me.




2. `git clone` could offer a fast, complete-in-the-background mode: it
would download the first snapshot of the repo very quickly, so that the
user could start working in his working directory almost immediately,
while a background job keeps retrieving the rest of the history.


This could actually be a good thing, and can be emulated now with git
clone --depth=1 and subsequent fetches in the background to deepen the
history. I can see some value in clone doing this by itself, first doing
a depth=1 fetch, then launching itself into the background, giving you a
worktree to play with earlier.


You're right, I didn't think about the feature that converts a --depth=1
repo into a normal one. Then a patch adding a --progressive flag (for
instance; I haven't thought of a better name yet) to the `clone` command
would actually be trivial to create, I assume, because it would just use
`depth=1` and then retrieve the rest of the history in the background,
right?


A naive implementation that does just clone --depth=1 and then fetch
--unshallow would probably not be too hard, no. But whether that would
be the 'right' way of implementing it, I wouldn't know.


OK, can anyone else give some insight here?

I imagine that I would not get real feedback until I send a [PATCH]...

I guess I would use a user-facing message like this one:

Finished cloning the last snapshot of the repository.
Auto downloading the rest of the history in background.

(There's already a similar background feature wrt auto-packing the 
repository: `Auto packing the repository in background for optimum 
performance. See git help gc for manual housekeeping.`)


Thanks


--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RFC: reverse history tree, for faster background clones

2015-06-12 Thread Andres G. Aragoneses

Hello git devs,

I'm toying with an idea of an improvement I would like to work on, but 
not sure if it would be desirable enough to be considered good to merge 
in the end, so I'm requesting your opinions before I work on it.


AFAIU git stores the contents of a repo as a sequence of patches in the 
.git metadata folder. So then let's look at an example to illustrate my 
point more easily.


Repo foo contains the following 2 commits:

1 file, first commit, with the content:
+First Line
+Second Line
+Third Line

2nd and last commit:
 First Line
 Second Line
-Third Line
+Last Line

Simple enough, right?

But, what if we decided to store it backwards in the metadata?

So first commit would be:
1 file, first commit, with the content:
+First Line
+Second Line
+Last Line

2nd commit:
 First Line
 Second Line
-Last Line
+Third Line
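The reverse-delta idea in the example above can be illustrated with plain `diff`/`patch`. This is a concept sketch only (file names made up), not how git stores objects; incidentally, RCS stores its history this way, keeping the newest revision whole and older ones as reverse deltas:

```shell
#!/bin/sh
# Concept sketch: keep the NEWEST version whole, and store reverse diffs
# that reconstruct older versions from it.
set -e
tmp=$(mktemp -d); cd "$tmp"

printf 'First Line\nSecond Line\nThird Line\n' > v1
printf 'First Line\nSecond Line\nLast Line\n'  > v2

# Store: full latest snapshot + a reverse delta (v2 -> v1).
cp v2 snapshot.latest
diff -u v2 v1 > delta.back || true   # diff exits 1 when files differ

# Checking out the latest version is a plain copy (fast), while
# older versions are rebuilt by applying reverse deltas:
cp snapshot.latest old
patch -s old < delta.back
cmp -s old v1 && echo "v1 reconstructed"   # prints: v1 reconstructed
```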


This would bring some advantages, as far as I understand:

1. `git clone --depth 1` would be way faster, without the need for 
on-demand compression of packfiles on the server side; correct me if I'm 
wrong?
2. `git clone` could offer a fast, complete-in-the-background mode: it 
would download the first snapshot of the repo very quickly, so that the 
user could start working in his working directory almost immediately, 
while a background job keeps retrieving the rest of the history.

3. Any more advantages you see?


I'm aware that this would also have downsides, but IMHO the benefits 
would outweigh them. The ones I see:
1. Every time a commit is made, a big change to the history-metadata tree 
would need to happen. (But this is essentially equivalent to enabling an 
INDEX in a DB: you make WRITES more expensive in order to improve the 
speed of READS.)
2. Locking issues? I imagine that rewriting the indexes would open 
longer time windows to have locking issues, but I'm not an expert in 
this, please expand.

3. Any more downsides you see?


I would be glad for any feedback you have. Thanks, and have a great day!

  Andrés



Re: RFC: reverse history tree, for faster background clones

2015-06-12 Thread Dennis Kaarsemaker
On vr, 2015-06-12 at 13:39 +0200, Andres G. Aragoneses wrote:
 On 12/06/15 13:33, Dennis Kaarsemaker wrote:
  On vr, 2015-06-12 at 13:26 +0200, Andres G. Aragoneses wrote:
 
  AFAIU git stores the contents of a repo as a sequence of patches in the
  .git metadata folder.
 
  It does not, it stores full snapshots of files.
 
 In bare repos too?

Yes. A bare repo is nothing more than the .git dir of a non-bare repo
with the core.bare variable set to True :)

  1. `git clone --depth 1` would be way faster, without the need for
  on-demand compression of packfiles on the server side; correct me if I'm
  wrong?
 
  You're wrong due to a misunderstanding of how git works :)
 
 Thanks for pointing this out, do you mind giving me a link of some docs 
 where I can correct my knowledge about this?

http://git-scm.com/book/en/v2/Git-Internals-Git-Objects should help.

  2. `git clone` could offer a fast, complete-in-the-background mode: it
  would download the first snapshot of the repo very quickly, so that the
  user could start working in his working directory almost immediately,
  while a background job keeps retrieving the rest of the history.
 
  This could actually be a good thing, and can be emulated now with git
  clone --depth=1 and subsequent fetches in the background to deepen the
  history. I can see some value in clone doing this by itself, first doing
  a depth=1 fetch, then launching itself into the background, giving you a
  worktree to play with earlier.
 
 You're right, I didn't think about the feature that converts a --depth=1 
 repo into a normal one. Then a patch adding a --progressive flag (for 
 instance; I haven't thought of a better name yet) to the `clone` command 
 would actually be trivial to create, I assume, because it would just use 
 `depth=1` and then retrieve the rest of the history in the background, 
 right?

A naive implementation that does just clone --depth=1 and then fetch
--unshallow would probably not be too hard, no. But whether that would
be the 'right' way of implementing it, I wouldn't know.

-- 
Dennis Kaarsemaker
http://www.kaarsemaker.net



Re: RFC: reverse history tree, for faster background clones

2015-06-12 Thread Andres G. Aragoneses

On 12/06/15 13:33, Dennis Kaarsemaker wrote:

On vr, 2015-06-12 at 13:26 +0200, Andres G. Aragoneses wrote:


AFAIU git stores the contents of a repo as a sequence of patches in the
.git metadata folder.


It does not, it stores full snapshots of files.


In bare repos too?



1. `git clone --depth 1` would be way faster, without the need for
on-demand compression of packfiles on the server side; correct me if I'm
wrong?


You're wrong due to a misunderstanding of how git works :)


Thanks for pointing this out, do you mind giving me a link of some docs 
where I can correct my knowledge about this?




2. `git clone` could offer a fast, complete-in-the-background mode: it
would download the first snapshot of the repo very quickly, so that the
user could start working in his working directory almost immediately,
while a background job keeps retrieving the rest of the history.


This could actually be a good thing, and can be emulated now with git
clone --depth=1 and subsequent fetches in the background to deepen the
history. I can see some value in clone doing this by itself, first doing
a depth=1 fetch, then launching itself into the background, giving you a
worktree to play with earlier.


You're right, I didn't think about the feature that converts a --depth=1 
repo into a normal one. Then a patch adding a --progressive flag (for 
instance; I haven't thought of a better name yet) to the `clone` command 
would actually be trivial to create, I assume, because it would just use 
`depth=1` and then retrieve the rest of the history in the background, 
right?


Thanks




Re: RFC: reverse history tree, for faster background clones

2015-06-12 Thread Dennis Kaarsemaker
On vr, 2015-06-12 at 13:26 +0200, Andres G. Aragoneses wrote:

 AFAIU git stores the contents of a repo as a sequence of patches in the 
 .git metadata folder. 

It does not, it stores full snapshots of files.

[I've cut the example, as it's not how git works]

 1. `git clone --depth 1` would be way faster, without the need for 
 on-demand compression of packfiles on the server side; correct me if I'm 
 wrong?

You're wrong due to a misunderstanding of how git works :)

 2. `git clone` could offer a fast, complete-in-the-background mode: it 
 would download the first snapshot of the repo very quickly, so that the 
 user could start working in his working directory almost immediately, 
 while a background job keeps retrieving the rest of the history.

This could actually be a good thing, and can be emulated now with git
clone --depth=1 and subsequent fetches in the background to deepen the
history. I can see some value in clone doing this by itself, first doing
a depth=1 fetch, then launching itself into the background, giving you a
worktree to play with earlier.

-- 
Dennis Kaarsemaker
http://www.kaarsemaker.net
