Re: [git-sizer] Implications of a large commit object

2018-03-14 Thread Jeff King
On Wed, Mar 14, 2018 at 09:33:32AM +0100, Michael Haggerty wrote:

> Maybe your migration tool created a huge commit message, for example
> listing each of the files that was changed.
> 
> AFAIK this won't cause Git itself any problems, but it's likely to be
> inconvenient. For example, when you type `git log` and 7 million
> characters page by. Or when you use some GUI tool to view your history
> and it performs badly because it wasn't built to handle such enormous
> commit messages.

Probably one such commit won't break the bank, but it will make history
traversals that cross it slower (e.g., "--contains", merge-bases, etc).
We'll load the whole 7MB object just to find its parents. If you imagine
the average commit object is more like 1k and that current traversals
bottleneck on loading the commit object bytes (both of which I think are
roughly accurate), then crossing that one commit in a traversal is
equivalent to crossing 7000 "normal" commits in cost.

At least until Stolee's serialized commit graph work is merged, at which
point it will only be expensive if we actually try to show the commit
message for that particular object.
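
The estimate above can be sanity-checked with a little arithmetic. This is only a sketch under the same assumptions stated in the message (traversal cost dominated by commit bytes loaded, roughly 1 KiB per typical commit); the function name is made up for illustration.

```python
# Rough cost model: assumes traversal time is dominated by the number of
# commit-object bytes loaded (the assumption stated in the message above).
KIB = 1024
MIB = 1024 * KIB


def equivalent_normal_commits(large_bytes: float, typical_bytes: float = KIB) -> float:
    """How many typical commits cost as much to load as one large commit."""
    return large_bytes / typical_bytes


# 7.33 MiB is the figure git-sizer reported; 1 KiB is the assumed average.
ratio = equivalent_normal_commits(7.33 * MIB)
print(f"one 7.33 MiB commit ~ {ratio:.0f} typical commits")
```

With the exact 7.33 MiB figure this comes out at roughly 7,500, in line with the ~7000 quoted above, which rounds the size down to 7 MB.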

-Peff


Re: [git-sizer] Implications of a large commit object

2018-03-14 Thread Lars Schneider

> On 14 Mar 2018, at 09:33, Michael Haggerty wrote:
> 
> On Wed, Mar 14, 2018 at 9:14 AM, Lars Schneider wrote:
>> I am using Michael's fantastic Git repo analyzer tool "git-sizer" [*]
>> and it detected a very large commit of 7.33 MiB in my repo (see chart
>> below).
>> 
>> This large commit is expected. I've imported that repo from another
>> version control system but excluded all binary files (e.g. images) and
>> some 3rd party components as their history is not important [**]. I've
>> reintroduced these files in the head commit again. This is where the
>> large commit came from.
>> 
>> This repo is not used in production yet but I wonder if this kind of
>> approach can cause trouble down the line? Are there any relevant
>> implications of a single large commit like this in history?
>> [...]
>> 
>> ###
>> ## git-sizer output
>> 
>> [...]
>> | Name                         | Value     | Level of concern |
>> | ---------------------------- | --------- | ---------------- |
>> [...]
>> | Biggest objects              |           |                  |
>> | * Commits                    |           |                  |
>> |   * Maximum size         [1] |  7.33 MiB | !!               |
>> [...]
> 
> The "commit size" that is being referred to here is the size of the
> actual commit object; i.e., the author name, parent commits, etc plus
> the log message. So a huge commit probably means that you have a huge
> log message. This has nothing to do with the number or sizes of the
> files added by the commit.
> 
> Maybe your migration tool created a huge commit message, for example
> listing each of the files that was changed.


D'oh! Of course. I was so focused on that commit with the large number of
files that I missed that. Looking at the reference [1] reveals the
problem. Sorry for wasting your time!


> AFAIK this won't cause Git itself any problems, but it's likely to be
> inconvenient. For example, when you type `git log` and 7 million
> characters page by. Or when you use some GUI tool to view your history
> and it performs badly because it wasn't built to handle such enormous
> commit messages.


Thank you,
Lars


Re: [git-sizer] Implications of a large commit object

2018-03-14 Thread Michael Haggerty
On Wed, Mar 14, 2018 at 9:14 AM, Lars Schneider wrote:
> I am using Michael's fantastic Git repo analyzer tool "git-sizer" [*]
> and it detected a very large commit of 7.33 MiB in my repo (see chart
> below).
>
> This large commit is expected. I've imported that repo from another
> version control system but excluded all binary files (e.g. images) and
> some 3rd party components as their history is not important [**]. I've
> reintroduced these files in the head commit again. This is where the
> large commit came from.
>
> This repo is not used in production yet but I wonder if this kind of
> approach can cause trouble down the line? Are there any relevant
> implications of a single large commit like this in history?
> [...]
>
> ###
> ## git-sizer output
>
> [...]
> | Name                         | Value     | Level of concern |
> | ---------------------------- | --------- | ---------------- |
> [...]
> | Biggest objects              |           |                  |
> | * Commits                    |           |                  |
> |   * Maximum size         [1] |  7.33 MiB | !!               |
> [...]

The "commit size" that is being referred to here is the size of the
actual commit object; i.e., the author name, parent commits, etc plus
the log message. So a huge commit probably means that you have a huge
log message. This has nothing to do with the number or sizes of the
files added by the commit.
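
To make that distinction concrete, here is a minimal sketch of how the content of a commit object breaks down into headers and message. It is not git-sizer's or Git's actual code; the object IDs, identities, and paths are placeholders invented for the example, and `split_commit` and `parent_ids` are hypothetical helper names.

```python
def split_commit(raw: bytes):
    """Split raw commit content into (header_lines, message)."""
    # Headers end at the first blank line; everything after is the message.
    header_block, _, message = raw.partition(b"\n\n")
    return header_block.split(b"\n"), message


def parent_ids(raw: bytes):
    """Return the parent object IDs named in the commit headers."""
    headers, _ = split_commit(raw)
    return [h[len(b"parent "):].decode() for h in headers if h.startswith(b"parent ")]


# Hypothetical commit: placeholder IDs and identities, plus a message that
# lists 100,000 made-up paths, as a migration tool might have written.
headers = [
    b"tree " + b"1" * 40,
    b"parent " + b"a" * 40,
    b"author A U Thor <author@example.com> 1521016412 +0100",
    b"committer C O Mitter <committer@example.com> 1521016412 +0100",
]
message = b"Import from old VCS\n\n" + b"path/to/file\n" * 100_000
raw = b"\n".join(headers) + b"\n\n" + message

_, parsed_message = split_commit(raw)
print(f"object size:  {len(raw)} bytes")
print(f"message size: {len(parsed_message)} bytes")
print(f"parents:      {parent_ids(raw)}")
```

In this made-up example the headers take a couple of hundred bytes and the message takes essentially all the rest, no matter how many files the commit's tree actually contains.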

Maybe your migration tool created a huge commit message, for example
listing each of the files that was changed.

AFAIK this won't cause Git itself any problems, but it's likely to be
inconvenient. For example, when you type `git log` and 7 million
characters page by. Or when you use some GUI tool to view your history
and it performs badly because it wasn't built to handle such enormous
commit messages.

Michael


[git-sizer] Implications of a large commit object

2018-03-14 Thread Lars Schneider
Hi,

I am using Michael's fantastic Git repo analyzer tool "git-sizer" [*]
and it detected a very large commit of 7.33 MiB in my repo (see chart 
below).

This large commit is expected. I've imported that repo from another
version control system but excluded all binary files (e.g. images) and
some 3rd party components as their history is not important [**]. I've 
reintroduced these files in the head commit again. This is where the 
large commit came from.

This repo is not used in production yet but I wonder if this kind of
approach can cause trouble down the line? Are there any relevant
implications of a single large commit like this in history?

Thanks,
Lars


 [*] https://github.com/github/git-sizer
[**] I know some of this stuff shouldn't be in the repo in the first
     place, but I am constrained in the things I can change.


###
## git-sizer output

Processing blobs: 543782
Processing trees: 517104
Processing commits: 43365
Matching commits to trees: 43365
Processing annotated tags: 3
Processing references: 123
| Name                         | Value     | Level of concern |
| ---------------------------- | --------- | ---------------- |
| Overall repository size      |           |                  |
| * Blobs                      |           |                  |
|   * Total size               |  18.8 GiB | **               |
|                              |           |                  |
| Biggest objects              |           |                  |
| * Commits                    |           |                  |
|   * Maximum size         [1] |  7.33 MiB | !!               |
| * Trees                      |           |                  |
|   * Maximum entries      [2] |  6.84 k   | **               |
|                              |           |                  |
| History structure            |           |                  |
| * Maximum tag depth      [3] |     1     | *                |
|                              |           |                  |
| Biggest checkouts            |           |                  |
| * Number of directories  [4] |  21.9 k   | **               |
| * Maximum path depth     [4] |    18     | *                |
| * Maximum path length    [5] |   225 B   | **               |
| * Number of files        [4] |   256 k   | *                |
| * Total size of files    [6] |  2.08 GiB | **               |