Hello Eric,

Thanks for all the advice. I have played with several repositories (on both
32-bit and 64-bit machines). You were correct: most of the memory is used by
mapped files, and yes, it doesn't cause any problems. Even a 32-bit machine with
500 MB of memory works normally with a heavily loaded git-cat-file.

Thanks also for the advice to use the git gc config options. I tested gc.auto=0
and it led to the same behavior as my MALLOC_LIMIT setting; however, it is a
more correct way to get this effect =)

I agree that we shouldn't worry about mapped files.

--
Best Regards,
Victor
________________________________________
From: Eric Wong [normalper...@yhbt.net]
Sent: Wednesday, September 23, 2015 12:22 PM
To: Victor Leschuk
Cc: Junio C Hamano; git@vger.kernel.org
Subject: Re: [PATCH] git-svn: make batch mode optional for git-cat-file

Victor Leschuk <vlesc...@accesssoftek.com> wrote:
> Hello Eric, thanks for looking into it.
>
> >> git-cat-file has outgrown the parent perl process several times
> >> (git-cat-file - ~3-4Gb, perl - 400Mb).
>
> > Ugh, that sucks.
> > Even the 400Mb size of Perl annoys me greatly and I'd work
> > on fixing it if I had more time.
>
> I was going to look at this problem also, but first I'd like to improve the
> situation with cat-file, as on large repos it is a larger problem. By the way,
> what direction would you suggest to begin with?

See below :)

<snip anonymous memory stuff, it doesn't seem to be a culprit>

> > > git-cat-file has outgrown the parent perl process several times
> > > (git-cat-file - ~3-4Gb, perl - 400Mb).
>
> > How much of that is anonymous memory, though?
>
> Haven't measured on this particular repo: didn't redo the 2 week
> experiment =) However I checked on a smaller test repo and anon memory
> is about 12M out of 40M total. Most of the memory is really taken by
> mmap-ed *.pack and *.idx files.

If it's mmap-ed files, that physical memory is only used on-demand
and can be dropped at any time because it's backed by disk.

In other words, I would not worry about any file-backed mmap at all
(unless you're on 32-bit, but I think git has workarounds for that)

Do you still have that giant repo around?

Is the combined size of the pack + idx files at least 3-4 GB?

This should cat all the blobs in history without re-running git-svn:

        git log --all --raw -r --no-abbrev | \
          awk '/^:/ {print $3; print $4}' | git cat-file --batch

git log actually keeps growing, but the cat-file process shouldn't
use anonymous memory much if you inspect it with pmap.
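
As a rough sketch of that check (assuming Linux, where /proc/<pid>/smaps
carries the same per-mapping data that pmap reports), the resident memory can
be split into anonymous and file-backed totals. The function name
smaps_summary is made up for illustration; it reads smaps-format text on
stdin and treats any mapping with a non-zero inode as file-backed:

```shell
#!/bin/sh
# Hypothetical helper: sum resident memory (Rss, in kB) by mapping type.
# Input: text in /proc/<pid>/smaps format on stdin.
smaps_summary() {
    awk '
        # Mapping header: "start-end perms offset dev inode [path]".
        # A non-zero inode means the mapping is backed by a file.
        /^[0-9a-f]+-[0-9a-f]+ / { file_backed = ($5 != 0); next }
        # Each Rss line belongs to the most recent mapping header.
        $1 == "Rss:" { if (file_backed) file += $2; else anon += $2 }
        END { printf "anon_kb=%d file_kb=%d\n", anon, file }
    '
}

# Example use against a running process (the PID is a placeholder):
#   smaps_summary < /proc/12345/smaps
```

If file_kb dominates, the growth is mostly mmap-ed pack and idx data, which
the kernel can drop under memory pressure.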

> Actually I accidentally found out that if I export GIT_MALLOC_LIMIT
> variable set to several megabytes it has the following effect:

>  * git-svn.perl launches git-gc
>  * git-gc can't allocate enough memory and thus doesn't create any pack files
>  * git-cat-file works only with pure blob objects, not packs, and its
> memory usage doesn't grow larger than 4-5M
>
> It gave me a thought that maybe we could get rid of "git gc" calls
> after each commit in perl code and just perform one large gc operation
> at the end. It will cost disk space during clone but save us memory.
> What do you think?

You can set gc.auto to zero in your $GIT_CONFIG to disable gc.
The "git gc" calls were added because unpacked repos were growing
too large and caused problems for other people.
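
A minimal sketch of that workflow, assuming the setting is applied per
repository and a single gc is run by hand once the import has finished. The
helper names disable_auto_gc and gc_once are illustrative only and are not
part of git-svn:

```shell
#!/bin/sh
# Hypothetical helpers; each takes the repository path as $1.

# Turn off automatic gc so no packs are created during the import.
disable_auto_gc() {
    git -C "$1" config gc.auto 0
}

# Pack the repository in one pass, e.g. after the import is done.
gc_once() {
    git -C "$1" gc --quiet
}
```

The trade-off is the one described above: loose objects take more disk space
until the final gc runs.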

Perhaps play with some other pack* options documented in
Documentation/config to limit maximum pack size/depth.
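
For illustration only, a sketch of what such tuning might look like. The
values are arbitrary placeholders, not recommendations, and the helper name
limit_pack_settings is made up. core.packedGitLimit is not a pack.* option;
it is included here as an assumption because it caps how many bytes of pack
files are mapped at once:

```shell
#!/bin/sh
# Hypothetical helper; takes the repository path as $1.
# All values below are placeholders chosen for the example.
limit_pack_settings() {
    # Upper bound on the size of each pack file that gets written.
    git -C "$1" config pack.packSizeLimit 256m
    # Maximum delta chain depth used when packing.
    git -C "$1" config pack.depth 20
    # Maximum number of bytes of pack files mapped into memory at once.
    git -C "$1" config core.packedGitLimit 256m
}
```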

Is this a 32-bit or 64-bit system?

> As for your suggestion regarding periodic restart of batch process
> inside git-cat-file, I think we could give it a try, I can prepare a
> patch and run some tests.

I am not sure if we need it for git-svn.

In another project, the only reason I've found to restart
"cat-file --batch" is when the repo got repacked and old packs
got unlinked: cat-file would hold a reference to the old files
and suck up space.  It might be better if "cat-file --batch" learned
to detect unlinked files and then munmap + close them.
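
Until something like that exists, the situation can at least be spotted from
outside the process. A rough sketch, assuming Linux, where /proc/<pid>/maps
marks unlinked files with a " (deleted)" suffix. The function name
deleted_packs is made up, and paths containing spaces are not handled:

```shell
#!/bin/sh
# Hypothetical helper: list pack/idx files that are still mapped but have
# been unlinked. Input: text in /proc/<pid>/maps format on stdin.
deleted_packs() {
    # Field 6 is the path; the kernel appends " (deleted)" after it.
    awk '/\.(pack|idx) \(deleted\)$/ { print $6 }' | sort -u
}

# Example use against a running process (the PID is a placeholder):
#   deleted_packs < /proc/12345/maps
```

This only inspects the process from the outside; it is not how cat-file
itself would detect the condition.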