On Tue, Apr 18, 2017 at 06:14:36PM +0200, Lars Schneider wrote:
> > Both Git and the filter are going to have to keep these paths in
> > memory somewhere, be that in-process, or on disk. That being said,
> > I can see potential troubles with a large number of long paths that
> > exceed the memory available to Git or the filter when stored in a
> > hashmap/set.
> >
> > On Git's side, I think trading that for some CPU time might make
> > sense. If Git were to SHA1 each path and store that in a hashmap,
> > it would consume more CPU time, but less memory to store each path.
> > Git and the filter could then exchange path names, and Git would
> > simply SHA1 the pathname each time it needed to refer back to
> > memory associated with that entry in a hashmap.
>
> I would be surprised if this would be necessary. If we filter delay
> 50,000 files (= a lot!) with a path length of 1000 characters (= very
> long!) then we would use 50MB plus some hashmap data structures.
> Modern machines should have enough RAM I would think...
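(Purely as an illustration of the idea quoted above, and not anything Git
actually does today: keying a table by the SHA-1 of the pathname rather
than by the pathname itself might look roughly like the sketch below. It
assumes OpenSSL's SHA1() only to keep the example self-contained, a
trivial chained table stands in for Git's real hashmap, and the example
path is made up.)

  /*
   * Illustrative sketch only: store per-path state keyed by the SHA-1
   * of the pathname, so each entry's key costs a fixed 20 bytes
   * regardless of how long the path is.
   *
   * Build with:  cc sketch.c -lcrypto
   */
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include <openssl/sha.h>

  #define TABLE_SIZE 65536	/* number of buckets, power of two */

  struct entry {
  	unsigned char key[SHA_DIGEST_LENGTH];	/* 20-byte digest, not the path */
  	int delayed;				/* example payload */
  	struct entry *next;			/* chaining on collision */
  };

  static struct entry *table[TABLE_SIZE];

  /* Hash a pathname to its SHA-1 digest. */
  static void path_key(const char *path, unsigned char key[SHA_DIGEST_LENGTH])
  {
  	SHA1((const unsigned char *)path, strlen(path), key);
  }

  /* Find the entry for a pathname, optionally creating it. */
  static struct entry *lookup(const char *path, int create)
  {
  	unsigned char key[SHA_DIGEST_LENGTH];
  	unsigned bucket;
  	struct entry *e;

  	path_key(path, key);
  	/* Use the leading digest bytes as the bucket index. */
  	memcpy(&bucket, key, sizeof(bucket));
  	bucket &= TABLE_SIZE - 1;

  	for (e = table[bucket]; e; e = e->next)
  		if (!memcmp(e->key, key, SHA_DIGEST_LENGTH))
  			return e;
  	if (!create)
  		return NULL;

  	e = calloc(1, sizeof(*e));
  	memcpy(e->key, key, SHA_DIGEST_LENGTH);
  	e->next = table[bucket];
  	table[bucket] = e;
  	return e;
  }

  int main(void)
  {
  	/* Mark a (hypothetical) long path as delayed, then look it up again. */
  	const char *path = "some/very/deeply/nested/directory/file.bin";
  	struct entry *e;

  	lookup(path, 1)->delayed = 1;
  	e = lookup(path, 0);
  	printf("%s: delayed=%d\n", path, e ? e->delayed : -1);

  	/*
  	 * Memory comparison from the thread: 50,000 delayed paths at
  	 * 1,000 bytes each is ~50MB of pathnames; keyed by SHA-1, the
  	 * keys alone are 50,000 * 20 bytes = ~1MB.
  	 */
  	return 0;
  }

The trade-off is exactly the one described above: a SHA-1 computation on
every lookup in exchange for fixed-size keys whose cost does not grow
with path length.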
I agree, and thanks for correcting my thinking here.

I ran a simple command to get the longest path names in a large
repository, as:

  $ find . -type f | awk '{ print length($1) }' | sort -r -n | uniq -c

And found a few files close to the 200 character mark as the longest
pathnames in the repository. I think 50k files at 1k bytes per pathname
is quite enough head-room :-).

--
Thanks,
Taylor Blau