- **status**: review --> in-progress
- **Comment**:
The results here are great, including the repo refresh backend logic. But
there are several changes, some of them quite big, so naturally a good
handful of tweaks are needed to polish it up:
#### general
* The commit view no longer shows binary diffs, good. But the table listing
all the files still links the binary files, and those links don't go anywhere.
* Can you add a test for the `has_html_view` method's new functionality for
fast binary detection?
* "refresh" logic is fast now too, yay!
* I guess this should be a separate ticket, but it'd be nice to sort by
filename across all change types, instead of showing adds, then removes, etc.
Maybe same ticket as displaying copies vs renames better.
* Down in the diff list, it says "File was copied or renamed." We should be
able to say exactly which now.
* A rename shows up as `{'new': u'README.txt', 'old': u'README', 'diff': '',
'ratio': 1}` in the diff section and also says `Can't load diff`
* Is it ok that we set diff to `''` in many places?
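
For the `has_html_view` test requested above, a minimal sketch of what such a test could exercise. It assumes the fast detection amounts to looking for a NUL byte in the first few KB of the file, which is a common heuristic; `looks_binary` and `sample_size` are hypothetical names, not the actual Allura code:

```python
def looks_binary(data, sample_size=8192):
    """Hypothetical helper: guess whether `data` (bytes) is binary.

    Only the first `sample_size` bytes are inspected, so the check stays
    cheap even for very large files.
    """
    return b'\0' in data[:sample_size]


def test_looks_binary():
    # A PNG-like header contains NUL bytes early on.
    assert looks_binary(b'\x89PNG\r\n\x1a\n\0\0\0\rIHDR')
    # Plain text and empty content are not flagged.
    assert not looks_binary(b'hello world\n')
    assert not looks_binary(b'')
    # A NUL byte past the sampled prefix is deliberately ignored.
    assert not looks_binary(b'a' * 8192 + b'\0')
```

The last assertion documents the trade-off of sampling: speed is bought at the cost of missing binary content that starts with a long run of text.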
#### hg & svn
* The `[:]` slice would be better on the `for` loop than on the `if` line, right?
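
A generic illustration of the slice placement (not the actual hg/svn code; `drop_empty` and `entries` are made-up names). The copy matters on the `for` line because that is where the list is iterated while being mutated:

```python
def drop_empty(entries):
    """Remove falsy items from `entries` in place and return it."""
    # Iterate over a shallow copy so that removing from `entries`
    # does not make the loop skip the element after each removal.
    for entry in entries[:]:
        if not entry:
            entries.remove(entry)
    return entries
```

Without the `[:]` on the `for` line, two adjacent empty items would leave one of them behind, since the iterator's index advances past the element that shifted into the removed slot.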
#### hg
* cleanup: move imports to top of file
#### git
* Testing with the walrustech repo, in the 2nd commit, only the `Flan` dir shows up
as having changes. Nothing is shown for `options.txt`, `bin/`, or `mods/`, but
they did have changes. You can see this with `?limit=1000`. And if you use the
default limit, the pages at the end are all blank.
* I think we don't want to use `--find-copies-harder`
    * Performance-wise, on a big repo my timing measurement is 0m0.035s without
      it and 0m0.135s with it. Noticeable but not huge.
    * A bigger impact is the semantics of it. It can make an incorrect
      association of files being "copied" if the contents are common contents. A
      very good example of common contents is no content: an empty file. I've found
      a diff that says one `__init__.py` file was copied to another, but really it's
      just a new file. And another file that is new but has a lot of test
      boilerplate, so git thinks it's a 56% similar copy. Thus I think we should drop
      `--find-copies-harder`.
* After doing a straight copy or rename in git and committing it, I get:
~~~~
File '/home/dbrondsema/dbrondsema-1019/forge/ForgeGit/forgegit/model/git_repo.py', line 682 in paged_diffs
for i in xrange(0, result['total'] + 1, 2)]
IndexError: list index out of range
~~~~
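
One possible cause, which the traceback alone does not confirm: if the code pairs up tokens from NUL-separated `--name-status` output two at a time, rename and copy records break the pairing because they carry three fields (status with similarity score, old path, new path) instead of two. A hedged sketch of width-aware parsing follows; `parse_name_status` and the dict keys are hypothetical, chosen to resemble the rename dict quoted earlier:

```python
def parse_name_status(tokens):
    """Group NUL-split `--name-status` tokens into change records.

    Assumes plain entries look like (status, path) and renames/copies
    look like (status + score, old_path, new_path), e.g. 'R100' or 'C056'.
    """
    changes = []
    i = 0
    while i < len(tokens):
        status = tokens[i]
        if not status:
            # Skip empty tokens, e.g. the one left by a trailing NUL.
            i += 1
            continue
        kind = status[0]
        width = 3 if kind in ('R', 'C') else 2
        record = tokens[i:i + width]
        if len(record) < width:
            raise ValueError('truncated record starting at token %d' % i)
        if width == 3:
            changes.append({
                'type': 'renamed' if kind == 'R' else 'copied',
                'old': record[1],
                'new': record[2],
                # Similarity score is a percentage; store it as a 0..1 ratio.
                'ratio': int(status[1:] or 0) / 100.0,
            })
        else:
            changes.append({'type': kind, 'new': record[1]})
        i += width
    return changes
```

Parsing this way would also give the diff list enough information to say "copied" or "renamed" explicitly instead of "File was copied or renamed."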
---
**[tickets:#7925] Speed up diff processing with binary files**
**Status:** in-progress
**Milestone:** unreleased
**Labels:** sf-2 sf-current performance
**Created:** Mon Jul 13, 2015 03:04 PM UTC by Heith Seewald
**Last Updated:** Mon Jul 27, 2015 08:28 PM UTC
**Owner:** Heith Seewald
In a git repo with a large number of binary files, our diff processing can be
very inefficient. We should test whether a file is binary and exclude it from the
diff processing section.