- **status**: in-progress --> review
- **Comment**:

Basically, we populate these collections whenever a repo update happens (via
the hook and the UI) and use them to display info to users, except for the log
view, which talks directly to the SCM.

Query usage pretty much boils down to these controllers:

- `MergeRequestController`:
    - `index` - uses `Commit` in template `jinja:allura:templates/repo/merge_request.html` via `req.commits`
    - `do_request_merge_edit` - uses `Commit`
- `BranchBrowser`:
    - `log` - uses `Commit`
- `CommitBrowser`:
    - `__init__` - uses `Commit`
    - `index`:
        - uses `Tree` via `self._commit.tree`
        - `DiffInfoDoc` via `self._commit.paged_diffs`
    - `basic` - `DiffInfoDoc` (in `commit_basic.html` via `commit.diffs`)
- `TreeBrowser`:
    - uses `Tree` (passed to `__init__`, in lookup uses `Tree.__getitem__`)
    - `LastCommit` in `Allura/allura/templates/widgets/repo/tree_widget.html` (via `tree.ls()`)
- `FileBrowser` - uses `Tree` & `Commit`

Git, Hg, SVN (each has its own version of this):

- `BranchBrowser` - uses `Commit` via `repo.latest`

Also:

- Macro for including a file from a repo (`Allura/allura/lib/macro.py:include_file`) uses `Commit.get_path`, which uses `Commit.get_tree`, which uses `Tree`
- `Allura/allura/model/stats.py` uses `DiffInfoDoc` via `commit.diffs`, but I'm not sure where `stats.py` is used

The above includes only the places that query one or more of these collections
to display something useful to the user. We should be able to get rid of these
collections by rewriting the controllers/templates above (I might have missed
something, so we should estimate a bit higher).
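
To cross-reference this by collection rather than by controller (as the ticket asks), a minimal Python sketch can invert the controller-to-model mapping. The `USAGE` dict is transcribed by hand from the list above; the labels and the `users_of` helper are hypothetical and not part of Allura.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hand-transcribed from the notes above (hypothetical labels):
# controller/action -> models it reads.
USAGE: Dict[str, Set[str]] = {
    "MergeRequestController.index": {"Commit"},
    "MergeRequestController.do_request_merge_edit": {"Commit"},
    "BranchBrowser.log": {"Commit"},
    "BranchBrowser (Git/Hg/SVN)": {"Commit"},
    "CommitBrowser.__init__": {"Commit"},
    "CommitBrowser.index": {"Tree", "DiffInfoDoc"},
    "CommitBrowser.basic": {"DiffInfoDoc"},
    "TreeBrowser": {"Tree", "LastCommit"},
    "FileBrowser": {"Tree", "Commit"},
    "macro.include_file": {"Commit", "Tree"},
    "stats.py": {"DiffInfoDoc"},
}


def users_of(usage: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Invert controller -> models into model -> sorted list of controllers."""
    inverted: Dict[str, Set[str]] = defaultdict(set)
    for controller, models in usage.items():
        for model in models:
            inverted[model].add(controller)
    return {model: sorted(ctrls) for model, ctrls in inverted.items()}
```

For instance, `users_of(USAGE)["DiffInfoDoc"]` yields `CommitBrowser.basic`, `CommitBrowser.index`, and `stats.py`, which according to this list are the places to rewrite before `repo_diffinfo` could be dropped.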

More about overall usage below:

##### `TreesDoc, repo_trees`

- `ForgeSVN/forgesvn/model/svn.py:compute_tree_new` (upserts)
    - `Commit.get_tree()`
        - `Commit.tree`
            - almost everywhere
        - `Commit.get_path()`
            - Diff between two revisions of the same file (`FileBrowser.diff`)
            - Macro for including a file from a repo (`Allura/allura/lib/macro.py:include_file`)
            - `Commit.has_path()`
                - nowhere
    - `Tree.__getitem__()`
        - the `TreeBrowser` controller uses it; it may also be used elsewhere, but that's hard to tell
- `Allura/allura/model/repo_refresh.py:refresh_commit_trees` (creates)
    - `refresh_repo`
        - `Repository.refresh`
        - git hook
        - basically, whenever a repo refresh happens, we create `TreesDoc` for new commits
- `Allura/allura/model/repo_refresh.py:compute_diffs` (queries)
    - `refresh_repo`
    - script task `Allura/allura/scripts/refresh_last_commits.py`
    - uses info from `TreesDoc` to compute diffs between the current and parent commits
- `Allura/allura/model/repo_refresh.py:compute_lcds` (gets from `ModelCache`)
    - pretty much the same places as `compute_diffs`; computes the last commit for each tree (presumably to show it on repo browse pages)
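
The core of "compute diffs between the current and parent commits" can be illustrated with a generic Python sketch. It assumes each commit's tree has already been flattened into a `{path: object_id}` dict; that shape and the `diff_snapshots` name are hypothetical and not the actual `TreesDoc` schema or `compute_diffs` code.

```python
from typing import Dict, List, Tuple


def diff_snapshots(
    parent: Dict[str, str], current: Dict[str, str]
) -> Tuple[List[str], List[str], List[str]]:
    """Compare two flattened trees (path -> object id).

    Returns (added, removed, changed) as sorted lists of paths.
    For a root commit, pass an empty dict as the parent.
    """
    added = sorted(current.keys() - parent.keys())
    removed = sorted(parent.keys() - current.keys())
    # Paths present on both sides whose object id differs were modified.
    changed = sorted(
        path for path in parent.keys() & current.keys()
        if parent[path] != current[path]
    )
    return added, removed, changed
```

With `parent = {"README": "a1", "src/app.py": "b1", "old.txt": "c1"}` and `current = {"README": "a1", "src/app.py": "b2", "new.txt": "d1"}`, this returns `(["new.txt"], ["old.txt"], ["src/app.py"])`. An indexless variant would obtain both snapshots from the SCM instead of from Mongo.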


##### `Tree, TreeDoc, repo_tree`

- `Allura/allura/model/repo_refresh.py:trees` (query)
    - seems to be unused
- `Allura/allura/model/repo_refresh.py:compute_diffs:_update_cache` (query)
    - used only internally, to help calculate diffs
- `GitImplementation.refresh_tree_info` (create)
- `HgImplementation.refresh_tree_info` (create)
- `CommitBrowser.__init__` via `Commit.tree`, which always creates a new `TreeDoc` via `repo.compute_tree_new`
- `SVNImplementation.compute_tree_new` (query & upsert)
- `Tree.__getitem__`
- `Allura/allura/model/repo_refresh.py:_pull_tree` & `_update_tree_cache` - helpers, so we don't really care about them
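
According to the notes above, `TreeBrowser` uses `Tree.__getitem__` in its lookup, presumably to resolve a browse URL one path segment at a time. As a rough illustration of that lookup pattern, here is a hypothetical Python sketch in which a tree node lists its entries only when first indexed, through a `loader` callable standing in for a direct git/hg/svn query. `LazyTree`, `resolve`, and the loader contract are all assumptions, not Allura's `Tree` class.

```python
from typing import Callable, Dict, Optional, Union

# Hypothetical loader contract: given a directory path ("" for the root),
# return {entry name: "tree" | "blob"}. An indexless setup would query the
# SCM here; in this sketch it is just a callable.
Loader = Callable[[str], Dict[str, str]]


class LazyTree:
    """Hypothetical tree node that fetches its listing on first lookup."""

    def __init__(self, path: str, loader: Loader) -> None:
        self.path = path
        self._loader = loader
        self._entries: Optional[Dict[str, str]] = None

    def __getitem__(self, name: str) -> Union["LazyTree", str]:
        if self._entries is None:  # fetch once, then reuse the cached listing
            self._entries = self._loader(self.path)
        kind = self._entries[name]  # raises KeyError for a missing entry
        child = f"{self.path}/{name}" if self.path else name
        # Subdirectories become new lazy nodes; files resolve to their full path.
        return LazyTree(child, self._loader) if kind == "tree" else child


def resolve(root: LazyTree, path: str) -> Union[LazyTree, str]:
    """Walk a slash-separated path segment by segment via __getitem__."""
    node: Union[LazyTree, str] = root
    for segment in filter(None, path.split("/")):
        if not isinstance(node, LazyTree):
            raise KeyError(path)  # cannot descend into a file
        node = node[segment]
    return node
```

Caching is per node, so repeated lookups through the same node reuse its listing, while each newly created subtree node fetches its own listing once.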

##### `LastCommit`, `LastCommitDoc`, `repo_last_commit`

- `compute_lcds` - produces `LastCommit`, which is used in:
    - `SVNImplementation.compute_tree_new` (updates)
    - in `tree_widget.html` (via `tree.ls()`)
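
What `compute_lcds` produces for the tree widget is, conceptually, "the newest commit that touched each entry of a directory". A minimal, generic Python sketch of that computation follows, assuming history is available as a newest-first list of `(commit_id, changed_paths)` pairs; this is not Allura's `compute_lcds` or `LastCommit._build`, and `last_commits` is a hypothetical name.

```python
from typing import Dict, Iterable, List, Set, Tuple


def last_commits(
    directory: str,
    entries: Iterable[str],
    history: List[Tuple[str, Set[str]]],
) -> Dict[str, str]:
    """Map each entry of `directory` ("" = root) to the newest commit touching it.

    `history` is ordered newest first; a change anywhere under a
    subdirectory counts as touching that subdirectory's entry.
    """
    prefix = f"{directory}/" if directory else ""
    result: Dict[str, str] = {}
    remaining = set(entries)
    for commit_id, changed in history:
        if not remaining:
            break  # every entry already has its last commit
        # Reduce each changed path to the entry name directly inside `directory`.
        touched = {
            path[len(prefix):].split("/", 1)[0]
            for path in changed
            if path.startswith(prefix)
        }
        for name in remaining & touched:
            result[name] = commit_id
        remaining -= touched
    return result
```

The early `break` reflects why such a walk can be cheap for directories whose entries all changed recently, and expensive for directories containing long-untouched files.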

##### `DiffInfoDoc`, `repo_diffinfo`

- `compute_diffs` - produces `DiffInfoDoc`, which is used in:
    - `Commit.paged_diffs` (queries)
        - Displays diffs for a commit (`CommitBrowser.index`)
        - `Commit.diffs`
            - `Allura/allura/model/stats.py` - ?
            - `Allura/allura/templates/repo/commit_basic.html` (`CommitBrowser.basic`)
    - `Commit.added_paths` (queries)
        - `LastCommit._prev_commit_id` (only as an optimization, to exit early if the previous commit doesn't exist)
            - `LastCommit._build`
    - `Allura/allura/scripts/refresh_last_commits.py`, `Allura/allura/scripts/refreshrepo.py` - only deleting
    - `SVNImplementation.refresh_commit_info` (creates)
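
The two read paths above (`Commit.added_paths` and `Commit.paged_diffs`) can be sketched in Python over a hypothetical per-file diff record. `DiffEntry`, `added_paths`, and `diff_page` below are illustrative assumptions; they do not reproduce the real `DiffInfoDoc` schema or the `Commit` methods' return values.

```python
from typing import Dict, List, NamedTuple, Optional


class DiffEntry(NamedTuple):
    """Hypothetical per-file diff record; not the real DiffInfoDoc schema."""
    path: str
    lhs_id: Optional[str]  # object id in the parent commit, None if the file is new
    rhs_id: Optional[str]  # object id in this commit, None if the file was deleted


def added_paths(diffs: List[DiffEntry]) -> List[str]:
    """Paths that exist in this commit but not in its parent."""
    return [d.path for d in diffs if d.lhs_id is None and d.rhs_id is not None]


def diff_page(
    diffs: List[DiffEntry], start: int = 0, end: Optional[int] = None
) -> Dict[str, object]:
    """One page of diffs plus the total count, so a view can render paging links."""
    return {"diffs": diffs[start:end], "total": len(diffs)}
```

If records like these were computed on demand from the SCM (for example from a tree diff of the commit and its parent), neither read path would need a stored collection, which is the trade-off the ticket is weighing.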



---

**[tickets:#7828] Analyze & document usage of repo collections**

**Status:** review
**Milestone:** unreleased
**Labels:** 42cc sf-current sf-2 indexless 
**Created:** Mon Feb 09, 2015 04:13 PM UTC by Dave Brondsema
**Last Updated:** Mon Feb 16, 2015 01:42 PM UTC
**Owner:** Igor Bondarenko

We've done some work in the past for our SCM repos to be "indexless", that is,
to use the git/hg/svn repo directly instead of indexing in Mongo and using
that. (Storing in Mongo can take up a lot of space and also adds delay for the
indexing process to run.)

Analyze where each of the following collections (models) is used.  Perhaps 
cross-reference by page or function (e.g. browse repo, view commit, etc; also 
git/svn/hg).  Then we can plan which pages' functionality needs to be updated 
to be able to remove them.

Collections (with relative size factors based on sf.net data):

* repo_trees (4x)
* repo_tree (2x)
* repo_last_commit (2x)
* repo_diffinfo (1x)
* repo_ci (very tiny)
* repo_commitrun (very tiny)
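
To turn the relative size factors into a rough payoff estimate per removed collection, a small Python calculation suffices. It assumes the factors are proportional to storage and approximates the two "very tiny" collections as 0, so the shares it reports are slight overestimates; `SIZE_FACTOR` and `share_removed` are hypothetical names.

```python
from fractions import Fraction
from typing import Iterable

# Relative size factors from the list above; "very tiny" approximated as 0
# (assumption), which slightly overstates every share computed below.
SIZE_FACTOR = {
    "repo_trees": 4,
    "repo_tree": 2,
    "repo_last_commit": 2,
    "repo_diffinfo": 1,
    "repo_ci": 0,
    "repo_commitrun": 0,
}


def share_removed(dropped: Iterable[str]) -> Fraction:
    """Fraction of total (relative) storage freed by dropping these collections."""
    total = sum(SIZE_FACTOR.values())
    return Fraction(sum(SIZE_FACTOR[name] for name in dropped), total)
```

Under these assumptions, dropping `repo_trees` alone frees 4/9 (about 44%) of the space, and `repo_trees` plus `repo_tree` frees 2/3.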


---

Sent from forge-allura.apache.org because [email protected] is subscribed 
to https://forge-allura.apache.org/p/allura/tickets/
