Any puppet users? Drafting PPG

2013-04-23 Thread Martin Langhoff
Puppet is often used with git as the mechanism to publish/distribute
the configuration. This sidesteps the not-very-scalable central Puppet
server.

But the use of git isn't sophisticated in the least. Git can help in a
few ways, IMO, and this is my initial approach at the topic:

https://groups.google.com/forum/?fromgroups#!topic/puppet-users/OilxMytnD_k

No fun in building this bike shed all alone :-)



m
--
 martin.langh...@gmail.com
 -  ask interesting questions
 - don't get distracted with shiny stuff  - working code first
 ~ http://docs.moodle.org/en/User:Martin_Langhoff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Pitfalls in auto-fast-forwarding heads that are not checked out?

2013-05-03 Thread Martin Langhoff
I am building a small git wrapper around puppet, and one of the
actions it performs is auto-fastforwarding of branches without
checking them out.

In simplified code... we ensure that we are on a head called master,
and in some cases ppg commit, will commit to master and...

  ## early on
  # sanity-check we are on master
  headname=$(git rev-parse --symbolic-full-name --revs-only HEAD)
  if [ $headname -ne refs/heads/headname ]; then
      echo >&2 "ERROR: can only issue --immediate commit from the master branch!"
      exit 1
  fi

  ## then
  git commit -bla blarg baz

  ## and then...

  # ensure we can ff
  head_sha1=$(git rev-parse --revs-only master)
  mb=$(git merge-base $production_sha1 refs/heads/master)
  if [[ $mb -ne $production_sha1 ]]; then
      echo >&2 "ERROR: cannot fast-forward master to production"
      exit 1
  fi
  $GIT_EXEC_PATH/git-update-ref -m "ppg immediate commit" \
      refs/heads/production $head_sha1 $production_sha1 || exit 1

Are there major pitfalls in this approach? I cannot think of any, but
git has stayed away from updating my local tracking branches; so maybe
there's a reason for that...

cheers,


m


Re: Pitfalls in auto-fast-forwarding heads that are not checked out?

2013-05-04 Thread Martin Langhoff
On Sat, May 4, 2013 at 3:34 AM, Johannes Sixt <j...@kdbg.org> wrote:
> You mean refs/heads/master and != here because -ne is numeric
> comparison in a shell script.

thanks! Yeah, I fixed those up late last night :-)

> Since git 1.8.0 you can express this check as
>
>   if git merge-base --is-ancestor $production_sha1 refs/heads/master

Ah, that's great! Unfortunately it's not there in earlier, more widely
used releases of git.
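
A minimal sketch of how the check could cope with both old and new git, assuming a helper named is_ancestor (a hypothetical name, not part of git or ppg). It tries --is-ancestor first and falls back to comparing the merge base when the option is not understood:

```shell
# is_ancestor A B: succeed if commit A is an ancestor of (or the same as) B.
# Hypothetical helper for illustration only.
is_ancestor() {
    git merge-base --is-ancestor "$1" "$2" 2>/dev/null
    status=$?
    case $status in
        0|1) return $status ;;   # git >= 1.8.0 gave a definite yes/no answer
    esac
    # Any other status: the option is unsupported (or something else failed),
    # so fall back to the classic check: the merge base of A and B must be A.
    a=$(git rev-parse --verify --quiet "$1^{commit}") || return 1
    mb=$(git merge-base "$1" "$2") || return 1
    [ "$mb" = "$a" ]
}
```

Usage would look like `is_ancestor "$production_sha1" refs/heads/master || exit 1`. On newer git the fallback is never reached.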

>> Are there major pitfalls in this approach?
>
> I don't think there are.

Thanks...

>> I cannot think of any, but
>> git has stayed away from updating my local tracking branches; so maybe
>> there's a reason for that...
>
> I don't understand what you are saying here. What is "that"?

When I do git pull, git is careful to only update the branch I have
checked out (if appropriate). It leaves any other branches that track
branches on the remote that has just been fetched untouched. I always
thought that at some point git pull would learn to evaluate those
branches and auto-merge them if the merge is a ff.

I would find that a natural bit of automation in git pull. Of course
it would mean a change of semantics, existing scripts could be
affected.

cheers,



m


offtopic: ppg design decisions - encapsulation

2013-05-06 Thread Martin Langhoff
[ Unashamedly offtopic... asking here because I like git design and
coding style, and ppg is drawing plenty of inspiration from the old
git shell scripts. Please kindly flame me privately... ]

ppg is a wrapper around git to maintain and distribute Puppet configs,
adding a few niceties.

Now, ppg will actually manage two git repositories -- one for the
puppet configs, the second one for ferrying back the puppet run
results to the originating repo (where they get loaded in a puppet
dashboard server for cute web-based reporting). The puppet config repo
is a normally-behaved git repo. The reports repo is a bit of a hack
-- never used directly by the user, it will follow a store-and-forward
scheme, where I should trim old history or just use something other
than git.

So I see two possible repo layouts:

- Transparent, nested
 .git/ # holding puppet configs, allows direct use of git commands
 .git/reports.git # nested inside puppet configs repo

- Mediated, parallel
 .ppg/puppet.git # all git commands must be wrapped
 .ppg/reports.git

My laziness and laissez-faire take on things drives me to use the
transparent, nested approach. Let the user do anything in there
directly with git.

OTOH, the mediated approach allows for more complete support,
including sanity checks on commands that don't have hooks available. I
already have a /usr/bin/ppg wrapper, which I could use to wrap all git
commands, setting GIT_DIR=.ppg/puppet.git for all ops. It would force
ops to be from the top of the tree (unless I write explicit support)
and I would have to implement explicit. And it would break related
tools that are not mediated via /usr/bin/git (gitk!).
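
To make the mediated layout concrete, here is a rough sketch of what such a wrapper could look like. The function name ppg_git is hypothetical, and the sketch assumes .ppg/puppet.git is a regular (non-bare) git directory. It walks up from the current directory to find the top of the tree, so commands also work from subdirectories:

```shell
# ppg_git ARGS...: run git against .ppg/puppet.git, from anywhere in the tree.
# Hypothetical helper; the layout follows the "Mediated, parallel" option above.
ppg_git() {
    dir=$PWD
    # Walk up until a directory containing .ppg/puppet.git is found.
    while [ ! -d "$dir/.ppg/puppet.git" ]; do
        [ "$dir" = / ] && { echo >&2 "ERROR: not inside a ppg tree"; return 1; }
        dir=$(dirname "$dir")
    done
    # GIT_WORK_TREE tells git where the checkout lives, since the git dir
    # is not in the usual .git location.
    GIT_DIR="$dir/.ppg/puppet.git" GIT_WORK_TREE="$dir" git "$@"
}
```

Tools that do not go through the wrapper (gitk, for instance) would still need GIT_DIR and GIT_WORK_TREE exported by hand, which is the cost noted above.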

Written this way, it seems to be a minimal lazy approach vs DTRT.

Am I missing any important aspect or option? Thoughts?

cheers,



m


Re: Pitfalls in auto-fast-forwarding heads that are not checked out?

2013-05-07 Thread Martin Langhoff
On Sat, May 4, 2013 at 2:51 PM, Jonathan Nieder <jrnie...@gmail.com> wrote:
> Another trick is to use git push:
>
>   git push . $production_sha1:refs/heads/master

Great trick -- thanks! In use in ppg now :-)
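
A small sketch of how the trick can be wrapped, assuming the branch names master and production from the earlier message and a hypothetical helper name ff_branch. Pushing into the local repository (".") without a leading "+" on the refspec only succeeds when the update is a fast-forward, so no separate merge-base check is needed:

```shell
# ff_branch SRC DST: fast-forward local branch DST to SRC without checking
# DST out. Fails (and leaves DST untouched) if the update is not a fast-forward.
# Hypothetical helper for illustration only.
ff_branch() {
    git push --quiet . "$1:refs/heads/$2"
}
```

For example, `ff_branch master production || exit 1` after a commit on master. The target must not be the branch that is currently checked out, since git normally refuses pushes into the checked-out branch of a non-bare repository.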



m


Re: offtopic: ppg design decisions - encapsulation

2013-05-07 Thread Martin Langhoff
On Mon, May 6, 2013 at 11:53 AM, John Keeping <j...@keeping.me.uk> wrote:
> I'm not sure I fully understand what the reports are, but it sounds like
> they are closely related to original configuration commits.  If that is
> the case, have you considered using Git notes instead of a separate
> repository?

Interesting suggestion! I read up on git-notes.

Yes, reports are closely related to a commit -- it's a log of the
execution of puppet with that config on a client node. At the same
time, we have one report per change deployment, per client -- with
thousands of clients. So it will be a large dataset, and a transient
one -- I intend to use git as a store-and-forward mechanism for the
reports, and it is safe and sane to forget old reports.

I don't see much ease-of-expiry in the notes, so I guess I would have
to write that myself, which complicates things a bit :-)

cheers,


m


Re: [PATCH] gitk: add support for -G'regex' pickaxe variant

2013-05-07 Thread Martin Langhoff
I just did git rebase origin/master for the umpteenth time, which
reminded me this nice patch is still pending.

ping?



m

On Thu, Jun 14, 2012 at 2:34 PM, Zbigniew Jędrzejewski-Szmek
<zbys...@in.waw.pl> wrote:
> From: Martin Langhoff <mar...@laptop.org>
>
> git log -G'regex' is a very usable alternative to the classic
> pickaxe. Minimal patch to make it usable from gitk.
>
> [zj: reword message]
> Signed-off-by: Zbigniew Jędrzejewski-Szmek <zbys...@in.waw.pl>
> ---
> Martin's off on holidays, so I'm sending v2 after rewording.
>
>  gitk-git/gitk | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/gitk-git/gitk b/gitk-git/gitk
> index 22270ce..24eaead 100755
> --- a/gitk-git/gitk
> +++ b/gitk-git/gitk
> @@ -2232,7 +2232,8 @@ proc makewindow {} {
>      set gm [makedroplist .tf.lbar.gdttype gdttype \
>                 [mc "containing:"] \
>                 [mc "touching paths:"] \
> -               [mc "adding/removing string:"]]
> +               [mc "adding/removing string:"] \
> +               [mc "with changes matching regex:"]]
>      trace add variable gdttype write gdttype_change
>      pack .tf.lbar.gdttype -side left -fill y
>
> @@ -4595,6 +4596,8 @@ proc do_file_hl {serial} {
>         set gdtargs [concat -- $relative_paths]
>      } elseif {$gdttype eq [mc "adding/removing string:"]} {
>         set gdtargs [list "-S$highlight_files"]
> +    } elseif {$gdttype eq [mc "with changes matching regex:"]} {
> +       set gdtargs [list "-G$highlight_files"]
>      } else {
>         # must be "containing:", i.e. we're searching commit info
>         return
> --
> 1.7.11.rc3.129.ga90bc7a.dirty



Misusing git: trimming old commits

2013-05-09 Thread Martin Langhoff
I am misusing git as a store-and-forward tool to transfer reports to a
server in a resilient manner.

The context is puppet (and ppg, I've spammed the list about it... ).
The reports are small, with small deltas, but created frequently.

With the exception of the final destination, I want to expire reports
that are old and successfully transferred.

My current approach is (bash-y pseudocode):

  git push bla bla || exit $?
  prunehash=$(git log -n1 --until=one.month.ago --pretty=format:%H)
  test -z "$prunehash" && exit 0
  # prep empty commit
  # skip git read-tree --empty, we get the same with
  export GIT_INDEX_FILE=/does/not/exist
  empty_tree=$(git write-tree)
  unset GIT_INDEX_FILE
  empty_commit=$(git commit-tree -m empty "$empty_tree")
  echo "${prunehash} ${empty_commit}" >> .git/info/grafts
  git gc
  # TODO: cleanup stale grafts :-)

is this a reasonable approach?
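
As an aside on the "prep empty commit" step, one possible way to get the empty tree without pointing GIT_INDEX_FILE at a nonexistent file is to hash empty input as a tree object. The sketch below assumes a hypothetical helper name make_empty_commit; in a SHA-1 repository the empty tree is the well-known 4b825dc642cb6eb9a060e54bf8d69288fbee4904:

```shell
# make_empty_commit: print the id of a new parentless commit with an empty tree.
# Hypothetical helper for illustration only.
make_empty_commit() {
    # -w writes the (empty) tree object into the object store so that
    # commit-tree can reference it.
    empty_tree=$(git hash-object -w -t tree /dev/null) || return 1
    git commit-tree "$empty_tree" -m empty
}
```

The result can be used wherever the snippet above uses $empty_commit.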



m


Re: Misusing git: trimming old commits

2013-05-11 Thread Martin Langhoff
On Thu, May 9, 2013 at 11:40 AM, Martin Langhoff
<martin.langh...@gmail.com> wrote:
> With the exception of the final destination, I want to expire reports
> that are old and successfully transferred.

OK, that took some effort to make work. Make sure you are not using
reflogs (or that reflogs are promptly expired).

# right after a successful push of all heads to the receiving server...
for head in $(git branch | sed 's/^..//'); do
    # FIXME period
    graft_sha1=$(git log --until=one.month.ago -n1 --pretty=format:%H "${head}")
    if [[ -z "$graft_sha1" ]]; then
        # nothing to prune
        continue
    fi
    # is it already grafted?
    if grep -q "^${graft_sha1} " "${GIT_DIR}/info/grafts" 2>/dev/null ; then
        # don't duplicate the graft
        continue
    fi
    some_grafted='true'
    # prep empty commit
    # skip git read-tree --empty, we get the same with
    export GIT_INDEX_FILE=/tmp/ppg-emptytree.$$
    empty_tree=$(git write-tree)
    rm "${GIT_INDEX_FILE}"
    unset GIT_INDEX_FILE
    empty_commit=$(git commit-tree -m empty "${empty_tree}")
    echo "${graft_sha1} ${empty_commit}" >> "${GIT_DIR}/info/grafts"
done

if [[ -z "$some_grafted" ]]; then
    # nothing to do
    exit 0
fi

# ppg-repack makes the unreachable objects loose
# (it is git-repack hacked to remove --keep-true-parents),
# git prune --expire actually deletes them
"$PPG_EXEC_PATH"/ppg-repack -AFfd
git prune --expire=now

### Cleanup stale grafts
# current graft points are reachable,
# pruned graft points (made obsolete by a newer graft)
# cannot be retrieved and git cat-file exit code is 128
touch "${GIT_DIR}/info/grafts.tmp.$$"
while read line; do
    graftpoint=$(echo "${line}" | cut -d' ' -f1)
    if git cat-file commit "${graftpoint}" >/dev/null 2>&1 ; then
        echo "${line}" >> "${GIT_DIR}/info/grafts.tmp.$$"
    fi
done < "${GIT_DIR}/info/grafts"

if [ -s "${GIT_DIR}/info/grafts.tmp.$$" ]; then
    mv "${GIT_DIR}/info/grafts.tmp.$$" "${GIT_DIR}/info/grafts"
fi

Perhaps it helps someone else trying to run git as a spooler :-)

cheers,



m


Re: [PATCH] gitk: add support for -G'regex' pickaxe variant

2013-05-13 Thread Martin Langhoff
On Mon, May 13, 2013 at 2:55 PM, Jonathan Nieder <jrnie...@gmail.com> wrote:
> My experience is the opposite.  I wonder "What did the author of this
> nonsense comment mean?" or "What is the purpose of this strange
> condition in this if () statement?".  Then "git log -S" finds the
> culprit

Only if that if () statement looks that way from a single commit.
That's my point. If the bit of code you are looking at is the result
of several changes, your log -S will grind a while and find you
nothing.

cheers,



m
--
 mar...@laptop.org
 - ask interesting questions
 - don't get distracted with shiny stuff  - working code first
 - http://wiki.laptop.org/go/User:Martinlanghoff


Re: [PATCH] gitk: add support for -G'regex' pickaxe variant

2013-05-13 Thread Martin Langhoff
On Mon, May 13, 2013 at 3:33 PM, Jonathan Nieder <jrnie...@gmail.com> wrote:
> Well, no, it should find the final change that brought it into the
> current form.  Just like "git blame".
>
> Has it been finding zero results in some cases where the current code
> matches the pattern?  That sounds like a bug.

Ummm, maybe. You are right, with current git it does work as I would
expect (usefully ;-) ).

I know I struggled quite a bit with log -S not finding stuff I thought
it should and that log -G did find, back a year ago.

Damn, I don't have a precise record of what git it was on, nor a good
repro example. Too long ago.
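
For what it is worth, one documented difference between the two options can be shown in a throwaway repository: -S only reports commits that change the number of occurrences of the string, while -G reports commits whose diff has an added or removed line matching the regex. The sketch below is not a reproduction of the problem described above, just an illustration; the file name, the string frob and the helper name make_demo_repo are made up:

```shell
# make_demo_repo: create a scratch repository in which one line containing
# "frob" is introduced and later modified; prints the repository path.
make_demo_repo() {
    repo=$(mktemp -d) || return 1
    (
        cd "$repo" || exit 1
        git init -q . || exit 1
        git config user.name demo
        git config user.email demo@example.com
        echo 'if (x > 0) frob();' >f.c
        git add f.c && git commit -q -m 'add frob call' || exit 1
        # Same number of "frob" occurrences, but the line itself changes.
        echo 'if (x > 1) frob();' >f.c
        git commit -q -a -m 'tweak condition' || exit 1
    ) >/dev/null || return 1
    echo "$repo"
}

repo=$(make_demo_repo) &&
{
    echo "log -S frob:"; git -C "$repo" log --format='  %s' -Sfrob
    echo "log -G frob:"; git -C "$repo" log --format='  %s' -Gfrob
}
```

Here -S lists only the commit that introduced the call, while -G also lists the later tweak, because the changed line still matches the pattern.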



m


Re: Fwd: git cvsimport implications

2013-05-17 Thread Martin Langhoff
On Fri, May 17, 2013 at 5:10 AM, Michael Haggerty <mhag...@alum.mit.edu> wrote:
> For one-time imports, the fix is to use a tool that is not broken, like
> cvs2git.

As one of the earlier maintainers of cvsimport, I do believe that
cvs2git is less broken, for one-shot imports, than cvsimport. Users
looking for a one-shot import should not use cvsimport as there are
better options there. Myself, I have used parsecvs (long ago, so
perhaps it isn't the best of the crop nowadays).

TBH, I am puzzled and amused at all the chest-thumping about cvs
importers. Yeah, yours is a bit better or saner, but we all wade in
the muddle of essentially broken data. So "is not broken" is rather
misleading when talking to end users. It carries so many caveats about
whether it'll work on the users' particular repo that it is not a
generally truthful statement.

I am very glad to hear it is better than cvsimport, and even more glad
to hear its limitations are better understood and documented. It has
had a testsuite for the longest of times!

And very likely has the best chance of success across the available
importers :-)

Oh, and why is cvsimport so vague? Because it is just driven by cvsps.
It creates a repo based on what cvsps understands from the CVS data.

At the time, I looked into trying to use cvs2svn (precursor to
cvs2git) as the CVS read side of cvsimport, but it did not support
incremental imports at all, and it took forever to run.

It was a time when git was new and people were dipping their toes in
the pool, and some developers were pining to use git on projects that
used CVS (like we use git-svn now). Incremental imports were a must.

One of the nice features of cvsimport is that it can do incrementals
on a repo imported with another tool. That earns it a place under the
sun. If it didn't have that, I'd be voting for removal (after a review
that the replacement *is* actually better ;-) across a number of test
repos).

cheers,



m


Re: Fwd: git cvsimport implications

2013-05-17 Thread Martin Langhoff
On Fri, May 17, 2013 at 9:34 AM, Andreas Krey <a.k...@gmx.de> wrote:
> On Fri, 17 May 2013 15:14:58 +, Michael Haggerty wrote:
> ...
>> We both know that the CVS history omits important data, and that the
>> history is mutable, etc.  So there are lots of hypothetical histories
>> that do not contradict CVS.  But some things are recorded unambiguously
>> in the CVS history, like
>>
>> * The contents at any tag or the tip of any branch (i.e., what is in the
>> working tree when you check it out).
>
> Except that the tags/branches may be made in a way that can't
> be mapped onto any commit/point of history otherwise exported,
> with branches that are done on parts of the trees first, or
> likewise tags.

Yeah, that's what I remember too.  It is perfectly fine in CVS to add
a tag to a file at a much later date than the rest of the tree. And it
happened too ("oh, I didn't have directory support/some-os checked out
when I tagged the release yesterday! let me check it out and add the
tag, nevermind that the branch has moved forward in the interim...").

I would add the long history of cvs repository manipulation. Bad,
ugly stuff, but it happened in every major project I've seen. Mozilla,
X.org, etc.

TBH I am very glad that Michael cares deeply about the correctness,
and it leads to a much better tool. No doubt.

When discussing it with end users, I do think we have to be honest and
say that there's a fair chance that the output will not be perfect...
because what is in CVS is rather imperfect when you look at it closely
(which users aren't usually doing).

cheers,



m


Can a git push over ssh trigger a gc/repack? Diagnosing pack explosion

2013-11-21 Thread Martin Langhoff
Hi git list,

I am trying to diagnose a strange problem in a VM running as a 'git
over ssh server', with one repo which periodically grows very quickly.

The complete dataset packs to a single pack+index of ~650MB. Growth is
slow, these are ASCII text reports that use a template -- highly
compressible. Reports come from a few dozen machines that log in every
hour.

However, something is happening that explodes the efficient pack into
an ungodly mess.

Do client pushes over git+ssh ever trigger a repack on the server? If
so, these repacking processes are racing with each other and taking
650MB to 7GB, at which point we hit ENOSPC, sometimes the OOM killer
joins the party, etc.

pack dir looks like this, ordered by timestamp:
http://fpaste.org/55730/04636313/

cheers,



m


Re: Can a git push over ssh trigger a gc/repack? Diagnosing pack explosion

2013-11-21 Thread Martin Langhoff
On Thu, Nov 21, 2013 at 2:52 PM, Junio C Hamano <gits...@pobox.com> wrote:
>>  - if it's receiving from many pushers, it races with itself; needs
>> some lock or back-off mechanism
>
> Surely.
>
> I think these should help:
>
> 64a99eb4 (gc: reject if another gc is running, unless --force is given, 2013-08-08)
> 4c5baf02 (gc: remove gc.pid file at end of execution, 2013-10-16)
>
> They should be in the upcoming v1.8.5.

Ah, great to hear. For the record, this hit me on git 1.7.1, current on RHEL6.

thanks!



m


Re: Can a git push over ssh trigger a gc/repack? Diagnosing pack explosion

2013-11-21 Thread Martin Langhoff
On Thu, Nov 21, 2013 at 10:21 AM, Martin Langhoff
<martin.langh...@gmail.com> wrote:
> Do client pushes over git+ssh ever trigger a repack on the server?

man git-config
[snip]

   receive.autogc
   By default, git-receive-pack will run git-gc --auto after
   receiving data from git-push and updating refs. You can stop it by
   setting this variable to false.

Oops!

Ok, a couple of problems here:

 - if it's receiving from many pushers, it races with itself; needs
some lock or back-off mechanism

 - alternatively, a splay mechanism. We have a hard threshold...
given many pushers acting in parallel, they'll all hit the threshold
at the same time. There is no need for this, we could randomize the
threshold by 20%; that would radically reduce the racy-ness

 - auto repack in this scenario has a reasonable likelihood of being
visited by the OOM killer -- therefore it needs to fail more
gracefully, for example with tmpfile cleanup. Perhaps having the
tmpfiles placed in a tmpdir named with the pid of the child would make
this easier...

Naturally, I'll move quickly to disable this evil-spawn-automagic
setting and set up a cronjob. But I think it is possible to have
defaults that work more reliably and with lower risk of explosion.
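
A rough sketch of the workaround, assuming a bare repository on the server and two hypothetical helper names (disable_autogc, gc_with_lock); the lock directory name is also made up. The first helper turns off the receive.autogc behaviour quoted from the man page, the second is what a cron job could call so that only one gc runs at a time:

```shell
# disable_autogc REPO: stop git-receive-pack from running "git gc --auto"
# after each push into REPO.
disable_autogc() {
    git --git-dir="$1" config receive.autogc false
}

# gc_with_lock REPO: run "git gc" unless another run already holds the lock.
gc_with_lock() {
    lock="$1/ppg-gc.lock"        # hypothetical lock directory name
    # mkdir is atomic: if it fails, another gc is in progress, so skip this round.
    mkdir "$lock" 2>/dev/null || return 0
    git --git-dir="$1" gc --quiet
    status=$?
    rmdir "$lock"
    return $status
}
```

A gc that gets killed outright (by the OOM killer, say) would leave the lock directory behind, so a real script would also want some stale-lock handling.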

thoughts?



m


git filter-branch --directory-filter oddity

2013-12-03 Thread Martin Langhoff
When using git filter-branch --prune-empty --subdirectory-filter foo/bar
to extract the history of the foo/bar directory, I am getting a very
strange result.

Directory foo/bar is slow moving. Say, 22 commits out of several
thousand. I would like to extract just those 22 commits.

Instead, I get ~1500 commits, which seem to have not been skipped
because they are merge commits. Merges completely immaterial to this
directory.

As they have not been skipped, they are fully fleshed out. By this, I
mean that we have the whole tree in place. So these 22 commits appear
with foo/bar pulled out to the root of the project, in the midst of
1500 commits with a full tree.

The commit diffs make no sense whatsoever.

Am I doing it wrong?



m


Re: git filter-branch --directory-filter oddity

2013-12-03 Thread Martin Langhoff
On Tue, Dec 3, 2013 at 5:44 PM, Martin Langhoff
<martin.langh...@gmail.com> wrote:
> As they have not been skipped, they are fully fleshed out. By this, I
> mean that we have the whole tree in place. So these 22 commits appear
> with foo/bar pulled out to the root of the project, in the midst of
> 1500 commits with a full tree.

IOWs, I am experimenting with something like:

git filter-branch -f -d /tmp/moodle-$RANDOM --prune-empty \
    --index-filter "git ls-files -z | grep -zZ -v '${dirpath}' | xargs -0 --no-run-if-empty -n100 git rm --quiet --cached --ignore-unmatch" \
    ^v2.1.0 $branches
git filter-branch -f --prune-empty --subdirectory-filter ${dirpath} \
    ^v2.1.0 $branches
git filter-branch -f --commit-filter \
    "~/src/git-filter-branch-tools/remove-pointless-commit.rb \"\$@\"" \
    ^v2.1.0 $branches

perhaps the docs for filter-branch imply, to me at least, that it's a
DWIM tool. I am surprised at having to roll my own on something that
is fairly popular.

cheers,



m


Re: git filter-branch --directory-filter oddity

2013-12-04 Thread Martin Langhoff
On Tue, Dec 3, 2013 at 5:44 PM, Martin Langhoff
<martin.langh...@gmail.com> wrote:
> Am I doing it wrong?

Looks like I was doing something wrong. Apologies about the noise.

cheers,



m


Publishing filtered branch repositories - workflow / recommendations?

2013-12-04 Thread Martin Langhoff
Hi folks.

I am currently working on a project based on Moodle (the LMS that got me
into git in the first place). This is highly modular software, and I
would like to maintain a bunch of out-of-tree modules in a single
repository, and be able to publish them in per-module repositories.

So I would like to maintain a tree looking like

  auth/foomatic/{code}
  mod/foomatic/{code}

where I can develop, branch and tag all the foomatic code together.
Yet, at release time I want to _also_ publish two repos

  auth-foomatic.git
  mod-foomatic.git

each of them with matching tags and code at the root of the git
tree, and ideally with a truthful history (i.e.: similar to having run
git filter-branch --subdirectory-filter, but able to update that
filtered history incrementally).

Is there a reasonable approach to scripting this?

Alternatively, has git submodule been improved so that it's usable by
mere mortals (i.e.: my team), or are there strong alternatives to git
submodule?

cheers,



m


Re: Publishing filtered branch repositories - workflow / recommendations?

2013-12-05 Thread Martin Langhoff
On Wed, Dec 4, 2013 at 6:01 PM, Martin Langhoff
<martin.langh...@gmail.com> wrote:
> Is there a reasonable approach to scripting this?

Found my answers.

The 'subtree' merge strategy is smart enough to mostly help here.
However, it does not handle new files created in the subdirectory.

My workflow is this one. It is similar to the recipes for the subtree
merge strategy, but invoking git mv to pull files out of the
subdirectory:
git merge -s ours --no-commit upstream/branch
git read-tree --prefix= -u upstream/branch
git mv mysubdir/* ./ ### read-tree can't do this
git commit

... time passes

git merge -s subtree --no-commit upstream/branch
if [ -d mysubdir ]; then
    # handle new files
    git mv mysubdir/* ./
fi
git commit

glad that I ended up reading a lot about subtree.

cheers,



m


Re: Publishing filtered branch repositories - workflow / recommendations?

2013-12-05 Thread Martin Langhoff
On Thu, Dec 5, 2013 at 2:18 PM, Jens Lehmann <jens.lehm...@web.de> wrote:
> Without knowing more I can't think of a reason why submodules should
> not suit your use case (but you'd have to script branching and tagging
> yourself until these commands learn to recurse into submodules too).

The submodules feature is way too fiddly and has abundant gotchas.

I am diving into subtrees, and finding it a lot more workable.

cheers,



m


Re: Publishing filtered branch repositories - workflow / recommendations?

2013-12-05 Thread Martin Langhoff
On Thu, Dec 5, 2013 at 2:54 PM, Jens Lehmann <jens.lehm...@web.de> wrote:
> On 05.12.2013 20:27, Martin Langhoff wrote:
>> On Thu, Dec 5, 2013 at 2:18 PM, Jens Lehmann <jens.lehm...@web.de> wrote:
>>> Without knowing more I can't think of a reason why submodules should
>>> not suit your use case (but you'd have to script branching and tagging
>>> yourself until these commands learn to recurse into submodules too).
>>
>> The submodules feature is way too fiddly and has abundant gotchas.
>
> Care to explain what bothers you the most? Being one of the people
> improving submodules I'm really interested in hearing more about that.

Very glad to hear submodules is getting TLC! I have other scenarios at
$dayjob where I may need submodules, so happy happy.

I may be unaware of recent improvements; here's my (perhaps outdated) list:

 - git clone does not clone existing submodules by default. An ideal
workflow assumes that the user wants a fully usable checkout.

 - git pull does not fetch+update all submodules (assuming a trivial
tracking repos scenario)

 - git push does not warn if you forgot to push commits in the submodule

there's possibly a few others that I've forgotten. The main issue is
that things that are conceptually simple (clone, git pull with no
local changes) are very fiddly. Our new developers, testers and
support folks hurt themselves with it plenty.
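
[ A minimal sketch of how the three gotchas listed above can be softened.
It assumes a git new enough to know the submodule.recurse and
push.recurseSubmodules settings, which I believe are later additions than
the versions discussed in this thread. The clone URL is a placeholder and
the temp repository exists only so the commands can run. ]

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Cloning: ask for submodules explicitly (placeholder URL, not run here).
#   git clone --recursive https://example.com/project.git

# Pulling: let pull, checkout and friends recurse into submodules by default.
git config submodule.recurse true

# Pushing: refuse to push when a referenced submodule commit
# is not available on any remote of that submodule.
git config push.recurseSubmodules check

recurse=$(git config --get submodule.recurse)
push_mode=$(git config --get push.recurseSubmodules)
echo "submodule.recurse=$recurse push.recurseSubmodules=$push_mode"
```

On a git that predates these settings, the closest equivalents I know of are
running `git submodule update --init --recursive` after each pull and
passing `--recurse-submodules=check` to `git push` by hand.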

I don't mind complex scenarios being complex to handle. If you hit a
nasty merge conflict in your submodule, and that's gnarly to resolve,
that's not a showstopper.


While writing this email, I reviewed Documentation/git-submodule.txt
in git master, and it does seem to have grown some new options. I
wonder if there is a tutorial with an example workflow anywhere
showing the current level of usability. My hope is actually for some
bits of automagic default behaviors to help things along (rather than
new options)...

Early git was very pedantic, and over time it learned some DWIMery.
You're giving me hope that similar smarts might have grown in around
submodule support ...

cheers,



m
-- 
 martin.langh...@gmail.com
 -  ask interesting questions
 - don't get distracted with shiny stuff  - working code first
 ~ http://docs.moodle.org/en/User:Martin_Langhoff


gitignore excludes not working?

2013-12-06 Thread Martin Langhoff
Tested with git 1.7.12.4 (Apple Git-37) and git 1.8.3.1 on F20.

$ mkdir foo
$ cd foo
$ git init
Initialized empty Git repository in /tmp/foo/.git/
$ mkdir -p modules/boring
$ mkdir -p modules/interesting
$ touch modules/boring/lib.c
$ touch modules/interesting/other.c
$ touch modules/interesting/lib.c
$ git add modules/interesting/lib.c
$ git status
# On branch master
#
# Initial commit
#
# Changes to be committed:
#   (use "git rm --cached <file>..." to unstage)
#
# new file:   modules/interesting/lib.c
#
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
# modules/boring/
# modules/interesting/other.c

$ echo '/modules/' > .gitignore
$ echo '!/modules/interesting/' >> .gitignore

On this git status, I would expect to see modules/interesting/other.c
listed as untracked, however:

$ git status
# On branch master
#
# Initial commit
#
# Changes to be committed:
#   (use "git rm --cached <file>..." to unstage)
#
# new file:   modules/interesting/lib.c
#
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
# .gitignore


thoughts?
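
[ For comparison, a sketch of a variant that does list other.c. The
gitignore documentation notes that a file cannot be re-included once its
parent directory is excluded, so this version ignores the contents of
modules/ (`/modules/*`) rather than the directory itself. The temp
directory and file names simply mirror the session above. ]

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
mkdir -p modules/boring modules/interesting
touch modules/boring/lib.c modules/interesting/lib.c modules/interesting/other.c

# Ignore what is inside modules/, not modules/ itself, so git still
# descends into the directory and can honour the negated pattern.
printf '%s\n' '/modules/*' '!/modules/interesting/' > .gitignore

# -uall lists individual untracked files instead of collapsing directories.
untracked=$(git status --porcelain --untracked-files=all)
echo "$untracked"
```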



m
-- 
 martin.langh...@gmail.com
 -  ask interesting questions
 - don't get distracted with shiny stuff  - working code first
 ~ http://docs.moodle.org/en/User:Martin_Langhoff


Re: Publishing filtered branch repositories - workflow / recommendations?

2013-12-06 Thread Martin Langhoff
On Fri, Dec 6, 2013 at 3:48 AM, Jens Lehmann jens.lehm...@web.de wrote:
> Right you are, we need tutorials for the most prominent use cases.

In the meantime, are there any hints? Emails on this list showing a
current smart workflow? Blog posts? Notes on a wiki?

>> Early git was very pedantic, and over time it learned some DWIMery.
>> You're giving me hope that similar smarts might have grown in around
>> submodule support ...
>
> That's what we are aiming at :-)

That is fantastic! Thank you.



m
-- 
 martin.langh...@gmail.com
 -  ask interesting questions
 - don't get distracted with shiny stuff  - working code first
 ~ http://docs.moodle.org/en/User:Martin_Langhoff


Re: I have end-of-lifed cvsps

2013-12-11 Thread Martin Langhoff
On Wed, Dec 11, 2013 at 7:17 PM, Eric S. Raymond e...@thyrsus.com wrote:
> I tried very hard to salvage this program - the ability to
> remote-fetch CVS repos without rsync access was appealing

Is that the only thing we lose, if we abandon cvsps? More to the
point, is there today an incremental import option, outside of
git-cvsimport+cvsps?

[ I am a bit out of touch with the current codebase but I coded and
maintained a good part of it back in the day. However naive/limited
the cvsps parser was, it did help a lot of projects make the leap to
git... ]

regards,



m
-- 
 martin.langh...@gmail.com
 -  ask interesting questions
 - don't get distracted with shiny stuff  - working code first
 ~ http://docs.moodle.org/en/User:Martin_Langhoff


Re: I have end-of-lifed cvsps

2013-12-12 Thread Martin Langhoff
On Wed, Dec 11, 2013 at 11:26 PM, Eric S. Raymond e...@thyrsus.com wrote:
> You'll have to remind me what you mean by incremental here. Possibly
> it's something cvs-fast-export could support.

User can

 - run a cvs to git import at time T, resulting in repo G
 - make commits to cvs repo
 - run cvs to git import at time T1, pointed to G, and the import tool
will only add the new commits found in cvs between T and T1.

> But what I'm trying to tell you is that, even after I've done a dozen
> releases and fixed the worst problems I could find, cvsps is far too
> likely to mangle anything that passes through it.  The idea that you
> are preserving *anything* valuable by sticking with it is a mirage.

The bugs that lead to a mangled history are real. I acknowledge and
respect that.

However, with those limitations, the incremental feature has value in
many scenarios.

The two main ones are as follows:

 - A developer is tracking his/her own patches on top of a CVS-based
project with git. This is often done with git-svn for example. If
old/convoluted branches in the far past are mangled, this user won't
care; as long as HEAD->master and/or the current/recent branch are
consistent with reality, the tool fits a need.

 - A project plans to transition to git gradually. Experienced
developers who'd normally work on CVS HEAD start working on git (and
landing their work on CVS afterwards). Old/mangled branches and tags
are of little interest, the big value is CVS HEAD (which is linear)
and possibly recent release/stable branches. The history captured is
good enough for git blame/log/pickaxe along the master line. At
transition time the original CVS repo can be kept around in readonly
mode, so people can still checkout the exact contents of an old branch
or tag for example (assuming no destructive surgery was done in the
CVS repo).

The above examples assume that the CVS repos have used a 'flying fish'
approach in the interesting (i.e.: recent) parts of their history.

[ Simplifying a bit for non-CVS-geeks -- flying fish is using CVS HEAD
for your development, plus 'feature branches' that get landed, plus
long-lived 'stable release' branches. Most CVS projects in modern
times use flying fish, which is a lot like what the git project uses
in its own repo, but tuned to CVS's strengths (interesting commits
linearized in CVS HEAD).

Other approaches ('dovetail') tend to end up with unworkable messes
given CVS's weaknesses. ]

The cvsimport+cvsps combo does a reasonable (though imperfect) job on
'flying fish' CVS histories _and that is what most projects evolved to
use_. If other cvs import tools can handle crazy histories, hats off
to them. But careful with knifing cvsps!

cheers,



m
-- 
 martin.langh...@gmail.com
 -  ask interesting questions
 - don't get distracted with shiny stuff  - working code first
 ~ http://docs.moodle.org/en/User:Martin_Langhoff


Re: I have end-of-lifed cvsps

2013-12-12 Thread Martin Langhoff
On Thu, Dec 12, 2013 at 12:17 PM, Andreas Krey a.k...@gmx.de wrote:
> But anyway, the replacement question is a) how fast the cvs-fast-export is
> and b) whether its output is stable

In my prior work, the better CVS importers would not have stable
output, so were not appropriate for incremental imports.

And even the fastest ones were very slow on large repos.

That is why I am asking the question.

> It won't magically disappear from your machine, and you have been warned. :-)

However, esr is making the case that git-cvsimport should stop using
cvsps. My questions are aimed at understanding whether this actually
results in proposing that an important feature is dropped.

Perhaps a better alternative is now available.


m
-- 
 martin.langh...@gmail.com
 -  ask interesting questions
 - don't get distracted with shiny stuff  - working code first
 ~ http://docs.moodle.org/en/User:Martin_Langhoff


Re: I have end-of-lifed cvsps

2013-12-12 Thread Martin Langhoff
On Thu, Dec 12, 2013 at 1:15 PM, Eric S. Raymond e...@thyrsus.com wrote:
> That terminology -- flying fish and dovetail -- is interesting, and
> I have not heard it before.  It might be worth putting in the Jargon File.
> Can you point me at examples of live usage?

The canonical reference would be
http://cvsbook.red-bean.com/cvsbook.html#Going%20Out%20On%20A%20Limb%20(How%20To%20Work%20With%20Branches%20And%20Survive)

just by being on the internet and widely referenced it has probably
eclipsed in google-juice examples of earlier usage. Karl Fogel may
remember where he got the names from.

cheers,



m
-- 
 martin.langh...@gmail.com
 -  ask interesting questions
 - don't get distracted with shiny stuff  - working code first
 ~ http://docs.moodle.org/en/User:Martin_Langhoff


Re: I have end-of-lifed cvsps

2013-12-12 Thread Martin Langhoff
On Thu, Dec 12, 2013 at 1:29 PM, Eric S. Raymond e...@thyrsus.com wrote:
> I am almost certain the output of cvs-fast-export is stable.  I
> believe the output of cvsps-3.x was, too.  Not sure about 2.x.

IIRC, making the output stable is nontrivial, especially on branches.
Two cases are still in my mind, from when I was wrestling with cvsps.

1 - For a history with CVS HEAD and a long-running stable release
branch (STABLE), which branched at P1...

   a - adding a file only at the tip of STABLE retroactively changes
history  for P1 and perhaps CVS HEAD

   b - forgetting to properly tag a subset of files with the branch
tag, and doing it later retroactively changes history

2 - you can create a new branch or tag with files that do not belong
together in any commit. Doing so changes history retroactively

... when I say changes history, I mean that the importers I know
revise their guesses of what files were seen together in a 'commit'.
This is especially true for history recorded with early cvs versions
that did not record a 'commit id'.

cvsps has the strange feature that it will cache its
assumptions/guesses, and continue incrementally from there. So if a
change in the CVS repo means that the old guess is now invalidated, it
continues the charade instead of forcing a complete rewrite of the git
history.

Maybe the current crop of tools have developed stronger magic than
what was available a few years ago... the task did seem impossible to
me.

cheers,




m
-- 
 martin.langh...@gmail.com
 -  ask interesting questions
 - don't get distracted with shiny stuff  - working code first
 ~ http://docs.moodle.org/en/User:Martin_Langhoff


Re: I have end-of-lifed cvsps

2013-12-12 Thread Martin Langhoff
On Thu, Dec 12, 2013 at 3:58 PM, Eric S. Raymond e...@thyrsus.com wrote:
>> - regardless of commit ids, do you synthesize an artificial commit?
>>   How do you define parenthood for that artificial commit?
>
> Because tagging is never used to deduce changesets, the case does not arise.

So if a branch has a nonsensical branching point, or a tag is
nonsensical, is it ignored and not imported?

curious,



m
-- 
 martin.langh...@gmail.com
 -  ask interesting questions
 - don't get distracted with shiny stuff  - working code first
 ~ http://docs.moodle.org/en/User:Martin_Langhoff


Re: I have end-of-lifed cvsps

2013-12-12 Thread Martin Langhoff
On Thu, Dec 12, 2013 at 6:04 PM, Eric S. Raymond e...@thyrsus.com wrote:
> I'm not sure what counts as a nonsensical branching point. I do know that
> Keith left this rather cryptic note in a README:

Keith names exactly what we are talking about. At that time, Keith was
struggling with the old xorg cvs repo, which had these and quite a few
other nasties. I was also struggling with the mozilla cvs repo with
its own gremlins.

Between my earlier explanation and Keith's notes it should be clear to
you. It is absolutely trivial in CVS to have an inconsistent
checkout (for example, if you switch branch with the -l parameter
disabling recursion, or if you accidentally switch branch in a
subdirectory).

On that inconsistent checkout, nothing prevents you from tagging it,
nor from creating a new branch.

An importer with a 'consistent tree mentality' will look at the
files/revs involved in that tag (or branching point) and find no tree
to match.

CVS repos with that crap exist. x11/xorg did (Jim Gettys challenged me
to try importing it at an LCA, after the Bazaar NG folks passed on
it). Mozilla did as well.


IMHO it is a valid path to skip importing the tag/branch. As long as
main dev work was in HEAD, things end up ok (which goes back to my
flying fish notes).

cheers,



m
-- 
 martin.langh...@gmail.com
 -  ask interesting questions
 - don't get distracted with shiny stuff  - working code first
 ~ http://docs.moodle.org/en/User:Martin_Langhoff


Re: git "thin" submodule clone to feed "describe"

2016-02-23 Thread Martin Langhoff
On Tue, Feb 23, 2016 at 4:33 PM, Junio C Hamano  wrote:
> No, I do not think so.

Thanks. I will probably setup a pre-commit hook at the top level
project to update a submodule metadata file.

Not the prettiest but... :-)



m
-- 
 martin.langh...@gmail.com
 -  ask interesting questions
 - don't get distracted with shiny stuff  - working code first
 ~ http://docs.moodle.org/en/User:Martin_Langhoff


git "thin" submodule clone to feed "describe"

2016-02-23 Thread Martin Langhoff
Hi git list! long time no see! :-) Been missing you lots.

Do we currently have any means to clone _history_ but not _blobs_ of a
repo, or some approximation thereof?

With a bit more context: If I have a top-level project using a couple
dozen submodules, where the submodules are huge, do I have a
git-native means of running git-describe on each submodule without
pulling the whole thing down?

In this context, most developers want to get full checkout of some
submodules, but not of all; and 'git describe' of the submodules is
needed to 'shim' the missing submodules appropriately.

If the answer is no, there's a bunch of ways I can carry that as extra
data in the top level project. It's possible, yet inelegant &
duplicative.

thanks,



martin
-- 
 martin.langh...@gmail.com
 -  ask interesting questions
 - don't get distracted with shiny stuff  - working code first
 ~ http://docs.moodle.org/en/User:Martin_Langhoff


Automagic `git checkout branchname` mysteriously fails

2016-10-14 Thread Martin Langhoff
In a (private) repo project I have, I recently tried (and failed) to do:

  git checkout v4.1-support

getting a "pathspec did not match any files known to git" error.

There's an origin/v4.1-support, there is no v4.1-support "local"
branch. Creating the tracking branch explicitly worked.

Other similar branches in existence upstream did work. Autocomplete
matched git's own behaviour for this; where git checkout foo wouldn't
work, autocomplete would not offer a completion.

Why is this?

One theory I have not explored is that I have other remotes, and some
have a v4.1-support branch. If that's the case, the error message is
not very helpful, and could be improved.
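
[ A minimal reproduction of that theory, as a sketch. The repository names
(upstream, fork, dev) and the remote name "other" are made up for the
example; only the branch name comes from the report above. It assumes no
checkout.defaultRemote setting is in effect. ]

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
cd "$work"

# Helper (hypothetical): a repo with one commit and a v4.1-support branch.
make_repo() {
    git init -q "$1"
    git -C "$1" -c user.name=demo -c user.email=demo@example.com \
        commit -q --allow-empty -m init
    git -C "$1" branch v4.1-support
}
make_repo upstream
make_repo fork

git clone -q upstream dev
cd dev
git remote add other "$work/fork"
git fetch -q other

# Two remote-tracking branches now share the name, so the
# "git checkout <branchname>" shortcut cannot pick one and fails.
if git checkout v4.1-support 2>/dev/null; then dwim=worked; else dwim=failed; fi

# Naming the remote-tracking branch explicitly removes the ambiguity.
git checkout -q -b v4.1-support --track origin/v4.1-support
current=$(git rev-parse --abbrev-ref HEAD)
echo "dwim=$dwim current=$current"
```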

git --version
2.7.4

DWIM in git is remarkably good, even addictive... when it works :-)

cheers,



m
-- 
 martin.langh...@gmail.com
 - ask interesting questions  ~  http://linkedin.com/in/martinlanghoff
 - don't be distracted~  http://github.com/martin-langhoff
   by shiny stuff


Re: Automagic `git checkout branchname` mysteriously fails

2016-10-14 Thread Martin Langhoff
On Fri, Oct 14, 2016 at 4:58 PM, Kevin Daudt <m...@ikke.info> wrote:
> Correct, this only works when it's unambiguous what branch you actually
> mean.

That's not surprising, but there isn't a warning. IMHO, finding
several branch matches is a strong indication that it'll be worth
reporting to the user that the DWIM machinery got hits, but couldn't
work it out.

I get that the process is not geared towards making a friendly msg easy,
but we could print to stderr something like:

 "branch" matches more than one candidate ref, cannot choose automatically.
 If you mean to check out a branch, try git branch command.
 If you mean to check out a file, use -- before the pathname to
 disambiguate.

and then continue with execution. With a bit of wordsmithing, the msg
can be made to be helpful in the various failure modes.

cheers,


m
-- 
 martin.langh...@gmail.com
 - ask interesting questions  ~  http://linkedin.com/in/martinlanghoff
 - don't be distracted    ~  http://github.com/martin-langhoff
   by shiny stuff


Re: Delta compression not so effective

2017-03-01 Thread Martin Langhoff
On Wed, Mar 1, 2017 at 1:30 PM, Linus Torvalds
<torva...@linux-foundation.org> wrote:
> For example, the sorting code thinks that objects with the same name
> across the history are good sources of deltas.

Marius has indicated he is working with jar files. IME jar and war
files, which are zipfiles containing Java bytecode, range from not
delta-ing in a useful fashion, to pretty good deltas.

Depending on the build process (hi Maven!) there can be enough
variance in the build metadata to throw all the compression machinery
off.

On a simple Maven-driven project I have at hand, two .war files
compiled from the same codebase compressed really well in git. I've
also seen projects where storage space is ~101% of the "uncompressed"
size.

my 2c,



m
-- 
 martin.langh...@gmail.com
 - ask interesting questions  ~  http://linkedin.com/in/martinlanghoff
 - don't be distracted    ~  http://github.com/martin-langhoff
   by shiny stuff


Re: Delta compression not so effective

2017-03-01 Thread Martin Langhoff
On Wed, Mar 1, 2017 at 8:51 AM, Marius Storm-Olsen <msto...@gmail.com> wrote:
> BUT, even still, I would expect Git's delta compression to be quite 
> effective, compared to the compression present in SVN.

jar files are zipfiles. They don't delta in any useful form, and in
fact they differ even if they contain identical binary files inside.

> Commits: 32988
> DB (server) size: 139GB

Are you certain of the on-disk storage at the SVN server? Ideally,
you've taken the size with a low-level tool like `du -sh
/path/to/SVNRoot`.

Even with no delta compression (as per Junio and Linus' discussion),
based on past experience importing jar/wars/binaries from SVN into
git... I'd expect git's worst case to be on-par with SVN, perhaps ~5%
larger due to compression headers on uncompressible data.

cheers,


m
-- 
 martin.langh...@gmail.com
 - ask interesting questions  ~  http://linkedin.com/in/martinlanghoff
 - don't be distracted~  http://github.com/martin-langhoff
   by shiny stuff


Dropping a merge from history -- rebase or filter-branch or ...?

2017-07-07 Thread Martin Langhoff
Hi git-folk!

long time no see! I'm trying to do one of those "actually, please
don't" things that turn out to be needed in the field.

I need to open our next "for release" development branch from our
master, but without a couple of disruptive feature branches, which
have been merged into master already. We develop in github, so I'll
call them Pull Requests (PRs) as gh does.

So I'd like to run a filter-branch or git-rebase --interactive
--preserve-merges that drops some PRs. Problem is, they don't work!

filter-branch --commit-filter is fantastic, and gives me all the
control I want... except that it will "skip the commit", but still use
the trees in the later commits, so the code changes brought in by
those commits I wanted to avoid will be there. I think the docs/help
that discuss  "skip commit" should have a big warning there!

rebase --interactive --preserve-merges  --keep-empty made a complete
hash of things. Nonsense conflicts all over on the merge commits; I
think it re-ran the merge without picking up the conflict resolutions
we had applied.

The changes we want to avoid are fairly localized -- a specific module
got refactored in 3 stages. The rest of the history should replay
cleanly. I don't want to delete the module.

My fallback is a manually constructed revert. While still an option, I
think it's better to have a clean stat without sizable feature-branch
reverts.
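
[ For reference, the fallback can be sketched with `git revert -m 1` on the
PR's merge commit, done on the release branch only. The file names and
branch names below are invented for the example. One known cost: if the
feature branch is merged again later, the revert itself has to be reverted
first. ]

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name demo
git config user.email demo@example.com

echo v1 > module.txt
git add module.txt && git commit -q -m "base"

# A disruptive feature branch, merged the way a PR is (a real merge commit).
git checkout -q -b feature
echo v2-refactored > module.txt
git commit -q -am "refactor module"
git checkout -q master
git merge -q --no-ff -m "Merge PR: refactor module" feature
pr_merge=$(git rev-parse HEAD)

# Unrelated later work that the release must keep.
echo fix > fix.txt
git add fix.txt && git commit -q -m "later fix"

# Open the release branch and back out only what the PR brought in,
# relative to the merge's first parent (the mainline).
git checkout -q -b release
git revert --no-edit -m 1 "$pr_merge" >/dev/null
module=$(cat module.txt)
echo "module.txt on release: $module"
```

The module stays in the tree at its pre-refactor content, and master is
left untouched.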

cheers,



m
-- 
 martin.langh...@gmail.com
 - ask interesting questions  ~  http://linkedin.com/in/martinlanghoff
 - don't be distracted~  http://github.com/martin-langhoff
   by shiny stuff


Should rerere auto-update a merge resolution?

2017-08-23 Thread Martin Langhoff
Hi List!

Let's say...
 - git v2.9.4
 - rerere is enabled.
 - I merge maint into master, resolve erroneously, commit
 - I publish my merge in a temp branch, a reviewer points out my mistake
 - I reset hard, retry the merge, using --no-commit, rerere  applies
what it knows
 - I fix things up, then commit

So far so good.

Oops! One of the branches has moved forward in the meantime, so

 - git fetch
 - git reset --hard master
 - git merge maint
... rerere applies the first (incorrect) resolution...

Am I doing it wrong? {C,Sh}ould rerere have done better?

cheers,


m
-- 
 martin.langh...@gmail.com
 - ask interesting questions  ~  http://linkedin.com/in/martinlanghoff
 - don't be distracted~  http://github.com/martin-langhoff
   by shiny stuff


Re: Should rerere auto-update a merge resolution?

2017-08-23 Thread Martin Langhoff
On Wed, Aug 23, 2017 at 4:34 PM, Junio C Hamano <gits...@pobox.com> wrote:
> Between these two steps:
>
>>  - I reset hard, retry the merge, using --no-commit, rerere applies what it 
>> knows
>>  - I fix things up, then commit
>
> You'd tell rerere to forget what it knows because it is wrong.

Hi Junio!

thanks for the quick response.

Questions

 - when I tell it to forget, won't it forget the pre-resolution state?
my read of the rerere docs imply that it gets called during the merge
to record the conflicted state.

 - would it be a feature if it updated its resolution db
automagically? rerere is plenty automagic already...
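
[ A sketch of the "forget, then resolve again" sequence, as I understand
the suggestion. The file name f and the one-line contents are made up. The
`git checkout --merge` step restores the conflict markers after the
recorded resolution has been dropped. ]

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name demo
git config user.email demo@example.com
git config rerere.enabled true

echo base > f && git add f && git commit -q -m base
git checkout -q -b maint && echo maint > f && git commit -q -am maint
git checkout -q master && echo master > f && git commit -q -am master

# First attempt: resolve wrongly and commit, so rerere records it.
git merge maint >/dev/null 2>&1 || true
echo wrong > f && git add f && git commit -q -m "merge (wrong)"

# Retry: rerere replays the wrong resolution into the working tree.
git reset -q --hard HEAD^
git merge maint >/dev/null 2>&1 || true
replayed=$(cat f)

# Drop the recorded resolution, restore the conflict, resolve again.
git rerere forget f 2>/dev/null
git checkout --merge -- f
echo right > f && git add f && git commit -q -m "merge (right)"

# A later retry picks up the corrected resolution instead.
git reset -q --hard HEAD^
git merge maint >/dev/null 2>&1 || true
relearned=$(cat f)
echo "replayed=$replayed relearned=$relearned"
```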

cheers,



m
-- 
 martin.langh...@gmail.com
 - ask interesting questions  ~  http://linkedin.com/in/martinlanghoff
 - don't be distracted~  http://github.com/martin-langhoff
   by shiny stuff


Re: I'm trying to break "git pull --rebase"

2018-02-20 Thread Martin Langhoff
On Tue, Feb 20, 2018 at 5:00 PM, Julius Musseau  wrote:
> I was hoping to concoct a situation where "git pull --rebase" makes a
> mess of things.

It breaks quite easily with some workflows. They are all in the "don't
do that" territory.

Open a long-lived feature-dev branch, work on it. Other folks are
working on master. Merge master into feature-dev. Make sure some
merges might need conflict resolution.

Reorg some code on master, move files around. Code some more on
feature-dev branch. Merge master into feature-dev; the merge machinery
will probably cope with the code move, file renames. If it doesn't,
resolve it by hand.

Let all that simmer for a little bit.

Then try to rebase.

"Doctor, it hurts when I rebase after merging with conflict resolution... "

cheers,



m


git svn clone/fetch hits issues with gc --auto

2018-10-09 Thread Martin Langhoff
Hi folks,

Long time no see! Importing a 3GB (~25K revs, tons of files) SVN repo
I hit the gc error:

warning: There are too many unreachable loose objects; run 'git prune'
to remove them.
gc --auto: command returned error: 255

I don't seem to be the only one --
https://stackoverflow.com/questions/35738680/avoiding-warning-there-are-too-many-unreachable-loose-objects-during-git-svn

Looking at code history, git svn lost the ability to pass options to git
repack when it was converted to use git gc.

Experimentally I find that tweaking it to run git gc --auto
--prune=5.minutes.ago works well, while --prune=now breaks it.
Attempts to commit fail with missing objects.

- Why does --prune=now break it? Perhaps "gc" runs in the background,
and races with the commit being prepared?

- Would it be safe, sane to apply --prune=some.value on _clone_?

- During _fetch_, --prune=some.value seems risky. In a checkout being
actively used for development or merging it'd risk pruning objects
users expect to be there for recovery. Would there be a safe, sane
way?

- Is there a safer, saner value than 5 minutes?
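
[ A small sketch of one plausible reading of the --prune=now breakage: an
object that has been written but is not yet referenced by anything is fair
game for an immediate prune, while a grace period spares it. The blob
content is arbitrary, and whether git-svn really has such objects in flight
at gc time is an assumption, not something verified here. ]

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

# Write a blob that nothing references yet, as an importer might do
# between creating objects and updating a ref.
blob=$(echo "work in progress" | git hash-object -w --stdin)

# Dry runs: list what each expiry setting would delete.
pruned_now=$(git prune -n --expire=now)
pruned_grace=$(git prune -n --expire=5.minutes.ago)
echo "now: ${pruned_now:-nothing}"
echo "5.minutes.ago: ${pruned_grace:-nothing}"
```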

cheers,


m
-- 
 martin.langh...@gmail.com
 - ask interesting questions  ~  http://linkedin.com/in/martinlanghoff
 - don't be distracted~  http://github.com/martin-langhoff
   by shiny stuff




Re: git svn clone/fetch hits issues with gc --auto

2018-10-10 Thread Martin Langhoff
On Wed, Oct 10, 2018 at 8:21 AM Junio C Hamano  wrote:
> We probably can keep the "let's not run for a day" safety while
> pretending that "git gc --auto" succeeded for callers like "git svn"
> so that these callers do not have to do "eval { ... }" to hide our
> exit code, no?
>
> I think that is what Jonathan's patch (jn/gc-auto) does.

+1

`--auto` means "DTRT, but remember you're running as part of a larger
process; don't error out unless it's critical".

cheers,


m
-- 
 martin.langh...@gmail.com
 - ask interesting questions  ~  http://linkedin.com/in/martinlanghoff
 - don't be distracted~  http://github.com/martin-langhoff
   by shiny stuff


Re: git svn clone/fetch hits issues with gc --auto

2018-10-10 Thread Martin Langhoff
Looking around, Jonathan Tan's "[PATCH] gc: do not warn about too many
loose objects" makes sense to me.

- remove unactionable warning
- as the warning is gone, no gc.log is produced
- subsequent gc runs don't exit due to gc.log

My very humble +1 on that.

As for downsides... if we have truly tons of _recent_ loose objects,
it'll ... take disk space? I'm fine with that.

For more aggressive gc options, thoughts:

 - Do we always consider git gc --prune=now "safe" in a "won't delete
stuff the user is likely to want" sense? For example -- are the
references from reflogs enough safety?

 - Even if we don't, for some commands it should be safe to run git gc
--prune=now at the end of the process, for example an import that
generates a new git repo (git svn clone).

cheers,


m
On Tue, Oct 9, 2018 at 10:49 PM Junio C Hamano  wrote:
>
> Forwarding to Jonathan, as I think this is an interesting supporting
> vote for the topic that we were stuck on.
>
> Eric Wong  writes:
>
> > Martin Langhoff  wrote:
> >> Hi folks,
> >>
> >> Long time no see! Importing a 3GB (~25K revs, tons of files) SVN repo
> >> I hit the gc error:
> >>
> >> warning: There are too many unreachable loose objects; run 'git prune'
> >> to remove them.
> >> gc --auto: command returned error: 255
> >
> > GC can be annoying when that happens... For git-svn, perhaps
> > this can be appropriate to at least allow the import to continue:
> >
> > diff --git a/perl/Git/SVN.pm b/perl/Git/SVN.pm
> > index 76b2965905..9b0caa3d47 100644
> > --- a/perl/Git/SVN.pm
> > +++ b/perl/Git/SVN.pm
> > @@ -999,7 +999,7 @@ sub restore_commit_header_env {
> >  }
> >
> >  sub gc {
> > - command_noisy('gc', '--auto');
> > + eval { command_noisy('gc', '--auto') };
> >  };
> >
> >  sub do_git_commit {
> >
> >
> > But yeah, somebody else who works on git regularly could
> > probably stop repack from writing thousands of loose
> > objects (and instead write a self-contained pack with
> > those objects, instead).  I haven't followed git closely
> > lately, myself.



-- 
 martin.langh...@gmail.com
 - ask interesting questions  ~  http://linkedin.com/in/martinlanghoff
 - don't be distracted~  http://github.com/martin-langhoff
   by shiny stuff


Re: git svn clone/fetch hits issues with gc --auto

2018-10-10 Thread Martin Langhoff
On Wed, Oct 10, 2018 at 7:27 AM Ævar Arnfjörð Bjarmason
 wrote:
> As Jeff's
> https://public-inbox.org/git/20180716175103.gb18...@sigill.intra.peff.net/
> and my https://public-inbox.org/git/878t69dgvx@evledraar.gmail.com/
> note it's a bit more complex than that.

Ok, my bad for not reading the whole thread :-) thanks for the kind explanation.

>  - The warning is actionable, you can decide to up your expiration
>policy.

A newbie-ish user shouldn't need to know git's internal store model
_and the nuances of its special cases_ to get through.


>  - We use this warning as a proxy for "let's not run for a day"

Oh, so _that's_ the trick with creating gc.log? I then understand the
idea of changing to exit 0.

But it's far from clear, and a clear _flag_, and not spitting again
the same warning, or differently-worded warning would be better.

"We won't try running gc, a recent run was deemed pointless until some
time passes. Nothing to worry about."

>  - This conflation of the user-visible warning and the policy is an
>emergent effect of how the different gc pieces interact, which as I
>note in the linked thread(s) sucks.

It sure does, and that aspect should be easy to fix...(?)

> So it's creating a lot of garbage during its cloning process that can
> just be immediately thrown away? What is it doing? Using the object
> store as a scratch pad for its own temporary state?

Yeah, that's suspicious and I don't know why. I've worked on other
importers and while those needed 'gc' to generate packs, they didn't
generate garbage objects. After gc, the repo was "clean".

cheers,



m
-- 
 martin.langh...@gmail.com
 - ask interesting questions  ~  http://linkedin.com/in/martinlanghoff
 - don't be distracted~  http://github.com/martin-langhoff
   by shiny stuff

