Re: optimize gitdiff-do script
Pasky, Looks like a couple of questions I asked over the weekend got lost along the way. 1) How do you want me to fix the indentation on my patch to optimize gitdiff-do script: - forget my first patch and resend from scratch, or - a second patch restoring indentation, on top of my first one. 2) Would you be interested in a patch that used a more robust tmp file creation, along the lines of replacing t=${TMPDIR:-/usr/tmp}/gitdiff.$$ trap 'set +f; rm -fr $t.?; trap 0; exit 0' 0 1 2 3 15 with: tmp=${TMPDIR-/tmp} tmp=$tmp/gitdiff-do.$RANDOM.$RANDOM.$RANDOM.$$ (umask 077 mkdir $tmp) || { echo Could not create temporary directory! Exiting. 12 exit 1 } trap 'rm -fr $tmp; trap 0; exit 0' 0 1 2 3 15 t=$tmp/tmp From the www.linuxsecurity.com link that Dave Jones provided, the above $tmp directory is about as good as using mktemp, while avoiding both dependency on mktemp options not everyone has. 3) If interested in (2), would you want it instead of my previous mktemp removal patch, or on top of it? -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson [EMAIL PROTECTED] 1.650.933.1373, 1.925.600.0401 - To unsubscribe from this list: send the line unsubscribe git in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: optimize gitdiff-do script
Dear diary, on Mon, Apr 18, 2005 at 05:23:34PM CEST, I got a letter where Paul Jackson [EMAIL PROTECTED] told me that... Pasky, Looks like a couple of questions I asked over the weekend got lost along the way. Yes, sorry about that; I had a lot of mail traffic lately and I'm not so used to it. ;-) 1) How do you want me to fix the indentation on my patch to optimize gitdiff-do script: - forget my first patch and resend from scratch, or - a second patch restoring indentation, on top of my first one. Resend from scratch, please. I cannot guarantee I will look at it immediately, though. Optimizing is nice, but gitdiff-do's speed is already usable and there are much more pressing issues for git-pasky right now. 2) Would you be interested in a patch that used a more robust tmp file creation, along the lines of replacing t=${TMPDIR:-/usr/tmp}/gitdiff.$$ trap 'set +f; rm -fr $t.?; trap 0; exit 0' 0 1 2 3 15 with: tmp=${TMPDIR-/tmp} tmp=$tmp/gitdiff-do.$RANDOM.$RANDOM.$RANDOM.$$ (umask 077 mkdir $tmp) || { echo Could not create temporary directory! Exiting. 12 exit 1 } trap 'rm -fr $tmp; trap 0; exit 0' 0 1 2 3 15 t=$tmp/tmp From the www.linuxsecurity.com link that Dave Jones provided, the above $tmp directory is about as good as using mktemp, while avoiding both dependency on mktemp options not everyone has. 3) If interested in (2), would you want it instead of my previous mktemp removal patch, or on top of it? Instead of the previous patch. But what I said still holds - this can go in only after we have a shell library sharing the common functions - I don't want to have this horrid stuff in every file. Actually, if you will make a mktemp shell function, no changes whatsoever might be needed to the other scripts. -- Petr Pasky Baudis Stuff: http://pasky.or.cz/ C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor - To unsubscribe from this list: send the line unsubscribe git in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] optimize gitdiff-do script
Rewrite gitdiff-do so that it works with arbitrary whitespace (space, tab, newline, ...) in filenames. Reduce number of subcommands execv'd by a third, by only calling 'rm' once, at end, not each loop. Avoid using shell arrays; perhaps more portable. Avoid 'echo -e' when displaying names; dont expand escape sequences in names. Use shell noglob (-f) to minimize getdents() calls. Simplify argument parsing and tmp file management. Comment the nastier shell patterns. This reduces the time by about 1/3 of what it was. Signed-off-by: Paul Jackson [EMAIL PROTECTED] Index: git-pasky-0.4/gitdiff-do === --- git-pasky-0.4.orig/gitdiff-do 2005-04-16 13:19:07.0 -0700 +++ git-pasky-0.4/gitdiff-do2005-04-16 15:33:28.0 -0700 @@ -2,19 +2,22 @@ # # Make a diff between two GIT trees. # Copyright (c) Petr Baudis, 2005 +# Copyright (c) Paul Jackson, 2005 # # Takes two parameters identifying the two trees/commits to compare. # Empty string will be substitued to HEAD revision. # # Note that this is probably the most performance critical shell script -# in the whole GIT suite. That's also why I resorted to bash builtin -# features and stuff. -- [EMAIL PROTECTED] +# in the whole GIT suite. # # Outputs a diff converting the first tree to the second one. +set -f # keep shell from scanning . to expand wildcards -id1=$1; shift -id2=$1; shift +t=${TMPDIR:-/usr/tmp}/gitdiff.$$ +trap 'set +f; rm -fr $t.?; trap 0; exit 0' 0 1 2 3 15 + +id1=$1; id2=$2; shift 2 # Leaves the result in $label. mkbanner () { @@ -32,58 +35,55 @@ mkbanner () { [ $labelapp ] label=$label ($labelapp) } -t=${TMPDIR:-/usr/tmp}/gitdiff.$$ -trap 'rm -fr $t.?; trap 0; exit 0' 0 1 2 3 15 -diffdir=$t.1 -diffdir1=$diffdir/$id1 -diffdir2=$diffdir/$id2 -mkdir -p $diffdir1 $diffdir2 - -while [ $1 ]; do - declare -a param - param=($1); - op=${param[0]:0:1} - mode=${param[0]:1} - type=${param[1]} - sha=${param[2]} - name=${param[3]} - - echo -e Index: $name\n=== - - if [ $type = tree ]; then - # diff-tree will kindly diff the subdirs for us - # XXX: What about modes? - shift; continue - fi - - loc1=$diffdir1/$name; dir1=${loc1%/*} - loc2=$diffdir2/$name; dir2=${loc2%/*} - ([ -d $dir1 ] [ -d $dir2 ]) || mkdir -p $dir1 $dir2 - - case $op in - +) - mkbanner $loc2 $id2 $name $mode $sha - diff -L /dev/null (tree:$id1) -L $label -u /dev/null $loc2 - ;; - -) - mkbanner $loc1 $id1 $name $mode $sha - diff -L $label -L /dev/null (tree:$id2) -u $loc1 /dev/null - ;; - *) - modes=(${mode/-/ }); - mode1=${modes[0]}; mode2=${modes[1]} - shas=(${sha/-/ }); - sha1=${shas[0]}; sha2=${shas[1]} - mkbanner $loc1 $id1 $name $mode1 $sha1; label1=$label - mkbanner $loc2 $id2 $name $mode2 $sha2; label2=$label - diff -L $label1 -L $label2 -u $loc1 $loc2 - ;; - *) - echo Unknown operator $op, ignoring delta: $1;; - esac - - rm -f $loc1 $loc2 - shift +for arg +do + IFS='' + set X$arg# X: don't let shell set see leading '+' in $arg + op=$1 + mode=${op#X?}# trim leading X? 1st two chars + type=$2 + sha=$3 + # if 4+ tabs, trim 1st 3 fields on 1st line with sed + case $arg in + *\ *\ *\ *\ *) +name=$(echo $arg | + /bin/sed '1s/[^ ]* [^ ]* [^ ]* //') +;; + *) +name=$4 +;; + esac + + echo Index: $name + echo === + + test $type = tree continue + + loc1=$t.1 + loc2=$t.2 + + case $op in + X+*) +mkbanner $loc2 $id2 $name $mode $sha +diff -L /dev/null (tree:$id1) -L $label -u /dev/null $loc2 +;; + X-*) +mkbanner $loc1 $id1 $name $mode $sha +diff -L $label -L /dev/null (tree:$id2) -u $loc1 /dev/null +;; + X\**) +mode1=${mode%-*} # trim '-' and after +mode2=${mode#*-} # trim up to and including '-' +sha1=${sha%-*}# trim '-' and after +sha2=${sha#*-}# trim up to and including '-' + +mkbanner $loc1 $id1 $name $mode1 $sha1; label1=$label +mkbanner $loc2 $id2 $name $mode2 $sha2; label2=$label +diff -L $label1 -L $label2 -u $loc1 $loc2 +;; + *) +badop=$(echo $op | sed 's/.\(.\).*/\1/') +echo Unknown operator $badop, ignoring delta: $1 +;; + esac done - -rm -rf $diffdir -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson [EMAIL PROTECTED] 1.650.933.1373, 1.925.600.0401 - To unsubscribe from this list: send the
Re: optimize gitdiff-do script
Petr wrote: Please don't reindent the scripts. It violates the current coding style and the patch is unreviewable. Sorry - I had not realized that there was a style in this case. I am all in favor of such coding styles, and will gladly fit this one. Do you want the patch resent, or a patch to restore indent on top of this one? the patch is unreviewable. The section that I indented the wrong way was such a total rewrite, that you aren't going to be able to review it line by line compared to the old anyway. So in this case, it wasn't that I was modifying and reindenting, rather that I was rewriting a page of code from scratch. But that's a nit. Honoring the coding style is necessary in any case. The idea behind that was that diffing could take a significant portion of disk space, Here I don't understand, or don't agree, not sure which. This won't eat more disk space, because the same tmp files are reused, over and over. Instead of unlinking them just before reopening them truncating (O_WRONLY|O_CREAT|O_TRUNC), I just reopen them truncating. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson [EMAIL PROTECTED] 1.650.933.1373, 1.925.600.0401 - To unsubscribe from this list: send the line unsubscribe git in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html