Re: optimize gitdiff-do script

2005-04-18 Thread Paul Jackson
Pasky,

Looks like a couple of questions I asked over the weekend
got lost along the way.

 1) How do you want me to fix the indentation on my patch
to optimize gitdiff-do script:
- forget my first patch and resend from scratch, or
- a second patch restoring indentation, on top of my first one.

 2) Would you be interested in a patch that used a more robust tmp
file creation, along the lines of replacing

t=${TMPDIR:-/usr/tmp}/gitdiff.$$
trap 'set +f; rm -fr $t.?; trap 0; exit 0' 0 1 2 3 15

with:

tmp=${TMPDIR-/tmp}
tmp=$tmp/gitdiff-do.$RANDOM.$RANDOM.$RANDOM.$$
(umask 077  mkdir $tmp) || {
echo Could not create temporary directory! Exiting. 12 
exit 1
}
trap 'rm -fr $tmp; trap 0; exit 0' 0 1 2 3 15
t=$tmp/tmp

From the www.linuxsecurity.com link that Dave Jones provided, the
above $tmp directory is about as good as using mktemp, while
avoiding both dependency on mktemp options not everyone has.

 3) If interested in (2), would you want it instead of my previous mktemp
removal patch, or on top of it?

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson [EMAIL PROTECTED] 1.650.933.1373, 
1.925.600.0401
-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: optimize gitdiff-do script

2005-04-18 Thread Petr Baudis
Dear diary, on Mon, Apr 18, 2005 at 05:23:34PM CEST, I got a letter
where Paul Jackson [EMAIL PROTECTED] told me that...
 Pasky,
 
 Looks like a couple of questions I asked over the weekend
 got lost along the way.

Yes, sorry about that; I had a lot of mail traffic lately and I'm not so
used to it. ;-)

  1) How do you want me to fix the indentation on my patch
 to optimize gitdiff-do script:
   - forget my first patch and resend from scratch, or
   - a second patch restoring indentation, on top of my first one.

Resend from scratch, please.

I cannot guarantee I will look at it immediately, though. Optimizing is
nice, but gitdiff-do's speed is already usable and there are much more
pressing issues for git-pasky right now.

  2) Would you be interested in a patch that used a more robust tmp
 file creation, along the lines of replacing
 
   t=${TMPDIR:-/usr/tmp}/gitdiff.$$
   trap 'set +f; rm -fr $t.?; trap 0; exit 0' 0 1 2 3 15
 
 with:
 
   tmp=${TMPDIR-/tmp}
   tmp=$tmp/gitdiff-do.$RANDOM.$RANDOM.$RANDOM.$$
   (umask 077  mkdir $tmp) || {
   echo Could not create temporary directory! Exiting. 12 
   exit 1
   }
   trap 'rm -fr $tmp; trap 0; exit 0' 0 1 2 3 15
   t=$tmp/tmp
 
 From the www.linuxsecurity.com link that Dave Jones provided, the
 above $tmp directory is about as good as using mktemp, while
 avoiding both dependency on mktemp options not everyone has.
 
  3) If interested in (2), would you want it instead of my previous mktemp
 removal patch, or on top of it?

Instead of the previous patch. But what I said still holds - this can go
in only after we have a shell library sharing the common functions - I
don't want to have this horrid stuff in every file.

Actually, if you will make a mktemp shell function, no changes
whatsoever might be needed to the other scripts.

-- 
Petr Pasky Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor
-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] optimize gitdiff-do script

2005-04-16 Thread Paul Jackson
Rewrite gitdiff-do so that it works with arbitrary
whitespace (space, tab, newline, ...) in filenames.

Reduce number of subcommands execv'd by a
third, by only calling 'rm' once, at end, not each
loop.

Avoid using shell arrays; perhaps more portable.

Avoid 'echo -e' when displaying names; dont expand escape
  sequences in names.

Use shell noglob (-f) to minimize getdents() calls.

Simplify argument parsing and tmp file management.

Comment the nastier shell patterns.

This reduces the time by about 1/3 of what it was.

Signed-off-by: Paul Jackson [EMAIL PROTECTED]

Index: git-pasky-0.4/gitdiff-do
===
--- git-pasky-0.4.orig/gitdiff-do   2005-04-16 13:19:07.0 -0700
+++ git-pasky-0.4/gitdiff-do2005-04-16 15:33:28.0 -0700
@@ -2,19 +2,22 @@
 #
 # Make a diff between two GIT trees.
 # Copyright (c) Petr Baudis, 2005
+# Copyright (c) Paul Jackson, 2005
 #
 # Takes two parameters identifying the two trees/commits to compare.
 # Empty string will be substitued to HEAD revision.
 #
 # Note that this is probably the most performance critical shell script
-# in the whole GIT suite. That's also why I resorted to bash builtin
-# features and stuff. -- [EMAIL PROTECTED]
+# in the whole GIT suite.
 #
 # Outputs a diff converting the first tree to the second one.
 
+set -f   # keep shell from scanning . to expand wildcards
 
-id1=$1; shift
-id2=$1; shift
+t=${TMPDIR:-/usr/tmp}/gitdiff.$$
+trap 'set +f; rm -fr $t.?; trap 0; exit 0' 0 1 2 3 15
+
+id1=$1; id2=$2; shift 2
 
 # Leaves the result in $label.
 mkbanner () {
@@ -32,58 +35,55 @@ mkbanner () {
[ $labelapp ]  label=$label  ($labelapp)
 }
 
-t=${TMPDIR:-/usr/tmp}/gitdiff.$$
-trap 'rm -fr $t.?; trap 0; exit 0' 0 1 2 3 15
-diffdir=$t.1
-diffdir1=$diffdir/$id1
-diffdir2=$diffdir/$id2
-mkdir -p $diffdir1 $diffdir2
-
-while [ $1 ]; do
-   declare -a param
-   param=($1);
-   op=${param[0]:0:1}
-   mode=${param[0]:1}
-   type=${param[1]}
-   sha=${param[2]}
-   name=${param[3]}
-
-   echo -e Index: 
$name\n===
-
-   if [ $type = tree ]; then
-   # diff-tree will kindly diff the subdirs for us
-   # XXX: What about modes?
-   shift; continue
-   fi
-
-   loc1=$diffdir1/$name; dir1=${loc1%/*}
-   loc2=$diffdir2/$name; dir2=${loc2%/*}
-   ([ -d $dir1 ]  [ -d $dir2 ]) || mkdir -p $dir1 $dir2
-
-   case $op in
-   +)
-   mkbanner $loc2 $id2 $name $mode $sha
-   diff -L /dev/null  (tree:$id1) -L $label -u /dev/null 
$loc2
-   ;;
-   -)
-   mkbanner $loc1 $id1 $name $mode $sha
-   diff -L $label -L /dev/null  (tree:$id2) -u $loc1 
/dev/null
-   ;;
-   *)
-   modes=(${mode/-/ });
-   mode1=${modes[0]}; mode2=${modes[1]}
-   shas=(${sha/-/ });
-   sha1=${shas[0]}; sha2=${shas[1]}
-   mkbanner $loc1 $id1 $name $mode1 $sha1; label1=$label
-   mkbanner $loc2 $id2 $name $mode2 $sha2; label2=$label
-   diff -L $label1 -L $label2 -u $loc1 $loc2
-   ;;
-   *)
-   echo Unknown operator $op, ignoring delta: $1;;
-   esac
-
-   rm -f $loc1 $loc2
-   shift
+for arg
+do
+  IFS=''
+  set X$arg# X: don't let shell set see leading '+' in $arg
+  op=$1
+  mode=${op#X?}# trim leading X? 1st two chars
+  type=$2
+  sha=$3
+  # if 4+ tabs, trim 1st 3 fields on 1st line with sed
+  case $arg in
+  *\   *\  *\  *\  *)
+name=$(echo $arg |
+  /bin/sed '1s/[^  ]*  [^  ]*  [^  ]*  //')
+;;
+  *)
+name=$4
+;;
+  esac
+
+  echo Index: $name
+  echo ===
+
+  test $type = tree  continue
+
+  loc1=$t.1
+  loc2=$t.2
+
+  case $op in
+  X+*)
+mkbanner $loc2 $id2 $name $mode $sha
+diff -L /dev/null  (tree:$id1) -L $label -u /dev/null $loc2
+;;
+  X-*)
+mkbanner $loc1 $id1 $name $mode $sha
+diff -L $label -L /dev/null  (tree:$id2) -u $loc1 /dev/null
+;;
+  X\**)
+mode1=${mode%-*}  # trim '-' and after
+mode2=${mode#*-}  # trim up to and including '-'
+sha1=${sha%-*}# trim '-' and after
+sha2=${sha#*-}# trim up to and including '-'
+
+mkbanner $loc1 $id1 $name $mode1 $sha1; label1=$label
+mkbanner $loc2 $id2 $name $mode2 $sha2; label2=$label
+diff -L $label1 -L $label2 -u $loc1 $loc2
+;;
+  *)
+badop=$(echo $op | sed 's/.\(.\).*/\1/')
+echo Unknown operator $badop, ignoring delta: $1
+;;
+  esac
 done
-
-rm -rf $diffdir

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson [EMAIL PROTECTED] 1.650.933.1373, 
1.925.600.0401
-
To unsubscribe from this list: send the 

Re: optimize gitdiff-do script

2005-04-16 Thread Paul Jackson
Petr wrote:
 Please don't reindent the scripts. It violates the current coding style
 and the patch is unreviewable.

Sorry - I had not realized that there was a style in this case.

I am all in favor of such coding styles, and will gladly fit this one.

Do you want the patch resent, or a patch to restore indent on top of
this one?

 the patch is unreviewable.

The section that I indented the wrong way was such a total rewrite, that
you aren't going to be able to review it line by line compared to the
old anyway.  So in this case, it wasn't that I was modifying and
reindenting, rather that I was rewriting a page of code from scratch.

But that's a nit.  Honoring the coding style is necessary in any case.

 The idea behind that was that diffing could take a significant portion
 of disk space,

Here I don't understand, or don't agree, not sure which.

This won't eat more disk space, because the same tmp files are reused,
over and over.  Instead of unlinking them just before reopening them
truncating (O_WRONLY|O_CREAT|O_TRUNC), I just reopen them truncating.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson [EMAIL PROTECTED] 1.650.933.1373, 
1.925.600.0401
-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html