Re: [RFC PATCH 1/1] recover: restoration of deleted worktree files

2018-08-04 Thread Todd Zullinger
Hi,

Robert P. J. Day wrote:
> On Sat, 4 Aug 2018, Junio C Hamano wrote:
>> In other words, I think this patch can be a fine addition to
>> somebody else's project (i.e. random collection of scripts that may
>> help Git users), so let's see how I can offer comments/inputs to
>> help you improve it.  So I won't comment on lang, log message, or
>> shell scripting style---these are project convention and the
>> git-core convention won't be relevant to this patch.
> 
>   not sure how relevant this is, but fedora bundles a bunch of neat
> utilities into two packages: git-tools and git-extras. i have no idea
> what relationship those packages have to official git, or who decides
> what goes into them.

For anyone curious, those packages (git-extras and
git-tools) are both entirely separate projects upstream and
in the fedora packaging.  A git-recover script may well be a
good fit in one of those upstream projects.

The git-(extras|tools) package names are a bit confusing
IMO.  But it's probably more confusing that they each add a
number of git-* commands in the default PATH the way they're
packaged.

We do package some bits from contrib/ (e.g. completion,
subtree, etc.) in the fedora git packages.  We don't add
scripts and commands from outside of the git tarballs as
part of the fedora git package, though.

So far, I don't recall anyone filing a bug report about
commands from git-extras or git-tools against git.  So it
seems that users of those additional packages aren't being
confused, thankfully.

-- 
Todd



Re: [RFC PATCH 1/1] recover: restoration of deleted worktree files

2018-08-04 Thread Junio C Hamano
Edward Thomson writes:

> In any case, it sounds like you're not particularly interested in
> this, although I certainly appreciate you taking the time to suggest
> improvements despite that.  There's some good feedback there.

Not in its current shape.  But do not take this in a wrong way.  It
may be useful in a third-party script collection in its current
shape already.

More importantly, I am not opposed to having a "resurrect" utility in
the core distribution.  It just has to be a lot better than what
"grep -e 'I think I wrote this string' .git/lost-found/other/*"
gives us.
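
For concreteness, that baseline could be reproduced roughly as follows.
This is only a sketch: grep_lost_found is a hypothetical helper name, not
something that exists in Git, and it assumes it is run inside a
repository where "git fsck --lost-found" may write to lost-found/.

```bash
# Hypothetical helper (not part of Git): search dangling blobs for a string.
# "git fsck --lost-found" writes each dangling blob's contents to
# $GIT_DIR/lost-found/other/<object-id>; grep then scans the raw contents.
grep_lost_found() {
	local needle=$1 git_dir

	git_dir=$(git rev-parse --git-dir) || return 1
	git fsck --lost-found >/dev/null 2>&1

	# -l prints only the names of matching files; the basename of each
	# file is the object ID of the blob.
	grep -l -e "$needle" "$git_dir"/lost-found/other/* 2>/dev/null
}
```

Calling grep_lost_found 'I think I wrote this string' prints the paths of
the matching files.  No filter is applied to the contents, which is the
limitation discussed next.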

Filename discovery (perhaps from lost trees, which was the idea I
wrote in the message I am responding to, but others may come up with
better alternative approaches) is a must, but not primarily because
such a grep won't find the path to which the contents should go.
When a user says "I think I wrote this string in the file I am
looking for", s/he already knows what s/he wants to recover (i.e. it
was a README file at the top-level).  Filename discovery is a must
because grepping in the raw blob contents without smudge filter
chain applied may not find what we want in the first place, and for
that to happen, we need to have a filename.

Side note.  That may mean that even working in the
do-recover mode, the script may want to take a filename,
letting the user say "pretend all lost blobs are of this
type, as that is the type of the blob I just lost and am
interested in, and a filename will help you find an
appropriate smudge and/or textconv filter to help me"

That makes me realize that I did not mention one more thing, other
than the "interactive loop", that I did like in the script over what
lost-found gives us: smudge filter support.  I do not very often
work with contents that need clean/smudge other than in one project
(obviously not "git.git"), and I can see how it is essential in
helping the user to find the contents the user is looking for.

Thanks.


Re: [RFC PATCH 1/1] recover: restoration of deleted worktree files

2018-08-04 Thread Robert P. J. Day
On Sat, 4 Aug 2018, Junio C Hamano wrote:

> Edward Thomson writes:
>
> > Introduce git-recover, a simple script to aide in restoration of
> > deleted worktree files.  This will look for unreachable blobs in
> > the object database and prompt users to restore them to disk,
> > either interactively or on the command-line.

> >  git-recover.sh | 311 +
> >  1 file changed, 311 insertions(+)
> >  create mode 100755 git-recover.sh
>
> My first reaction was to say that I am not going to take a new
> command written only for bash with full bashism, even if it came
> with docs, tests nor Makefile integration, for Git itself.  Then I
> reconsidered, as not everything related to Git is git-core, and all
> of the above traits are sign of this patch _not_ meant for git-core.
>
> In other words, I think this patch can be a fine addition to
> somebody else's project (i.e. random collection of scripts that may
> help Git users), so let's see how I can offer comments/inputs to
> help you improve it.  So I won't comment on lang, log message, or
> shell scripting style---these are project convention and the
> git-core convention won't be relevant to this patch.

  not sure how relevant this is, but fedora bundles a bunch of neat
utilities into two packages: git-tools and git-extras. i have no idea
what relationship those packages have to official git, or who decides
what goes into them.

rday




Re: [RFC PATCH 1/1] recover: restoration of deleted worktree files

2018-08-04 Thread Edward Thomson
On Sat, Aug 04, 2018 at 08:54:49AM -0700, Junio C Hamano wrote:
> 
> My first reaction was to say that I am not going to take a new
> command written only for bash with full bashism, even if it came
> with docs, tests nor Makefile integration, for Git itself.  Then I
> reconsidered, as not everything related to Git is git-core, and all
> of the above traits are sign of this patch _not_ meant for git-core.

Yes, obviously I was not suggesting that this would be mergeable with
the bashisms, as I mentioned in my cover letter.

In any case, it sounds like you're not particularly interested in
this, although I certainly appreciate you taking the time to suggest
improvements despite that.  There's some good feedback there.

Cheers-
-ed


Re: [RFC PATCH 1/1] recover: restoration of deleted worktree files

2018-08-04 Thread Junio C Hamano
Edward Thomson writes:

> Introduce git-recover, a simple script to aide in restoration of deleted
> worktree files.  This will look for unreachable blobs in the object
> database and prompt users to restore them to disk, either interactively
> or on the command-line.
> ---
>  git-recover.sh | 311 +
>  1 file changed, 311 insertions(+)
>  create mode 100755 git-recover.sh

My first reaction was to say that I am not going to take a new
command written only for bash with full bashism, even if it came
with docs, tests nor Makefile integration, for Git itself.  Then I
reconsidered, as not everything related to Git is git-core, and all
of the above traits are sign of this patch _not_ meant for git-core.

In other words, I think this patch can be a fine addition to
somebody else's project (i.e. random collection of scripts that may
help Git users), so let's see how I can offer comments/inputs to
help you improve it.  So I won't comment on lang, log message, or
shell scripting style---these are project convention and the git-core
convention won't be relevant to this patch.

> diff --git a/git-recover.sh b/git-recover.sh
> new file mode 100755
> index 0..651d4116f
> --- /dev/null
> +++ b/git-recover.sh
> @@ -0,0 +1,311 @@
> +#!/usr/bin/env bash
> +#
> +# This program helps recover files in your repository that were deleted
> +# from the working tree.
> +#
> +# Copyright (c) 2017-2018 Edward Thomson.
> +
> +set -e
> +
> +IFS=$'\n'
> +
> +PROGNAME=$(echo "$0" | sed -e 's/.*\///')
> +GIT_DIR=$(git rev-parse --git-dir)
> +
> +DO_RECOVER=0
> +DO_FULL=0
> +DO_INTERACTIVE=0
> +BLOBS=()
> +FILENAMES=()
> +
> +function die_usage {
> +	echo "usage: $PROGNAME [-a] [-i] [--full] [<id> [-f <filename>] ...]" >&2
> +	exit 1
> +}
> +
> +while [[ $# -gt 0 ]]; do
> +	case "$1" in
> +		-a|--all)
> +			DO_RECOVER=1
> +			;;
> +		-i|--interactive)
> +			DO_INTERACTIVE=1
> +			;;
> +		--full)
> +			DO_FULL=1
> +			;;
> +		*)
> +			if [ "${1:0:1}" == "-" ]; then
> +				echo "$PROGNAME: unknown argument: $1" >&2
> +				die_usage
> +			fi
> +			BLOBS+=("$1")
> +
> +			shift
> +			if [ "$1" == "-f" ] || [ "$1" == "--filename" ]; then
> +				shift
> +				if [ $# == 0 ]; then
> +					die_usage
> +				fi
> +				FILENAMES+=("$1")
> +				shift
> +			else
> +				FILENAMES+=("")
> +			fi

You do not want to take "--file=Makefile" (i.e. abbreviated option
name, and value as part of the option arg after '=')?
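
One way the parser could accept such forms is sketched below.  This is
an illustration only, not part of the patch: parse_filename_opt,
FILENAME_VALUE and FILENAME_CONSUMED are hypothetical names, and the
abbreviation rule (any prefix of "--filename" from "--fi" onward) is an
assumption made so that it cannot clash with "--full".

```bash
# Hypothetical helper: recognise "-f NAME", "--filename NAME",
# "--filename=NAME" and abbreviations such as "--file=NAME".
# On success sets FILENAME_VALUE and FILENAME_CONSUMED (how many
# arguments the caller should shift).  Returns 1 if $1 is not a
# filename option and 2 if the option is missing its value.
parse_filename_opt() {
	local name

	FILENAME_VALUE=
	FILENAME_CONSUMED=0

	case "$1" in
	-f)
		name=$1
		;;
	--fi*)
		# "--f" alone would be ambiguous with "--full", so at least
		# "--fi" is required.
		name=${1%%=*}
		case "--filename" in
		"$name"*) ;;
		*) return 1 ;;
		esac
		;;
	*)
		return 1
		;;
	esac

	if [ "$name" != "$1" ]; then
		# The value is attached after '='.
		FILENAME_VALUE=${1#*=}
		FILENAME_CONSUMED=1
	elif [ $# -ge 2 ]; then
		# The value is the next argument.
		FILENAME_VALUE=$2
		FILENAME_CONSUMED=2
	else
		return 2
	fi
}
```

The option loop would call it as parse_filename_opt "$@" after storing
the blob name, then shift by $FILENAME_CONSUMED on success.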

> +			continue
> +			;;
> +	esac
> +	shift
> +done

So, as a user, I can run this with "-a" but no blob object names to
run it in DO_RECOVER mode, or I can give one or more "blob spec"
where I say object id, optionally followed by one "-f filename"; in
the latter mode, BLOBS[] and FILENAMES[] array would have the same
number of elements, corresponding to each other.

> +if [ ${#BLOBS[@]} != 0 ] && [ $DO_RECOVER == 1 ]; then
> +	die_usage
> +elif [ ${#BLOBS[@]} != 0 ]; then
> +	DO_RECOVER=1
> +fi

If I did not say "-a" but did not give "blob spec", then I am
implicitly asking for "-a" to work in DO_RECOVER mode.

I think I understood what the program wants to do so far.

> +case "$OSTYPE" in
> +	darwin*|freebsd*) IS_BSD=1 ;;
> +	*) IS_BSD=0 ;;
> +esac
> +
> +function expand_given_blobs() {
> +	for i in "${!BLOBS[@]}"; do
> +		ID=$(git rev-parse --verify "${BLOBS[$i]}" 2>/dev/null || true)
> +
> +		if [ -z "$ID" ]; then
> +			echo "$PROGNAME: ${BLOBS[$i]} is not a valid object." 1>&2
> +			exit 1
> +		fi
> +
> +		TYPE=$(git cat-file -t "${ID}" 2>/dev/null || true)

An earlier "set -e" makes "|| true" ugliness required.  I suspect
use of "set -e" overall is a loss (vs explicit error checking).
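
As an illustration of what explicit checking might look like for this
one lookup, here is a minimal sketch that does not rely on "set -e".
The helper name object_type is hypothetical and not taken from the
patch.

```bash
# Hypothetical helper: print the type of the object named by $1, or
# report the failure and return non-zero.  The command's exit status is
# tested directly, so no "|| true" is needed.
object_type() {
	local id=$1 type

	if ! type=$(git cat-file -t "$id" 2>/dev/null); then
		echo "object_type: cannot read object $id" >&2
		return 1
	fi
	printf '%s\n' "$type"
}
```

A caller can then write: TYPE=$(object_type "$ID") || exit 1.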

> +		if [ "$TYPE" != "blob" ]; then
> +			echo "$PROGNAME: ${BLOBS[$i]} is not a blob." 1>&2
> +			exit
> +		fi

A user may have given us 11f5bcd9 and this function makes sure such
an object exists in the object store *and* is a blob.  Otherwise
it dies.  The main objective of this function is to turn that user
supplied object name to a full hex that is known to refer to an
existing blob.

> +		BLOBS[$i]=$ID
> +	done

I find a disconnect between this being a loop and the attitude "we
won't tolerate any erroneous input".  If a user is feeding dozens of
blob object names, wouldn't it be more helpful to give a warning, go
on and help the user with the rest?
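
A lenient variant of the loop might look like the sketch below.  It is
an assumption-laden illustration, not the patch's code: the names
expand_given_blobs_lenient and VALID_BLOBS are hypothetical, and it uses
the "^{blob}" peeling syntax of rev-parse in place of the separate
"cat-file -t" check.

```bash
# Hypothetical lenient variant: warn about bad input and keep going.
# Fills VALID_BLOBS with full object IDs; returns 1 if anything was
# skipped, 0 otherwise.
expand_given_blobs_lenient() {
	local spec id skipped=0

	VALID_BLOBS=()
	for spec in "$@"; do
		# "^{blob}" makes rev-parse fail unless SPEC names a blob.
		if id=$(git rev-parse --verify --quiet "$spec^{blob}" 2>/dev/null); then
			VALID_BLOBS+=("$id")
		else
			echo "warning: $spec is not a valid blob, skipping" >&2
			skipped=1
		fi
	done
	return "$skipped"
}
```

The return status still lets the caller tell the user that some names
were rejected after the remaining blobs have been processed.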

> +}
> +
> +# find all the unreachable blobs
> +function find_unreachable() {
> +	FULLNESS="--no-full"
> +
> +	if [ $DO_FULL == 1 ]; then

[RFC PATCH 1/1] recover: restoration of deleted worktree files

2018-08-04 Thread Edward Thomson
Introduce git-recover, a simple script to aide in restoration of deleted
worktree files.  This will look for unreachable blobs in the object
database and prompt users to restore them to disk, either interactively
or on the command-line.
---
 git-recover.sh | 311 +
 1 file changed, 311 insertions(+)
 create mode 100755 git-recover.sh

diff --git a/git-recover.sh b/git-recover.sh
new file mode 100755
index 0..651d4116f
--- /dev/null
+++ b/git-recover.sh
@@ -0,0 +1,311 @@
+#!/usr/bin/env bash
+#
+# This program helps recover files in your repository that were deleted
+# from the working tree.
+#
+# Copyright (c) 2017-2018 Edward Thomson.
+
+set -e
+
+IFS=$'\n'
+
+PROGNAME=$(echo "$0" | sed -e 's/.*\///')
+GIT_DIR=$(git rev-parse --git-dir)
+
+DO_RECOVER=0
+DO_FULL=0
+DO_INTERACTIVE=0
+BLOBS=()
+FILENAMES=()
+
+function die_usage {
+	echo "usage: $PROGNAME [-a] [-i] [--full] [<id> [-f <filename>] ...]" >&2
+	exit 1
+}
+
+while [[ $# -gt 0 ]]; do
+	case "$1" in
+		-a|--all)
+			DO_RECOVER=1
+			;;
+		-i|--interactive)
+			DO_INTERACTIVE=1
+			;;
+		--full)
+			DO_FULL=1
+			;;
+		*)
+			if [ "${1:0:1}" == "-" ]; then
+				echo "$PROGNAME: unknown argument: $1" >&2
+				die_usage
+			fi
+			BLOBS+=("$1")
+
+			shift
+			if [ "$1" == "-f" ] || [ "$1" == "--filename" ]; then
+				shift
+				if [ $# == 0 ]; then
+					die_usage
+				fi
+				FILENAMES+=("$1")
+				shift
+			else
+				FILENAMES+=("")
+			fi
+			continue
+			;;
+	esac
+	shift
+done
+
+if [ ${#BLOBS[@]} != 0 ] && [ $DO_RECOVER == 1 ]; then
+	die_usage
+elif [ ${#BLOBS[@]} != 0 ]; then
+	DO_RECOVER=1
+fi
+
+case "$OSTYPE" in
+	darwin*|freebsd*) IS_BSD=1 ;;
+	*) IS_BSD=0 ;;
+esac
+
+function expand_given_blobs() {
+	for i in "${!BLOBS[@]}"; do
+		ID=$(git rev-parse --verify "${BLOBS[$i]}" 2>/dev/null || true)
+
+		if [ -z "$ID" ]; then
+			echo "$PROGNAME: ${BLOBS[$i]} is not a valid object." 1>&2
+			exit 1
+		fi
+
+		TYPE=$(git cat-file -t "${ID}" 2>/dev/null || true)
+
+		if [ "$TYPE" != "blob" ]; then
+			echo "$PROGNAME: ${BLOBS[$i]} is not a blob." 1>&2
+			exit
+		fi
+
+		BLOBS[$i]=$ID
+	done
+}
+
+# find all the unreachable blobs
+function find_unreachable() {
+	FULLNESS="--no-full"
+
+	if [ $DO_FULL == 1 ]; then FULLNESS="--full"; fi
+
+	BLOBS=($(git fsck --unreachable --no-reflogs \
+		"${FULLNESS}" --no-progress | sed -ne 's/^unreachable blob //p'))
+}
+
+function read_one_file {
+	BLOB=$1
+	FILTER_NAME=$2
+	ARGS=()
+
+	if [ -z "$FILTER_NAME" ]; then
+		ARGS+=("blob")
+	else
+		ARGS+=("--filters" "--path=$FILTER_NAME")
+	fi
+
+	git cat-file "${ARGS[@]}" "$BLOB"
+}
+
+function write_one_file {
+	BLOB=$1
+	FILTER_NAME=$2
+	OUTPUT_NAME=$3
+
+	ABBREV=$(git rev-parse --short "${BLOB}")
+
+	echo -n "Writing $ABBREV: "
+	read_one_file "$BLOB" "$FILTER_NAME" > "$OUTPUT_NAME"
+	echo "$OUTPUT_NAME."
+}
+
+function unique_filename {
+	if [ ! -f "${BLOB}" ]; then
+		echo "$BLOB"
+	else
+		cnt=1
+		while true
+		do
+			fn="${BLOB}~${cnt}"
+			if [ ! -f "${fn}" ]; then
+				echo "${fn}"
+				break
+			fi
+			cnt=$((cnt+1))
+		done
+	fi
+}
+
+function write_recoverable {
+	for i in "${!BLOBS[@]}"; do
+		BLOB=${BLOBS[$i]}
+		FILTER_NAME=${FILENAMES[$i]}
+		OUTPUT_NAME=${FILENAMES[$i]:-$(unique_filename)}
+
+		write_one_file "$BLOB" "$FILTER_NAME" "$OUTPUT_NAME"
+	done
+}
+
+function file_time {
+	if [ $IS_BSD == 1 ]; then
+		stat -f %c "$1"
+	else
+		stat -c %Y "$1"
+	fi
+}
+
+function timestamp_to_s {
+	if [ $IS_BSD == 1 ]; then
+		date -r "$1"
+	else
+		date -d @"$1"
+	fi
+}
+
+function sort_by_timestamp {
+	# sort blobs in loose objects by their timestamp (packed blobs last)
+	BLOB_AND_TIMESTAMPS=($(for BLOB in "${BLOBS[@]}"; do
+		LOOSE="${BLOB::2}/${BLOB:2}"
+		TIME=$(file_time "$GIT_DIR/objects/$LOOSE" 2>/dev/null || true)
+		echo "$BLOB $TIME"
+