Re: git-p4 Question

2015-04-20 Thread Sam Vilain

On 04/20/2015 09:41 AM, FusionX86 wrote:

> Hopefully this is an appropriate place to ask questions about git-p4.
>
> I started at a company that wants to migrate from Perforce to Git. I'm
> new to Perforce and have been trying to learn just enough about it to
> get through this migration.


You might also like to check out my git-p4raw project, which imports
directly from the raw repository files into a git repo using git fast-import:


http://github.com/samv/git-p4raw

Apparently it's my most popular github project :-).  YMMV.
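For readers who haven't used git fast-import: it reads a plain-text command stream on stdin and writes objects directly, which is what makes this kind of importer fast. The following is not git-p4raw's code. It is a minimal Python sketch, assuming a hypothetical `Change` record (author, timestamp, message, changed files), of the kind of stream an importer might feed to `git fast-import`: one `commit` per change, each with a `mark` so the next one can say `from :N`, and file contents given inline using the exact-byte-count `data` format.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Change:
    """Hypothetical record for one imported change (not git-p4raw's own model)."""
    author: str                 # "Name <email>"
    timestamp: int              # seconds since the epoch, written with a +0000 offset
    message: str
    files: Dict[str, bytes] = field(default_factory=dict)  # path -> new content


def _data(payload: bytes) -> bytes:
    # Exact-byte-count form: "data <n>\n" + n bytes, plus an optional trailing LF.
    return b"data %d\n" % len(payload) + payload + b"\n"


def fast_import_stream(changes: List[Change], ref: str = "refs/heads/master") -> bytes:
    out = bytearray()
    for mark, change in enumerate(changes, start=1):
        out += b"commit %s\n" % ref.encode()
        out += b"mark :%d\n" % mark
        out += b"committer %s %d +0000\n" % (change.author.encode(), change.timestamp)
        out += _data(change.message.encode())
        if mark > 1:
            out += b"from :%d\n" % (mark - 1)  # parent is the previous change
        for path, content in sorted(change.files.items()):
            out += b"M 100644 inline %s\n" % path.encode()  # regular, non-executable file
            out += _data(content)
    return bytes(out)
```

In an empty repository the result could be piped in with `subprocess.run(["git", "fast-import"], input=stream, check=True)`. The sketch ignores deletions, branches and executable modes, all of which a real Perforce import has to handle.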

Sam.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: weaning distributions off tarballs: extended verification of git tags

2015-03-02 Thread Sam Vilain

On 03/02/2015 12:08 PM, Junio C Hamano wrote:

>> I have a hazy recollection of what it would take to replace SHA-1 in
>> git with something else; it should be possible (though tricky) to do
>> it lazily, where a tree entry has bits (eg, some of the currently
>> unused file mode bits) to denote which hash algorithm is in use for
>> the entry.  However I don't think that got past idea stage...
>
> I think one reason why it didn't was because it would not work well.
> That bit that tells "this is a new object or old" would mean that a
> single tree can have many different object names, depending on which
> of its component entries are using that bit and which aren't.  There
> goes the "we know two trees with the same object name are identical
> without recursing into them" optimization out the window.
>
> Also it would make it impossible to do what you suggest to Joey to
> do, i.e. "exactly the same way that git does", once you start saying
> that a tree object can be encoded in more than one way, wouldn't it?


I was reasoning that people would rather not have to rewrite their whole
history in order to switch checksum algorithms, and that allowing trees
to be lazily converted would make things more efficient.  However, I
think I see your point here that this doesn't work.
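Junio's objection can be made concrete with a toy tree diff. In this Python sketch the in-memory `store` layout and the object names are hypothetical. `diff_trees` skips any pair of entries whose object names match, which is the optimization he describes. Once the same content can also appear under a second name, as it would with per-entry hash bits, the shortcut no longer fires, and a name-based comparison even reports identical content as changed.

```python
from typing import Dict, List, Optional, Tuple, Union

Entries = Dict[str, str]                   # entry name -> object name
Obj = Tuple[str, Union[Entries, bytes]]    # ("tree", entries) or ("blob", data)


def diff_trees(store: Dict[str, Obj], a: str, b: str, prefix: str = "",
               stats: Optional[Dict[str, int]] = None) -> List[str]:
    """Return the paths that differ between trees `a` and `b`."""
    if stats is not None:
        stats["visited"] = stats.get("visited", 0) + 1
    if a == b:
        return []  # same object name => identical content, no need to recurse
    entries_a, entries_b = store[a][1], store[b][1]
    changed: List[str] = []
    for name in sorted(set(entries_a) | set(entries_b)):
        ia, ib = entries_a.get(name), entries_b.get(name)
        if ia == ib:
            continue  # the same shortcut, one level down
        if ia and ib and store[ia][0] == "tree" and store[ib][0] == "tree":
            changed += diff_trees(store, ia, ib, prefix + name + "/", stats)
        else:
            changed.append(prefix + name)
    return changed


store: Dict[str, Obj] = {
    "blob-x": ("blob", b"x"),
    "blob-y": ("blob", b"y"),
    "lib": ("tree", {"f": "blob-x"}),
    "root1": ("tree", {"README": "blob-x", "lib": "lib"}),
    "root2": ("tree", {"README": "blob-y", "lib": "lib"}),
    # Same content as "lib", but named under a different (hypothetical) hash:
    "blob-x-new": ("blob", b"x"),
    "lib-new": ("tree", {"f": "blob-x-new"}),
    "root3": ("tree", {"README": "blob-x", "lib": "lib-new"}),
}

print(diff_trees(store, "root1", "root2"))  # ['README']; "lib" is never opened
print(diff_trees(store, "root1", "root3"))  # ['lib/f'], although the content is identical
```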


However, if the algorithm were recorded in a per-commit header, then
only the first commit which changes the hashing algorithm would have to
re-checksum each of the files, and only in the current tree, not all
the way back to the beginning of history.  The delta logic should not
have to care, and these objects with the same content but different
object IDs should pack perfectly, so long as git-pack-objects knows to
re-checksum objects with the available hash algorithms and spot
matches.


Other operations like diff, which span commits with different hashing
algorithms, might be able to get away with their existing object
ranking algorithms, caching alternate object IDs for content as they
operate to allow exact matching across hash algorithm changes.
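As a rough illustration of caching alternate object IDs, here is a minimal Python sketch. The names `object_id` and `AltIdCache` are hypothetical, and it assumes SHA-1 and SHA-256 are the available algorithms. Each blob is hashed under every algorithm using git's `<type> <size>\0` object header, and every resulting ID maps to one canonical key, so content can be matched exactly across an algorithm change and stored only once. This only makes sense for blobs: tree and commit objects embed other objects' IDs, so their bytes themselves differ between algorithms.

```python
import hashlib
from typing import Dict, Iterable, Tuple

AlgoId = Tuple[str, str]  # (algorithm name, hex object id)


def object_id(obj_type: str, data: bytes, algo: str) -> str:
    """git-style object name: hash of '<type> <size>\\0' followed by the content."""
    header = b"%s %d\x00" % (obj_type.encode(), len(data))
    return hashlib.new(algo, header + data).hexdigest()


class AltIdCache:
    """Hypothetical cache mapping every known (algo, id) of a blob to one canonical key."""

    def __init__(self, algos: Iterable[str] = ("sha1", "sha256")):
        self.algos = tuple(algos)
        self.canonical: Dict[AlgoId, AlgoId] = {}
        self.contents: Dict[AlgoId, bytes] = {}  # stored once per distinct content

    def add_blob(self, data: bytes) -> Dict[str, str]:
        ids = {algo: object_id("blob", data, algo) for algo in self.algos}
        key = (self.algos[0], ids[self.algos[0]])
        self.contents.setdefault(key, data)
        for algo, oid in ids.items():
            self.canonical[(algo, oid)] = key
        return ids

    def same_content(self, x: AlgoId, y: AlgoId) -> bool:
        kx = self.canonical.get(x)
        return kx is not None and kx == self.canonical.get(y)
```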


But actually, for the original problem (just producing a signature with
a different hashing algorithm), it would probably be sufficient to
re-hash the current commit and the current tree recursively, and the
mixed hash-algorithm case would not need to exist.  But I'm thinking it
might not be too hard to make git nicely generic, to be well prepared
for when a second pre-image attack on SHA-1 becomes practical.
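Re-hashing the current tree recursively could look roughly like the following Python sketch. It assumes a hypothetical in-memory tree (a dict mapping names to file contents or to nested dicts), regular files only (mode 100644), and git's usual object encoding: a `<type> <size>\0` header, tree entries of the form `<mode> <name>\0<raw id>`, and entries sorted by name with subtrees compared as if their names ended in `/`. With `algo="sha1"` it should reproduce git's own tree IDs. A re-hashed commit would then point at the new tree ID.

```python
import hashlib
from typing import Dict, Union

Tree = Dict[str, Union[bytes, "Tree"]]  # name -> file content, or nested dict for a directory


def _object(obj_type: bytes, data: bytes, algo: str) -> bytes:
    # Hash '<type> <size>\0<data>' and return the raw (binary) digest.
    return hashlib.new(algo, b"%s %d\x00" % (obj_type, len(data)) + data).digest()


def _sort_key(item) -> bytes:
    name, value = item
    # git orders tree entries as if directory names had a trailing '/'
    return name.encode() + (b"/" if isinstance(value, dict) else b"")


def hash_tree(tree: Tree, algo: str = "sha256") -> bytes:
    """Recursively re-hash a tree (and the blobs under it) with `algo`."""
    entries = []
    for name, value in sorted(tree.items(), key=_sort_key):
        if isinstance(value, dict):
            mode, oid = b"40000", hash_tree(value, algo)
        else:
            mode, oid = b"100644", _object(b"blob", value, algo)
        entries.append(b"%s %s\x00" % (mode, name.encode()) + oid)
    return _object(b"tree", b"".join(entries), algo)
```

Only the objects reachable from the current tree are touched, which is the point: history behind the signed commit keeps its SHA-1 names.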


Sam


Re: weaning distributions off tarballs: extended verification of git tags

2015-03-02 Thread Sam Vilain

On 03/02/2015 10:12 AM, Joey Hess wrote:

> I support this proposal, as someone who no longer releases tarballs
> of my software, when I can possibly avoid it. I have worried about
> signed tags / commits only being a SHA1 break away from useless.
>
> As to the implementation, checksumming the collection of raw objects is
> certainly superior to tar. Colin had suggested sorting the objects by
> checksum, but I don't think that is necessary. Just stream the commit
> object, then its tree object, followed by the content of each object
> listed in the tree, recursing into subtrees as necessary. That will be a
> stable stream for a given commit, or tree.


I would really just do it exactly the same way that git does: checksum
the objects, including their headers, with the new hashes.  I have a hazy
recollection of what it would take to replace SHA-1 in git with
something else; it should be possible (though tricky) to do it lazily,
where a tree entry has bits (eg, some of the currently unused file mode
bits) to denote which hash algorithm is in use for the entry.  However
I don't think that got past idea stage...
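For comparison, Joey's streaming proposal can be sketched as a single digest over the commit object, its tree object, and the contents of each entry, recursing into subtrees. The toy `store` (SHA-1 name to `(type, raw bytes)`), the helper names, and the choice of SHA-512 are assumptions for illustration. This is one possible reading of his description, not a specified format; unlike git's own scheme, it streams raw contents without per-object headers.

```python
import hashlib
from typing import Dict, Iterator, Tuple

Store = Dict[str, Tuple[str, bytes]]  # sha1 hex -> (type, raw content without header)


def put(store: Store, obj_type: str, raw: bytes) -> str:
    """Add an object to the toy store under its ordinary git SHA-1 name."""
    oid = hashlib.sha1(b"%s %d\x00" % (obj_type.encode(), len(raw)) + raw).hexdigest()
    store[oid] = (obj_type, raw)
    return oid


def tree_entries(raw: bytes) -> Iterator[Tuple[bytes, bytes, str]]:
    """Parse a raw tree: repeated '<mode> <name>' NUL '<20-byte SHA-1>'."""
    i = 0
    while i < len(raw):
        sp = raw.index(b" ", i)
        nul = raw.index(b"\x00", sp)
        yield raw[i:sp], raw[sp + 1:nul], raw[nul + 1:nul + 21].hex()
        i = nul + 21


def stream_digest(store: Store, commit_id: str, algo: str = "sha512") -> str:
    h = hashlib.new(algo)
    _, commit = store[commit_id]
    h.update(commit)                                              # 1. the commit object
    tree_id = commit.split(b"\n", 1)[0].split(b" ")[1].decode()  # first line: "tree <hex>"

    def walk(tid: str) -> None:
        _, tree = store[tid]
        h.update(tree)                                            # 2. the tree object
        for mode, _name, oid in tree_entries(tree):
            if mode == b"40000":
                walk(oid)                                         # 3a. recurse into subtrees
            else:
                h.update(store[oid][1])                           # 3b. file contents

    walk(tree_id)
    return h.hexdigest()
```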


Sam


Re: cherry picking and merge

2014-08-01 Thread Sam Vilain
On 08/01/2014 10:48 AM, Mike Stump wrote:
>> There is also git-imerge, a third-party tool that is intended to help
>> merging changes (and make it possible to do it in an incremental way).
>
> Then remove git merge and replace it with git-imerge.  :-)  Anyway, I read
> that, and I can see some beauty of that that might be nice in complex
> merges.  The problem is, I want git merge to work.


Git merge has a notion of discrete merge strategies.  The default
recursive merge strategy isn't completely oblivious to history; when
the two branches have more than one merge base, it first merges those
merge bases together (strangely enough, recursively) into a virtual
base, and then performs the 3-way merge of the branch you're trying to
merge against that.  In general, this works pretty well.  Some systems
even simpler than that (eg, GitHub's green merge button) work
acceptably as well.
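The recursive strategy's handling of multiple merge bases can be sketched on a toy history in which each commit's content is a dict of path to value. This is a simplified Python model, not git's implementation: `merge3` works per path rather than per line, and when merging the merge bases produces a conflict, this sketch simply keeps one side, whereas git keeps the conflicted result in the virtual base. The class and method names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Content = Dict[str, str]  # toy snapshot: path -> file contents


def merge3(base: Content, ours: Content, theirs: Content) -> Tuple[Content, List[str]]:
    """Per-path three-way merge; on conflict keep `ours` and report the path."""
    result: Content = {}
    conflicts: List[str] = []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:
            value = o
        elif o == b:
            value = t      # only their side changed
        elif t == b:
            value = o      # only our side changed
        else:
            conflicts.append(path)
            value = o
        if value is not None:
            result[path] = value
    return result, conflicts


class Repo:
    def __init__(self) -> None:
        self.parents: Dict[str, List[str]] = {}
        self.content: Dict[str, Content] = {}
        self._virtual = 0

    def commit(self, cid: str, parents: List[str], content: Content) -> str:
        self.parents[cid] = list(parents)
        self.content[cid] = dict(content)
        return cid

    def ancestors(self, cid: str) -> Set[str]:
        seen: Set[str] = set()
        stack = [cid]
        while stack:
            c = stack.pop()
            if c not in seen:
                seen.add(c)
                stack.extend(self.parents[c])
        return seen

    def merge_bases(self, a: str, b: str) -> List[str]:
        common = self.ancestors(a) & self.ancestors(b)
        # Keep only "best" common ancestors: those not behind another common ancestor.
        return sorted(c for c in common
                      if not any(c != d and c in self.ancestors(d) for d in common))

    def recursive_merge(self, a: str, b: str) -> Tuple[Content, List[str]]:
        bases = self.merge_bases(a, b)
        base: Content = {}
        if bases:
            virtual = bases[0]
            for other in bases[1:]:
                # Several merge bases: merge them (recursively) into a virtual commit.
                merged, _ = self.recursive_merge(virtual, other)
                self._virtual += 1
                virtual = self.commit(f"virtual-{self._virtual}", [virtual, other], merged)
            base = self.content[virtual]
        return merge3(base, self.content[a], self.content[b])
```

A criss-cross history (two branches that have each merged the other) is the classic case with two merge bases; the test below builds one.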

There's no particular reason that you couldn't implement a merge
strategy which works more like SVN's approach, which essentially does an
internal rebase and then commits the result.  The advantage of a rebase
in this situation is that you get to eliminate changes which don't need
to be applied, either because (in SVN's case) it had some
metadata/hearsay information that told it that it could skip that
change, or (in git's case) because it found content/facts showing that
the change was already applied on one side.
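Skipping changes that are already present, judged by content rather than by metadata, can be sketched like this. It is a hypothetical Python illustration, not how SVN or git implements it: each change is a dict of key to `(old, new)` values, with `None` meaning absent. (git's own rebase detects already-applied commits by comparing patch IDs.)

```python
from typing import Dict, List, Optional, Tuple

Change = Dict[str, Tuple[Optional[str], Optional[str]]]  # key -> (old, new); None = absent


def replay_skipping_applied(target: Dict[str, str], changes: List[Change]):
    """Replay `changes` onto `target`, dropping those whose result is already there."""
    result = dict(target)
    applied: List[int] = []
    skipped: List[int] = []
    for i, change in enumerate(changes):
        if all(result.get(k) == new for k, (_old, new) in change.items()):
            skipped.append(i)  # content already matches: applied on this side before
            continue
        for k, (old, _new) in change.items():
            if result.get(k) != old:
                raise ValueError(f"change {i} does not apply cleanly at {k!r}")
        for k, (_old, new) in change.items():
            if new is None:
                result.pop(k, None)
            else:
                result[k] = new
        applied.append(i)
    return result, applied, skipped
```

The same content check can also be fooled, for example when a change was applied and later reverted on the target side; that is the kind of contrived case the next paragraph has in mind.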

However, there are corresponding disadvantages to this strategy.  It's
just as easy to contrive a situation where this internal rebasing
doesn't do the right thing, even without cheating by getting the
metadata wrong.  And besides, there's already a way to do this: do an
actual rebase.  If, say, the original branch you're rebasing is
published and you don't want to rewrite it, you can easily enough use
squash merging, 'git merge -s ours', etc. to make it look like the
strategy you wanted was a built-in git merge strategy.  Or, in the
spirit of open source, you could contribute the code required to make
'imerge' a built-in strategy.

> I was curious if svn handles this better, the same, or worse, and it did it
> just fine.  I know that a while ago, svn could not handle this; it would do
> what git does currently.  Apparently they figured out it was a bug and fixed
> it.  Have you guys figured out it is a bug yet?  The first step in solving a
> problem is admitting you have a problem.

So, I have to chuckle when I read this indignant comment.  There's a
funny story behind the "while ago" you refer to.  That was the time
period during which SVN was relevant: about versions 1.4 and earlier
(being generous).  Back in those days, SVN projects for the most part
avoided merging, because it was so problematic and not tracked at all.
As one core SVN developer said to me, they found teams collaborate more
closely if they're all working on the same branch.  Sure, you could
merge, and I even know of a few communities who did, but by and large,
it was avoided.  Then a new wave of version control systems, including
Git, bzr and Mercurial, cropped up, and their merges were actually
good enough to use in practice.

The SVN core team had to keep pace.  So, in 1.5 the merge tracking
system, previously only supplied as a contrib script, became core.
This is ironic, because the version control system which SVN imitated
poorly, Perforce, had a very sophisticated, if over-complicated, merge
tracking system which was also based on metadata: per-branch,
per-patch, per-file entries recording whether or not a patch had been
integrated into the target branch.  I can only guess that the reason
they didn't implement this in the original SVN version was that it was
something of a pain point for users in Perforce.  Possibly it had
something to do with the way that Perforce would store double entries
for each merge (yes: two rows in a relational store, one representing
the mirror image of the other), and differentiated between many
different forms of "integrated" (ie, 2 rows and 4 states instead of,
say, a single bit).  So the underlying data model wasn't as simple as
it could have been, and this was reflected in the difficult-to-use
command-line tools.  Plus, SVN was using BerkeleyDB for metadata
instead of the relational ISAM library, and debugging a rat's nest of
merge records like Perforce's would have been a nightmare.  They didn't
go there.  And besides, they found that detecting patches as already
applied based on content, as 'patch' does, often worked.

Prior to 1.5, the Perl community developed SVK, an offline version of
SVN, which had a far simpler model for merge tracking, closer to git's:
it tracked whole-branch merges rather than individual files, patches,
and branches.  SVN eventually added two separate ways of tracking
merges: a per-file, per-branch, per-commit model and a per-branch,
per-commit model.

Anyway, I'm not sure where I'm going with this, but I guess a little
extra perspective would be useful!

Sam
--

Re: [git-users] worlds slowest git repo- what to do?

2014-05-15 Thread Sam Vilain
On 05/15/2014 12:06 PM, Philip Oakley wrote:
> From: John Fisher <fishook2...@gmail.com>
>> I assert based on one piece of evidence (a post from a Facebook dev)
>> that I now have the world's biggest and slowest git
>> repository, and I am not a happy guy. I used to have the world's
>> biggest CVS repository, but CVS can't handle multi-G
>> sized files. So I moved the repo to git, because we are using that
>> for our new projects.
>>
>> goal:
>> keep 150 G of files (mostly binary) from tiny sized to over 8G in a
>> version-control system.
>>
>> problem:
>> git is absurdly slow, think hours, on fast hardware.
>>
>> question:
>> any suggestions beyond these:
>> http://git-annex.branchable.com/
>> https://github.com/jedbrown/git-fat
>> https://github.com/schacon/git-media
>> http://code.google.com/p/boar/
>> subversion


You could shard.  Break the problem up into smaller repositories, eg via
submodules.  Try ~128 shards: I'd expect 129 small clones (the 128
shards plus the main repository) to complete faster than a single 150G
clone, as well as being resumable etc.

The first challenge will be figuring out what to shard on, and how to
lay out the repository.  You could have all of the large files in their
own directory, and then the main repository just has symlinks into the
sharded area.  In that case, I would recommend sharding by the date each
blob was introduced, so that there's a good chance you won't need to
clone everything forever: shards holding few files from the current
version could, in theory, be retired.  Or, if the directory structure
already suits it, you could use submodules directly.
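One way the date-based layout might look, sketched in Python. The naming scheme (`shard-<year>-<nn>`, one shard per three-month window), the `shards/` directory and the helper names are all hypothetical. Each large file lives in the shard for the window in which its blob was first introduced, the main repository holds a relative symlink into that shard's checkout, and shards that no current file points into are candidates for retirement.

```python
from datetime import date
from pathlib import PurePosixPath
from typing import Dict, Set


def shard_name(introduced: date, months_per_shard: int = 3) -> str:
    """Hypothetical scheme: one shard per `months_per_shard` window of first introduction."""
    window = (introduced.month - 1) // months_per_shard + 1
    return f"shard-{introduced.year}-{window:02d}"


def symlink_target(path: str, introduced: date, shard_root: str = "shards") -> str:
    """Relative symlink target, from `path` in the main repo into its shard checkout."""
    parts = PurePosixPath(path).parts
    up = "../" * (len(parts) - 1)  # climb from the link's directory to the repo root
    return f"{up}{shard_root}/{shard_name(introduced)}/{path}"


def live_shards(current_files: Dict[str, date]) -> Set[str]:
    """Shards still referenced by the current version; the others could be retired."""
    return {shard_name(introduced) for introduced in current_files.values()}
```

The filter-branch pass would then move each large blob into its shard and replace it in the main history with such a symlink.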

The second challenge will be writing the filter-branch script for this :-)

Good luck,
Sam




Re: [PATCH v2] diff.c: keep arrow(=>) on show_stats()'s shortened filename part to make rename visible.

2013-10-11 Thread Sam Vilain
On 10/11/2013 06:07 AM, Yoshioka Tsuneo wrote:
> + prefix_len = ((prefix_len >= 0) ? prefix_len : 0);
> + strncpy(pre_arrow, arrow - prefix_len, prefix_len);
> + pre_arrow[prefix_len] = '¥0';


This seems to be an encoding mistake; was this supposed to be an ASCII
arrow?

Sam



Re: [PATCH v3 2/2] git-svn.perl: keep processing all commits in parents_exclude

2012-08-18 Thread Sam Vilain

On 08/11/2012 10:14 AM, Steven Walter wrote:

> This fixes a bug where git finds the incorrect merge parent.  Consider a
> repository with trunk, branch1 of trunk, and branch2 of branch1.
> Without this change, git interprets a merge of branch2 into trunk as a
> merge of branch1 into trunk.
>
> Signed-off-by: Steven Walter <stevenrwal...@gmail.com>
> ---
>  git-svn.perl                                     |  1 -
>  t/t9164-git-svn-fetch-merge-branch-of-branch2.sh | 53 ++
>  2 files changed, 53 insertions(+), 1 deletion(-)
>  create mode 100755 t/t9164-git-svn-fetch-merge-branch-of-branch2.sh
>
> diff --git a/git-svn.perl b/git-svn.perl
> index abcec11..c4678c1 100755
> --- a/git-svn.perl
> +++ b/git-svn.perl
> @@ -3623,7 +3623,6 @@ sub parents_exclude {
>  if ( $commit eq $excluded ) {
>  	push @excluded, $commit;
>  	$found++;
> -	last;
>  }


I could believe that, too.  I like this change: one line of code, 53 
lines of test and a paragraph of explanation :-).


Cheers,
Sam.


Re: [PATCH v3 2/2] git-svn.perl: keep processing all commits in parents_exclude

2012-08-18 Thread Sam Vilain

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

On 08/18/2012 01:43 PM, Steven Walter wrote:
> How about a Signed-Off-By?

Signed-Off-By: Sam Vilain <s...@vilain.net>

Sam

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)

iQEcBAEBCgAGBQJQMCcnAAoJEBdtaL3wGtIoJ1UIAIJ6Xz5OEMmMk1tq546eggHg
I+sJIFjqg+mo53VqT0/bKhqg8sLx8F/Gda15nwOUMcslKJdA+sCc+QhAtgSWJ1WK
Idw59jtZHbabfopBHNgneSqVBhXSKpNw3e3EvlRVkK1wobO0+c0X6YkBG0eBCZl2
6RYXIAb6jX04k1hSrnxcPn+REkoyl31aEuFBPNz0wRWHjju+G6bPY/x7D/gO1YOc
/uRQXveQngJOLwawDR+dGS+0aWPseX/sbZqsVFo0hVQYqoHt+s4uVuriBfHSRKd+
R1eUoY0ikW4UvEwZX74Zf3SeoVLLFnkCW8B5XsGb10IojbvY3uyYevATXI79j1Y=
=Lb7H
-----END PGP SIGNATURE-----
