Re: [PATCH v2 00/21] Support multiple worktrees

2013-12-22 Thread Duy Nguyen
On Sun, Dec 22, 2013 at 1:38 PM, Junio C Hamano gits...@pobox.com wrote:
 Duy Nguyen pclo...@gmail.com writes:

 I am not happy with the choice of main/HEAD that would squat on a
 good name for a remote-tracking branch (i.e. s/origin/main/), though.
 $GIT_DIR/COMMON_HEAD perhaps?

 It's not just about HEAD. Anything worktree-specific to the main
 checkout can be accessed this way, e.g. main/index or
 main/FETCH_HEAD, and it's not exactly "common" because it's
 worktree info. Maybe 1ST_ as the prefix (e.g. 1ST_HEAD, 1ST_index...)?

 Do we even need to expose them as ref-like things as a part of the
 external API/UI in the first place?  For end-user scripts that want
 to operate in a real or borrowing worktree, there should be no
 reason to touch this other repository directly.  Things like an "if
 one of the worktrees tries to check out a branch that is already
 checked out elsewhere, error out" policy may need to consult the
 other worktrees via the $GIT_COMMON_DIR mechanism, but at that level
 we have all the control without contaminating the end-user facing
 ref namespace in the way main/FETCH_HEAD... does.

No, the external API/UI is just an extra bonus. We need to (or should)
do this in order to handle $GIT_COMMON_DIR/HEAD exactly like we handle
normal refs. Given any ref, git_path(ref) gives the path to that ref,
and git_path("logs/%s", ref) gives the path of its reflog. By mapping
special names to real paths behind git_path(), we can feed
$GIT_COMMON_DIR/HEAD (under a special name) to refs.c and it will be
handled correctly without any special-case changes.
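
A minimal sketch of the mapping idea (the name "COMMON_HEAD" and the
paths here are made up for illustration; the real git_path() machinery
is more involved):

#include <stdio.h>
#include <string.h>

static const char *common_dir = "/repo/.git";        /* $GIT_COMMON_DIR */
static const char *git_dir = "/repo/.git/repos/wt1"; /* per-worktree dir */

/* Resolve a handful of special names to paths under the common dir,
 * falling back to the per-worktree $GIT_DIR for everything else. */
static void sketch_git_path(char *buf, size_t len, const char *name)
{
        if (!strcmp(name, "COMMON_HEAD"))
                snprintf(buf, len, "%s/HEAD", common_dir);
        else
                snprintf(buf, len, "%s/%s", git_dir, name);
}

int main(void)
{
        char buf[4096];
        sketch_git_path(buf, sizeof(buf), "COMMON_HEAD");
        printf("%s\n", buf);    /* -> /repo/.git/HEAD */
        sketch_git_path(buf, sizeof(buf), "HEAD");
        printf("%s\n", buf);    /* -> /repo/.git/repos/wt1/HEAD */
        return 0;
}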

 You said

 This makes it possible for a linked checkout to detach HEAD of
 the main one.

 but I think that is misguided---it just makes it easier to confuse
 users, if done automatically and without any policy knob. It instead
 should error out, while saying which worktree has the branch in
 question checked out. After all, checking out a branch that is
 checked out in another worktree (not the common one) needs the same
 caution, so main/HEAD is not the only special one.

 And if your updated "git checkout frotz" gives a clear report of
 which worktree has the branch 'frotz' checked out, the user can go
 there and detach the HEAD in that worktree with

 git -C $the_other_one checkout HEAD^0

 if he chooses to.

Jonathan mentioned the checkout-on-a-portable-device case, which would
make the above a bit unnatural, as you just can't cd there (git
update-ref still works, though).

I don't see any problems with checking out a branch multiple times. I
may want to try modifying something in the branch that will be thrown
away in the end. It's when the user updates a branch that we should
either error+reject or detach other checkouts. I guess it's up to the
user to decide which way they want. The error+reject way may make the
user hunt through dead checkouts waiting to be pruned. But we can
start with error+reject then add an option to auto-detach.
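
To make that concrete, a rough sketch of the error+reject check (my
own illustration; the struct and the helper are made up, not the
eventual API):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct worktree { const char *path; const char *head_ref; };

/* Refuse the checkout when another worktree's HEAD already points at
 * the branch; with force_detach we would detach that HEAD instead. */
static void check_branch_not_in_use(const struct worktree *wt, int nr,
                                    const char *refname, int force_detach)
{
        int i;
        for (i = 0; i < nr; i++) {
                if (strcmp(wt[i].head_ref, refname))
                        continue;
                if (!force_detach) {
                        fprintf(stderr,
                                "fatal: '%s' is already checked out at '%s'\n",
                                refname, wt[i].path);
                        exit(128);
                }
                /* else: rewrite that worktree's HEAD to a commit id here */
        }
}

int main(void)
{
        struct worktree wt[] = {
                { "/src/main", "refs/heads/master" },
                { "/src/wt1", "refs/heads/frotz" },
        };
        check_branch_not_in_use(wt, 2, "refs/heads/frotz", 0);
        return 0;
}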
-- 
Duy


Re: [PATCH v4 01/23] sha1write: make buffer const-correct

2013-12-22 Thread Christian Couder
On Sat, Dec 21, 2013 at 2:59 PM, Jeff King p...@peff.net wrote:
 We are passed a void * and write it out without ever

s/are passed/pass/

Cheers,
Christian.


Re: [PATCH] do not pretend sha1write returns errors

2013-12-22 Thread Thomas Rast
Jeff King p...@peff.net writes:

 The sha1write function returns an int, but it will always be
 0. The failure-prone parts of the function happen in the
 flush callback, which cannot pass an error back to us. So
 we just end up calling die() during the flush.

 Let's just drop the return value altogether, as it only
 confuses callers into thinking that it might be useful.

 Only one call site actually checked the return value. We can
 drop that check, since it just led to a die() anyway.

 Signed-off-by: Jeff King p...@peff.net

Thanks.

 This is kind of a step backwards if we ever wanted to actually make
 sha1write's return code mean anything. But I just don't foresee that
 happening.

Meh.  It hasn't returned a useful value since its introduction in 2005.
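
For readers following along, the change amounts to something like this
(a sketch assuming the 2013-era declaration in csum-file.h, not the
patch itself):

struct sha1file;

/* before: the int was always 0; real failures die() inside the flush
 * callback, so the value could never report an error */
/* int sha1write(struct sha1file *f, void *buf, unsigned int count); */

/* after: drop the meaningless return value */
void sha1write(struct sha1file *f, void *buf, unsigned int count);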

-- 
Thomas Rast
t...@thomasrast.ch


Re: questions / suggestions about history simplification

2013-12-22 Thread Adam Spiers
On Sat, Dec 21, 2013 at 10:44:43PM -0800, Junio C Hamano wrote:
 Adam Spiers g...@adamspiers.org writes:
 
  I doubt it.  75% of the work for such a person to understand the
  behaviour from such an example is to understand what kind of history
  the example is building.
 
  Agreed.  And that's precisely why I wanted a real repository
  manifesting the given example: being able to view it in gitk makes
  that a lot easier to understand, for obvious reasons.
  ...
  Well I didn't roll my own; I just copied the example from the man
  page.  So it only tells you that I was spending a fair amount of
  effort trying to understand the man page ;-)
 
 Oh, that part I would agree, but then ...
 
  Not if the man page says "if you wish to experiment with these options
  yourself, you can easily recreate the repository in this example by
  running the script contrib/foo bundled in the source distribution."
  ...
  The goal of sharing the series of commands is not to educate users
  through reading them, but simply to save them the time they would have
  to spend manually recreating the example scenario given in the man
  page.
 
 ... this part could be even easier if the distro ships a sample
 repository, not a recipe to create such a sample repository, no?

It could ship either or both, but shipping a repository only saves the
user from having to run a single command to create the repository from
the script, and even that advantage is negated if the user wishes the
repository to be read/write (since then they would need cp -a ...).

However, the question of how the distro should ship it is separate from
the question of how the git source repository and release tarballs
should provide it.  Including a script (e.g. in contrib/) means that
any changes to the man page can trivially be mirrored in that script
within a single commit, giving all the normal advantages git offers
for tracking source.  And then the distro can easily create the sample
repository from the script at package build-time, if they want to.

OTOH, including a sample repository embedded within the git repository
is either impossible or very ugly (e.g. having a non-default value of
GIT_DIR for the embedded repository).  But I doubt you were suggesting
that ;-)



Rationale behind 'extern' on prototypes in .h files

2013-12-22 Thread Ravi Shekhar Jethani
Hi,
I am a C & Data Structures teacher at an institute. Up until now I
have only written classroom-type C code. I always had an interest in
operating systems, especially the Linux kernel, but never got into
anything; I just passively followed 'lwn.net', 'Dr. Dobb's' magazine,
etc.

But the series '30 Linux Kernel Developers in 30 Weeks' on 'linux.com'
has really pushed me over the edge, and I have a new-found interest in
Linux and C. These articles suggest that I should get good experience
in user space before I can even touch the kernel. So, to get a feel
for what actual real-world C projects look like, I have taken two
excellent projects as my references: 'Git' and 'libvirt'. Also 'Linux
From Scratch', to get started on Linux internals.

Although I have a good understanding of basic/advanced concepts like
pointers and memory management, many concepts like good error-handling
mechanisms, unit testing, and build systems were always blurry. But I
know that this is just the start of a long journey towards the Linux
kernel.


Now, my real questions:
1) I cannot understand the reason behind marking function prototypes
as extern. What purpose does this serve? AFAIK we put the definition
in a .c file and the prototype in a .h file; that's it.

2) Why are some prototypes in some of the .h files extern and
others not?

Thank you guys for reading through. Any suggestions are humbly welcome.
Ravi S. Jethani


Re: Rationale behind 'extern' on prototypes in .h files

2013-12-22 Thread Stefan Beller
On 22.12.2013 16:51, Ravi Shekhar Jethani wrote:

 
 Now, my real questions:
 1) I cannot understand the reason behind marking function prototypes
 as extern. What purpose does this serve? AFAIK we put the definition
 in a .c file and the prototype in a .h file; that's it.
 
 2) Why are some prototypes in some of the .h files extern and
 others not?
 
 Thank you guys for reading through. Any suggestions are humbly welcome.
 Ravi S. Jethani

That's an interesting question. From my understanding there is no
difference between function declarations marked extern and those that
are not, because extern is the default for functions. It is, however,
important to mark variables extern if they're declared in a header file.
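
To illustrate (my own example, not a header from the git tree):

/* example.h -- illustration only */

/* These two function declarations behave identically: functions have
 * external linkage by default, so 'extern' is redundant here. */
extern int add(int a, int b);
int sub(int a, int b);

/* For variables the keyword does matter: this is only a declaration
 * and allocates nothing... */
extern int counter;

/* ...whereas a plain 'int counter;' at file scope in a header would
 * be a (tentative) definition in every .c file that includes it,
 * risking duplicate-symbol errors at link time. */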

After some quick research on the web, it may be there for historical
reasons. Back when there were one-pass compilers, the compiler needed
to know where the function was to be found. (Perhaps extern was not
the default in those days?)

Another reason I could imagine would be to indicate whether the
function lives in the .c file having the same name as the header.
For example, in builtin.h all the functions are declared extern, as
their implementations are found in builtin/*.c. In another header,
cache-tree.h, the function declarations have no explicit extern
keyword, and the implementation is found in cache-tree.c
(${SAME-NAME-AS-HEADER}.c). This is, however, not always the case,
as seen in archive.{c,h}.

I'd also be interested in knowing the real reason behind this.

Thanks,
Stefan





Re: Rationale behind 'extern' on prototypes in .h files

2013-12-22 Thread Jed Brown
Stefan Beller stefanbel...@googlemail.com writes:
 From my understanding there is no difference between function
 declarations marked extern and those that are not, because extern
 is the default for functions.

There is a difference for shared libraries if you would like to control
which symbols are exported.  With gcc, for example, you might compile
using -fvisibility=hidden.  Any functions explicitly declared with
extern, bearing __attribute__((visibility("default"))), or covered by
visibility pragmas will be exported (similar to __declspec(dllexport)
on Windows).  Other functions will be internal to the shared library,
so you don't have to worry about callers depending on those symbols,
and performance can be a bit better by skipping the PLT and avoiding
symbol relocations at load time.  See Drepper's guide for more:

http://www.akkadia.org/drepper/dsohowto.pdf
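
A small self-contained illustration of that mechanism (my own example,
not from git; the 'mylib' names are made up):

/* mylib.c -- build with: gcc -fvisibility=hidden -fPIC -shared \
 *                            -o libmylib.so mylib.c */

/* Exported: explicitly marked with default visibility. */
__attribute__((visibility("default")))
int mylib_api_call(int x)
{
        return x * 2;
}

/* Hidden by -fvisibility=hidden: internal to the shared library, so
 * calls to it can skip the PLT and need no load-time relocation. */
int mylib_internal_helper(int x)
{
        return x + 1;
}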




[PATCH] [RFC] Making use of bitmaps for thin objects

2013-12-22 Thread Ben Maurer
I've been hacking with the performance of git on a large, quickly
changing git repo used inside Facebook. Pulling a week of changes
from this repo can be quite painful.

$ { echo HEAD && echo ^$have; } | time git pack-objects --no-use-bitmap-index --revs --stdout > /dev/null
Counting objects: 221082, done.
Compressing objects: 100% (20174/20174), done.
Total 221082 (delta 197889), reused 215144 (delta 194084)
86.53user 7.89system 1:12.76elapsed 129%CPU (0avgtext+0avgdata 5746608maxresident)k
0inputs+0outputs (0major+3177262minor)pagefaults 0swaps

Jeff King's bitmap branch appears to give a very substantial speedup.
After applying this branch, the counting objects phase is basically
free. However, I found that the compression phase still takes a
non-trivial amount of time.

$ { echo HEAD && echo ^$have; } | time ../git-bitmap/install/bin/git pack-objects --use-bitmap-index --revs --stdout > /dev/null
Counting objects: 220909, done.
Compressing objects: 100% (20038/20038), done.
Total 220909 (delta 197794), reused 215074 (delta 194050)
47.96user 2.45system 0:28.39elapsed 177%CPU (0avgtext+0avgdata 5890768maxresident)k
0inputs+0outputs (0major+984875minor)pagefaults 0swaps

It looks like most of the time spent compressing objects was in cases
where the object was already compressed in the packfile, but the delta
was based on an object that the client already had. For some reason,
--thin wasn't enabling reuse of these deltas.

This is a hacky, poorly styled attempt to figure out how much better
performance could be. With the bitmap branch, git should already know
which objects the client has and can easily test whether an existing
delta can be reused. I don't know the branch well enough to code this,
so as a hack, I just assumed the client has any delta base that is in
the pack file (for our repo, this is always true, because we have a
linear history).

This greatly reduces the time:

$ { echo HEAD && echo ^$have; } | time ../git-bitmap/install/bin/git pack-objects --use-bitmap-index --revs --stdout --thin > /dev/null
Counting objects: 220909, done.
Compressing objects: 100% (14203/14203), done.
Total 220909 (delta 194050), reused 220909 (delta 199885)
3.57user 1.28system 0:04.59elapsed 105%CPU (0avgtext+0avgdata 2007296maxresident)k
0inputs+0outputs (0major+416243minor)pagefaults 0swaps

I didn't do much testing here; I did run git index-pack --fix-thin and
then git fsck as a sanity check.

I'd appreciate feedback here on whether this is a valid approach;
maybe somebody more involved in the bitmap branch would be interested
in implementing the actual logic to figure out if the thin
revision is valid.
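
For what it's worth, the missing test might conceptually be as simple
as this (pure sketch: bitmap_has() and these types are made up, and
the real bitmap branch exposes a different API):

struct bitmap;

extern int bitmap_has(const struct bitmap *haves,
                      const unsigned char *sha1);       /* hypothetical */

static inline int can_reuse_thin_delta(const struct bitmap *haves,
                                       const unsigned char *base_sha1)
{
        /* A delta is reusable if the client already has its base
         * (or we are sending the base in this same pack). */
        return bitmap_has(haves, base_sha1);
}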

 * I was rather confused as to how --thin works today. I could
   not figure out how the choice is made to reuse a thin delta
   in the existing code base.
 * I had a very hacky way of communicating to the pack-writing code
   that there was a delta, but that it need not take that into
   account for the purposes of sorting the file. Does anybody have
   a better suggestion here?

Signed-off-by: Ben Maurer bmau...@fb.com
---
 builtin/pack-objects.c | 26 +++---
 pack-objects.h |  1 +
 2 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index c2f2847..3dc4411 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -53,6 +53,7 @@ static unsigned long pack_size_limit;
 static int depth = 50;
 static int delta_search_threads;
 static int pack_to_stdout;
+static int thin = 0;
 static int num_preferred_base;
 static struct progress *progress_state;
 static int pack_compression_level = Z_DEFAULT_COMPRESSION;
@@ -347,7 +348,8 @@ static unsigned long write_no_reuse_object(struct sha1file *f, struct object_ent
 
 /* Return 0 if we will bust the pack-size limit */
 static unsigned long write_reuse_object(struct sha1file *f, struct object_entry *entry,
-		unsigned long limit, int usable_delta)
+		unsigned long limit, int usable_delta,
+		const unsigned char* thin_delta_sha1)
 {
 	struct packed_git *p = entry->in_pack;
 	struct pack_window *w_curs = NULL;
@@ -361,7 +363,11 @@ static unsigned long write_reuse_object(struct sha1file *f, struct object_entry
 	if (entry->delta)
 		type = (allow_ofs_delta && entry->delta->idx.offset) ?
 				OBJ_OFS_DELTA : OBJ_REF_DELTA;
-	hdrlen = encode_in_pack_object_header(type, entry->size, header);
+
+	if (thin_delta_sha1)
+		type = OBJ_REF_DELTA;
+
+	hdrlen = encode_in_pack_object_header(type, thin_delta_sha1 ?
+			entry->delta_size : entry->size, header);
 
 	offset = entry->in_pack_offset;
 	revidx = find_pack_revindex(p, offset);
@@ -403,7 +409,7 @@ static unsigned long write_reuse_object(struct sha1file *f, struct object_entry
 		return 0;
 	}
 	sha1write(f, header,

RE: [PATCH] [RFC] Making use of bitmaps for thin objects

2013-12-22 Thread Ben Maurer
One issue with this approach is that it seems git-index-pack doesn't
perform as well with thin packs. git-index-pack uses a multi-threaded
approach to resolving the deltas. However, the multithreading only
works on deltas whose bases are inside the pack. After the
multi-threaded phase, it incrementally brings in objects from external
packs, but in a single-threaded manner. Many objects in the pack have
some dependency on an external object, therefore defeating the
multithreading.

What's the use case for a pack file with a SHA1 reference living
inside the pack file (why not just use an offset?)? Would it make
sense to assume that all such bases are external and bring them in
in the first phase?


Re: [PATCH/POC 3/7] setup.c: add split-repo support to .git files

2013-12-22 Thread Duy Nguyen
On Sat, Dec 14, 2013 at 3:43 AM, Jonathan Nieder jrnie...@gmail.com wrote:
 Problems:

  * What if I move my worktree with mv?  Then I still need the
    corresponding $GIT_SUPER_DIR/repos/<id> directory, and nobody told
    the GIT_SUPER_DIR about it.

  * What if my worktree is on removable media (think network
filesystem) and has just been temporarily unmounted instead of
deleted?

 So maybe it would be nicer to:

   i. When the worktree is on the same filesystem, keep a *hard link* to
  some file in the worktree (e.g., the .git file).  If the link count
   goes down, it is safe to remove the $GIT_SUPER_DIR/repos/<id>
  directory.

The link count goes down to 1 if I move the worktree to a different
filesystem, and it's not safe to remove $GIT_SUPER_DIR/repos/<id> in
that case, I think.
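
For concreteness, the link-count check in proposal (i) might look
something like this (my own sketch; the anchor path is made up, and
the caveat above applies: a worktree moved to another filesystem also
drops the count to 1):

#include <stdio.h>
#include <sys/stat.h>

static int worktree_seems_gone(const char *anchor)
{
        struct stat st;

        if (stat(anchor, &st))
                return 1;        /* even our own link is gone */
        return st.st_nlink == 1; /* only the anchor itself remains */
}

int main(void)
{
        const char *anchor = "/repo/.git/repos/id/gitfile-anchor";
        printf("worktree gone? %d\n", worktree_seems_gone(anchor));
        return 0;
}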

  ii. When the worktree is on another filesystem, always keep
  $GIT_SUPER_DIR/repos/<id> unless the user decides to manually
  remove it.  Provide documentation or a command to list basic
  information about $GIT_SUPER_DIR/repos directories (path to
  worktree, what branch they're on, etc).

 (i) doesn't require any futureproofing.  As soon as someone wants it,
 they can implement the check and fall back to (ii) for worktrees
 without the magic hard link.

 (ii) would benefit from recording the working tree directory as a
 possibly unreliable, human-friendly reminder.
-- 
Duy