Re: Slightly off topic, question about git

2022-06-06 Thread David Brownlee
On Mon, 6 Jun 2022 at 06:59, Brian Buhrow  wrote:
>
> Hello.  At the risk of raising the debate about which version control
> system we should use, I have a question about git, as well as a comment
> about it relative to the NetBSD source tree.  I should preface my
> comments with the caveat that I am not by any means a git expert, and,
> in fact, I'm barely able to get anything I want out of it.  With that
> said, here are my questions and observations.  I'd be interested to
> know how others work around these issues and/or what you think of my
> observations.
>
> 1.  In CVS, I can do something like:
> cvs log sys/dev/pci/if_bge.c
> and be given a complete history of the changes to that file, as well as
> a list of all the branches that file participates in and which versions
> apply to each branch.  And, I can do this without having to download
> all of the history of that file onto my local storage.
> It seems like the only way to do this with a git repository is to
> download the entire source tree, along with its history and branches,
> using git clone with an infinite depth.  Is this correct?  If not, how
> can I see all the branches of a given repository without having to
> download the entire repository?

git inherently looks at the local copy of the repo. So your options are
- have a local copy
- ssh to somewhere with a local copy
- use a web tool or similar to browse
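One narrower part of the question, seeing which branches exist, does not need a clone at all: git ls-remote asks a remote for its ref names and commit hashes only. A minimal sketch, assuming a POSIX shell and git on the PATH; the throwaway local repository stands in for a real remote URL, and the branch and tag names are made up in NetBSD's style.

```shell
#!/bin/sh
# Sketch: list a remote's branches and tags without cloning anything.
# A throwaway local repository stands in for the real remote URL.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org

remote=$(mktemp -d)
git init -q "$remote"
echo 'hello' > "$remote/README"
git -C "$remote" add README
git -C "$remote" commit -q -m 'initial import'
git -C "$remote" branch netbsd-9            # hypothetical branch name
git -C "$remote" tag netbsd-9-0-RELEASE     # hypothetical tag name

# Only ref names and the commit hashes they point at are transferred.
heads=$(git ls-remote --heads "$remote")
tags=$(git ls-remote --tags "$remote")
printf '%s\n%s\n' "$heads" "$tags"
```

Against a real server, pass its clone URL instead of "$remote". This answers "which branches exist", not "which branches touched this file"; that part still needs the history.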

> 2.  Also, in my exploration of git, it seems like the git log command
> shows all the commits for each tag, rather than the comments for a
> specific file or object in the repository.  Again, is this correct?

You can do either or both - "git log trunk" "git log build.sh" or "git
log trunk build.sh"

As an aside, I have an alias of gl -> "git log --name-status" as I
really prefer to see the filenames changed in each commit
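The three log forms above, plus the "gl" idea set up as a git alias rather than a shell alias, in a throwaway repository. A minimal sketch, assuming a POSIX shell and a recent git; the branch name trunk and the file names are hypothetical. The `--` separator tells git that what follows is a path, not a revision.

```shell
#!/bin/sh
# Sketch: branch-restricted and path-restricted history, plus a "gl" alias.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org

repo=$(mktemp -d)
cd "$repo"
git init -q
echo 'v1' > build.sh
echo 'x' > Makefile
git add build.sh Makefile
git commit -q -m 'add build.sh and Makefile'
echo 'y' >> Makefile
git commit -q -am 'change only Makefile'
echo 'v2' >> build.sh
git commit -q -am 'change build.sh'
git branch trunk        # hypothetical branch, named as in the message above

git log --oneline trunk                 # every commit reachable from trunk
git log --oneline -- build.sh           # only commits that touched build.sh
git log --oneline trunk -- build.sh     # both restrictions at once

# "gl" as a git alias in this repository (add --global to keep it everywhere).
git config alias.gl 'log --name-status'
git gl -1                               # newest commit plus the files it changed
```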

> If I am correct in my guesses about how git works, it seems like I
> would have to download the entire history of the NetBSD source tree if
> I want to browse its branches, or the commit history for any given
> file.  This is a lot of overhead to examine tiny portions of the tree,
> relatively speaking, assuming we move to git for our version control
> system.  It strikes me that requiring this much storage space from
> developers, would be a regression from what we currently do.  Since I
> think we're smarter than that and since we have very smart people on
> our development team, I want to understand what it is that I don't get
> about git that precludes me from having to download the entire history
> of the source tree from day one while still retaining access to that
> history over time.

"It's a feature". Half :) - Seriously though, the ability to actually
browse and search the full history of a source tree as git allows
compared to the godawful eye-of-the-needle view that CVS provides is a
very valuable benefit of the tradeoff of having a local history. When
looking at source tree history I use a cloned copy of the github src,
then apply to the CVS tree as needed.

For people with limited resources it will be a pain, though there are
any number of services which provide remote web access to git trees.
Having said that, the ever increasing memory requirements of modern
gcc is a much bigger pain for limited resources with a relatively
smaller benefit.

I suspect most of this also works with s/git/hg/ assuming NetBSD
switches to a mercurial repo

David


Re: Slightly off topic, question about git

2022-06-06 Thread Greg Troxel

David Brownlee  writes:

> I suspect most of this also works with s/git/hg/ assuming NetBSD
> switches to a mercurial repo

Indeed, all of this is not really about git.  Systems in the class of
"distributed VCS" have two important properties:

  commits are atomic across the repo, not per file

  anyone can prepare commits, whether or not they are authorized to
  apply them to the repo.  an authorized person can apply someone else's
  commit.

These more or less lead to "local copy of the repo".  And there are web
tools for people who just want to look at something occasionally.  But
I find that it's not that big, that right now I have 3 copies (8, 9,
current), and that it's nice to be able to do things offline (browse,
diff, commit).

CVS is really just RCS with
  organization into groups of files
  ability to operate over ssh (rsh originally :-)
That was really great in 1994; I remember what a big advance it was
(seriously).
  




Re: Slightly off topic, question about git

2022-06-06 Thread matthew sporleder


> On Jun 6, 2022, at 2:00 AM, Brian Buhrow  wrote:
> 
> Hello.  At the risk of raising the debate about which version control
> system we should use, I have a question about git, as well as a comment
> about it relative to the NetBSD source tree.  I should preface my
> comments with the caveat that I am not by any means a git expert, and,
> in fact, I'm barely able to get anything I want out of it.  With that
> said, here are my questions and observations.  I'd be interested to
> know how others work around these issues and/or what you think of my
> observations.
>
> 1.  In CVS, I can do something like:
> cvs log sys/dev/pci/if_bge.c
> and be given a complete history of the changes to that file, as well as
> a list of all the branches that file participates in and which versions
> apply to each branch.  And, I can do this without having to download
> all of the history of that file onto my local storage.
> It seems like the only way to do this with a git repository is to
> download the entire source tree, along with its history and branches,
> using git clone with an infinite depth.  Is this correct?  If not, how
> can I see all the branches of a given repository without having to
> download the entire repository?
>
> 

Look up git shallow clone and git sparse checkout (with the sparse index option).

There are also filtered clones and single-branch clones.
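The clone variants named above, tried against a local stand-in for the server so that nothing leaves the machine. A sketch, assuming a POSIX shell and git 2.25 or later (for clone --sparse); the repository layout and the netbsd-9 branch are hypothetical, and whether a real server enables partial-clone filters is up to that server.

```shell
#!/bin/sh
# Sketch of the clone variants above, run against a local stand-in "server".
# file:// is used because git ignores --depth for plain local paths.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org

work=$(mktemp -d)
git init -q "$work/src"
mkdir -p "$work/src/sys/dev/pci" "$work/src/usr.bin"
for i in 1 2 3; do
    echo "rev $i" > "$work/src/sys/dev/pci/if_bge.c"
    echo "rev $i" > "$work/src/usr.bin/tool.c"
    git -C "$work/src" add -A
    git -C "$work/src" commit -q -m "revision $i"
done
git -C "$work/src" branch netbsd-9                 # hypothetical second branch
# The server must opt in before it honours partial-clone filters, and
# later on-demand fetches ask for individual objects by hash.
git -C "$work/src" config uploadpack.allowFilter true
git -C "$work/src" config uploadpack.allowAnySHA1InWant true
url="file://$work/src"

# Shallow: only the most recent commit, no older history.
git clone -q --depth 1 "$url" "$work/shallow"
# Single branch: the default branch only, no netbsd-9.
git clone -q --single-branch "$url" "$work/single"
# Filtered (partial): all commits and trees, file contents fetched on demand.
git clone -q --filter=blob:none "$url" "$work/partial"
# Sparse checkout: full history, but only sys/dev/pci in the working tree.
git clone -q --sparse "$url" "$work/sparse"
git -C "$work/sparse" sparse-checkout set sys/dev/pci
```

Shallow and single-branch clones cut history; the filtered clone keeps every commit but defers file contents; sparse checkout only limits what lands in the working tree. The "sparse index" is a further option of git sparse-checkout in newer releases (--sparse-index) that also shrinks the index.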


> 2.  Also, in my exploration of git, it seems like the git log command
> shows all the commits for each tag, rather than the comments for a
> specific file or object in the repository.  Again, is this correct?
> 

git log -- file


> If I am correct in my guesses about how git works, it seems like I
> would have to download the entire history of the NetBSD source tree if
> I want to browse its branches, or the commit history for any given
> file.  This is a lot of overhead to examine tiny portions of the tree,
> relatively speaking, assuming we move to git for our version control
> system.  It strikes me that requiring this much storage space from
> developers, would be a regression from what we currently do.  Since I
> think we're smarter than that and since we have very smart people on
> our development team, I want to understand what it is that I don't get
> about git that precludes me
Read the docs I referred to above or emails from me in tech-repository or just 
ask these questions to your favorite search engine. 

https://git-scm.com/docs/git-clone


> from having to download the entire history of the source tree from day
> one while still retaining access to that history over time.
> 
> -thanks
> -Brian
> 


Your assumptions are incorrect.  Git is faster and can probably use even less 
space than a cvs checkout if you are smart about it. 

Re: Slightly off topic, question about git

2022-06-06 Thread Johnny Billquist

On 2022-06-06 11:32, Greg Troxel wrote:


> David Brownlee  writes:
>
>> I suspect most of this also works with s/git/hg/ assuming NetBSD
>> switches to a mercurial repo
>
> Indeed, all of this is not really about git.  Systems in the class of
> "distributed VCS" have two important properties:
>
>    commits are atomic across the repo, not per file


True of most any VCS, distributed or not. It's rather CVS that is the 
odd man out here. But it's sortof a bit loose (or weird) in distributed 
VCSs, since you might do that to your local repo, but then it becomes 
"diluted" when it gets applied to another (upstream) repo.



>    anyone can prepare commits, whether or not they are authorized to
>    apply them to the repo.  an authorized person can apply someone else's
>    commit.


Anyone can "prepare" a commit on any kind of VCS. It's the actual commit 
that is always the gate.
The difference might be more in how you pass a commit over to someone
else to apply. With CVS, I usually create a diff, and send that to
someone who has the rights to apply it. Works just fine. (Since I don't
have any commit rights on NetBSD for example.)



> These more or less lead to "local copy of the repo".  And there are web
> tools for people who just want to look at something occasionally.  But
> I find that it's not that big, that right now I have 3 copies (8, 9,
> current), and that it's nice to be able to do things offline (browse,
> diff, commit).


A local copy is required with git, since everyone actually has their own
VCS. And then you have some upstream VCS which you work with in both
directions, in relation to your own VCS.


Pros and cons, as always.


> CVS is really just RCS with
>    organization into groups of files
>    ability to operate over ssh (rsh originally :-)
> That was really great in 1994; I remember what a big advance it was
> (seriously).


True.

I've recently come to realize a thing with git I really abhor. It has a 
very loose view on history immutability. I've seen branches which
claim to come from some point, where the branch is way older than the
revision it claims to have been branched off.
Which obviously is impossible. But history rewriting seems to be a 
favorite pastime of git users.


For me, one of the really big points of VCS is that history is never 
changed. I can go back and see what was done, where, to what.


Since git actually is multiple, independent VCSs, what happens on one
doesn't necessarily come across to another at all, and in the process of
aligning them, history has to be rewritten to even come close to making
some kind of sense.


I'm not at all convinced this is a good system. But that's just me. :-)

  Johnny

--
Johnny Billquist  || "I'm on a bus
  ||  on a psychedelic trip
email: b...@softjar.se ||  Reading murder books
pdp is alive! ||  tryin' to stay hip" - B. Idol


Re: Slightly off topic, question about git

2022-06-06 Thread Mouse
> [...], I have a question about git, [...]

I'm not an _expert_ on git, but I have been using it for close on a
decade now and have developed a certain amount of expertise.

> 1.  In CVS, I can do something like:
> cvs log sys/dev/pci/if_bge.c
> and be given a complete history of the changes to that file, as well
> as a list of all the branches that file participates in and which
> versions apply to each branch.

git log -- sys/dev/pci/if_bge.c

> And, I can do this without having to download all of the history of
> that file onto my local storage.

That, you will not find with git.  git does, somewhat, support what is
called a shallow clone of a repo, but it is limited by restricting it
to recent commits, not by restricting it to only certain portions of
the tree.  I'm not aware of any way to do the latter.

> It seems like the only way to do this with a git repository is to
> download the entire source tree, along with its history and branches,
> using git clone with an infinite depth.  Is this correct?

Close.

What you want here is not well-supported by git; it is antithetical to
what, as I understand it, is one of the underlying tenets of git: the
distributed nature of it.  (See below for a little more on this.)

I would say that the best way to set something like that up with a DVCS
would be to provide ssh logins on a central repo-holding machine; if
you want to lock it down further, restrict what those logins can run.

> 2.  Also, in my exploration of git, it seems like the git log command
> shows all the commits for each tag, rather than the comments for a
> specific file or object in the repository.  Again, is this correct?

Well, I'm not sure what you mean here by "all the commits for each
tag".  In git, a tag is attached to a single commit (which can affect
multiple files, but it's still a single commit).  That is, "all the
commits for [a] tag" is always a set of size one (or size zero, if no
tag with that name exists).

I'm guessing here, but my guess is that you are coming from a CVS
mindset, in which a changeset affecting multiple files is considered
one commit per file.  That's not how git works.  In git, a commit
consists, conceptually, of a tree (a packaging-up of all the file
contents and directory structure) plus some overhead, such as a commit
message, author name, and a few other small things.  There is
cleverness under the hood to optimize away most of the storage that
appears to imply in most cases, but that's the concept.
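The "tree plus some overhead" structure can be inspected directly with git cat-file. A minimal sketch, assuming a POSIX shell and git; the repository and file names are hypothetical.

```shell
#!/bin/sh
# Sketch: look inside a commit object and the top-level tree it points to.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org

repo=$(mktemp -d)
cd "$repo"
git init -q
mkdir -p sys/dev/pci
echo 'driver' > sys/dev/pci/if_bge.c
echo 'docs' > README
git add -A
git commit -q -m 'one commit touching two files'

# The commit: a "tree" line, "author" and "committer" lines, a blank line,
# then the message. A root commit has no "parent" line; a merge has two or more.
git cat-file -p HEAD
# The top-level tree: one entry per file or directory, each named by hash.
git cat-file -p 'HEAD^{tree}'
```

Both files changed in one commit, yet there is a single commit object, which is the point made above about commits not being per file.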

As for restricting git log output to a single file or directory
subtree, you can do that with something like

git log tagname -- file file file...
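cvs log also lists the branches a file lives on. A rough git equivalent combines --all, --source and a path, and git branch --contains shows which branches include a given change. A sketch, assuming a POSIX shell and git; the netbsd-9 branch and the file layout are hypothetical stand-ins.

```shell
#!/bin/sh
# Sketch: approximate the per-file branch listing that "cvs log" gives.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org

repo=$(mktemp -d)
cd "$repo"
git init -q
mkdir -p sys/dev/pci
echo 'r1' > sys/dev/pci/if_bge.c
git add -A
git commit -q -m 'import if_bge.c'
git checkout -q -b netbsd-9                 # hypothetical release branch
echo 'fix' >> sys/dev/pci/if_bge.c
git commit -q -am 'pullup: fix in if_bge.c'
git checkout -q -                           # back to the original branch
echo 'other' > README
git add README
git commit -q -m 'unrelated change'

f=sys/dev/pci/if_bge.c
# Every commit on every branch that touched the file; --source prints the
# ref through which git reached each commit.
git log --all --source --oneline -- "$f"
# Which branches contain the commit that first added the file:
first=$(git rev-list --reverse HEAD -- "$f" | head -n 1)
git branch --contains "$first"
```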

> If I am correct in my guesses about how git works, it seems like I
> would have to download the entire history of the NetBSD source tree
> if I want to browse its branches, or the commit history for any given
> file.

Close, yes.

> This is a lot of overhead to examine tiny portions of the tree,
> relatively speaking, assuming we move to git for our version control
> system.

It is.  That's why there are various tools out there that let you look
at only part of a tree, kind of like cvsweb.  I've written one myself,
which uses puffs to present a filesystem view of a git repo.  You can
find a live example of it in my anonymous FTP space (also available
over HTTP), ftp.rodents-montreal.org:/mouse/git-unpacked; this includes
the history of my semi-private forks of three NetBSD versions, which
admittedly is far less than full NetBSD history.  (The version in my
FTP space also includes a lot of other repos; the NetBSD ones are under
Mouse/netbsd-fork/.)

> It strikes me that requiring this much storage space from developers,
> would be a regression from what we currently do.

Yes, it would be.  Personally, I think the benefits it brings would be
worth it.

I have access to a copy, on a work machine, of the Linux kernel git
repo as of sometime 2020-10-15.  I don't know how it would compare to a
repo with full NetBSD history, but it's the closest thing I have access
to.  The checked-out tree size is close to that for NetBSD 5.2 /usr/src
(based on du -s output - Linux kernel, 1149672k, NetBSD src, 947992k).

The .git directory, holding all the overhead, is 1800764k.  (That's for
the Linux repo; for my NetBSD fork, 214196k, but I have comparatively
few commits - I didn't import full NetBSD history, instead just
starting from NetBSD 5.2 source as released.  The size of the overhead
is, in most cases, more dependent on the size of the commit tree than
on the size of the checked-out tree - though that's true only when the
tree is mostly changes to existing files; if you're constantly
introducing new files, it becomes less so.)
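The figures above come from du. A minimal sketch of taking the same kind of measurement, plus git's own report on its object store, assuming a POSIX shell and git; the tiny repository is a stand-in, so its sizes mean nothing beyond showing where to look.

```shell
#!/bin/sh
# Sketch: measure history overhead separately from the checked-out tree.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org

repo=$(mktemp -d)
cd "$repo"
git init -q
for i in 1 2 3 4 5; do
    echo "line $i" >> file.c
    git add file.c
    git commit -q -m "revision $i"
done
git gc -q                  # pack loose objects, as in a long-lived clone
git count-objects -v -H    # "size-pack" is the compressed history
du -sk .git                # everything git keeps alongside the checkout
```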

Personally, not even I, retrocomputing geek that I am, find two gigs of
overhead onerous for the benefits it brings.  Significantly more
onerous is that git really really wants you to have enough RAM to keep
stat() results for the whole working tree in core; various common
operations become painfully slow i

Re: Slightly off topic, question about git

2022-06-06 Thread Mouse
> I've recently come to realize a thing with git I really abhor.  It
> has a very loose view on history immutability.  I've seen branches
> which claim to come from some point, where the branch is way older
> than the revision it claims to have been branched off.  Which
> obviously is impossible.  But history rewriting seems to be a
> favorite pastime of git users.

That's not a fault of git; that's a fault of how some people use git.

I recently had occasion to go through and expunge certain content from
a (git) repo.  It was neither convenient, simple, nor fast, even though
the content in question consisted of two files whose names remained
constant throughout their history.

> For me, one of the really big points of VCS is that history is never
> changed.  I can go back and see what was done, where, to what.

And git can be used that way.  No VCS is ever truly never-change,
unless you use write-once media to store it, and even then it is always
vulnerable to reconstructing a new repo from the ground up based on the
old repo.

> Since git actually is multiple, independent VCSs, what happens on one
> doesn't necessarily come across to another at all, and in the process
> of aligning them, history has to be rewritten to even come close to
> making some kind of sense.

Not really; history doesn't _have_ to be rewritten.  That's what merge
commits are for.  People just choose to rebase work instead of merging.
(Personally, I think that's a mistake, for various reasons, but, as you
point out, not everyone agrees.)
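A merge commit records both lines of work with their original hashes intact, which is the non-rewriting alternative to rebasing described above. A minimal sketch, assuming a POSIX shell and git; the branch and file names are hypothetical.

```shell
#!/bin/sh
# Sketch: a merge commit joins two lines of work without rewriting either.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org

repo=$(mktemp -d)
cd "$repo"
git init -q
echo base > a.c
git add a.c
git commit -q -m 'base'
main=$(git rev-parse --abbrev-ref HEAD)

git checkout -q -b topic
echo topic > b.c
git add b.c
git commit -q -m 'topic work'
topic_tip=$(git rev-parse HEAD)

git checkout -q "$main"
echo main > c.c
git add c.c
git commit -q -m 'mainline work'

# --no-ff insists on a merge commit even where a fast-forward would do.
git merge -q --no-ff --no-edit -m 'Merge topic' topic
git log --graph --oneline --all
```

The topic commit keeps its original hash and date; the merge commit simply has two parents.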

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML        mo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: Slightly off topic, question about git

2022-06-06 Thread Johnny Billquist

On 2022-06-06 14:33, Mouse wrote:

>> I've recently come to realize a thing with git I really abhor.  It
>> has a very loose view on history immutability.  I've seen branches
>> which claim to come from some point, where the branch is way older
>> than the revision it claims to have been branched off.  Which
>> obviously is impossible.  But history rewriting seems to be a
>> favorite pastime of git users.


> That's not a fault of git; that's a fault of how some people use git.


Well, you could argue that it's a fault in git that it allows it.

If there is a way, then some people will use it that way.


>> For me, one of the really big points of VCS is that history is never
>> changed.  I can go back and see what was done, where, to what.


> And git can be used that way.  No VCS is ever truly never-change,
> unless you use write-once media to store it, and even then it is always
> vulnerable to reconstructing a new repo from the ground up based on the
> old repo.


Sure. You can change history in CVS as well. But you'll have to go in
there and muck with the data that is behind it. It's not like the UI itself
allows you to work that way. And I've not seen anything similar in a
whole bunch of other VCSs I've worked with either. But I've generally
not worked on distributed ones before.
And I sortof can see why people want to go that way, since with 
distributed VCSs, it becomes much harder to have a linear history. But 
they still want to kindof/sortof fake it.



>> Since git actually is multiple, independent VCSs, what happens on one
>> doesn't necessarily come across to another at all, and in the process
>> of aligning them, history has to be rewritten to even come close to
>> making some kind of sense.


> Not really; history doesn't _have_ to be rewritten.  That's what merge
> commits are for.  People just choose to rebase work instead of merging.
> (Personally, I think that's a mistake, for various reasons, but, as you
> point out, not everyone agrees.)


It sort of has to. Since if you've done various work, and others
done various work on the same files, and both have done commits, it 
might not be possible to merge as is. And so you'll have to rewrite 
parts that you already committed in order to get things back to a 
coherent state.


This is a nasty problem when you have separate VCSs. Well, it becomes 
nasty because somewhere in the end, you still have a master VCS, which 
holds the source of truth. Distributed VCSs are not truly distributed. 
There is still just one master. It's just about how you work in relation 
to it. I can see some advantages, but I'm still not sure if they 
outweigh the disadvantages that I feel. But that is of course very 
subjective.


  Johnny



Re: Slightly off topic, question about git

2022-06-06 Thread Gerhard Sittig
[ For those in a hurry: Do get the Pro Git book, maybe watch the
  Linus and Scott videos. They helped me "get it", and I never
  wish to go back to a life without git. :-D  Seriously! ]

On Sun, 2022-06-05 at 22:59 -0700, Brian Buhrow wrote:
>
> [ ... ] I should preface my comments with the caveat that I am
> not by any means a git expert, and, in fact, I'm barely able to
> get anything I want out of it.  With that said, here are my
> questions and observations. [ ... ]

There is no problem with that, I assume. From personal experience
I can tell that git takes some getting used to. But once you do,
you don't want to go back. Seriously. It's just that approaching
git with svn or cvs in mind feels tedious, though it need not.
Accepting that git is a different tool with a different
philosophy is important. But I'm preaching to the choir here,
since this is a Unix ML. :)

Have been struggling with git for several years myself. Couldn't
get it. Was lost. The cheat sheets and cook books did not help.
References are not useful if you don't know what to look for. Did
not like that situation. This must have been some two or three
years of "fighting" git, and Unix people know that users don't
win in that setup.

The Pro Git book was the first document that helped me. A lot.
See git-scm.org -> book. Takes you to https://git-scm.com/book/en/v2
right now. Do yourself a favour and do _not_ skip any of the
first two chapters. Learning how git works internally enables you
to use it in ways that you haven't dreamed of before. No kidding.
Even if the book might not work for you (could be, not everybody is
the same), it's worth a try. The concepts are simple,
Linus created the first working version of the tool in two weeks.
Later the UI has improved but the concepts have proven stable.

Another useful resource, faster to watch than reading the book,
could be Linus' and Scott's videos. See git-scm.org -> doc ->
external links -> bottom. Linus (the initial git author) at
Google https://www.youtube.com/watch?v=4XpnKHJAok8 and Scott
(github employee, author of the great Pro Git book) giving an
overview of essential commands. The audio track may be a challenge,
but native speakers may be less bothered or not notice at all.
Don't get put off by opinions; the points raised are valid,
and precious to keep in mind.

With that being said first, let's see the questions that you
raised.

> 1.  In CVS, I can do something like:
> cvs log sys/dev/pci/if_bge.c
> and be given a complete history of the changes to that file, as
> well as a list of all the branches that file participates in
> and which versions apply to each branch.  And, I can do this
> without having to download all of the history of that file onto
> my local storage.

As others said it's an essential part of the design that git
operates on a local clone. It's the very essence of a distributed
system. The assumption is that disk space is cheaper than network
traffic. Experience strongly suggests that local disk access is
faster than a remote server. Being able to work offline is a
byproduct, though a very nice one. Linus talks about this aspect.
It's not just an arbitrary implementation detail, it's really
essential, and enables you in unseen ways. Git doesn't actually
track single files, but being able to trace an individual file
through the complete tree's history is what you get as a
byproduct. Changes your mind once you see it (again, not
kidding).

Others stated that the copy may be rather efficient, and could even
be smaller than what you have today with other systems. It's
deduplicated and compressed. Worth checking, you may be
surprised. Works for projects like KDE and others.

Once you have a local copy, subsequent clones can optionally
reference it. Saves both disk space and network traffic. Am using
the --reference option here extensively (notebook, SSD, useful).
Others use caching proxies in their lab or classes. Or something
on your local server that just git clones (optionally bare) and
that you clone from to your workstation. Works transparently.
Because: distributed, by design.
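The --reference option mentioned above, tried against a local stand-in server. A minimal sketch, assuming a POSIX shell and git; the repository names are hypothetical. The new clone records the cache in .git/objects/info/alternates instead of copying the objects the cache already holds.

```shell
#!/bin/sh
# Sketch: a second clone that borrows objects from an existing local copy.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org

work=$(mktemp -d)
git init -q "$work/upstream"                     # stand-in for the real server
echo 'src' > "$work/upstream/file.c"
git -C "$work/upstream" add file.c
git -C "$work/upstream" commit -q -m 'import'

# The local copy, cloned once (a bare clone here; a normal one works too).
git clone -q --bare "file://$work/upstream" "$work/cache.git"
# Later clones fetch only what the cache lacks and record where the rest is.
git clone -q --reference "$work/cache.git" "file://$work/upstream" "$work/current"
cat "$work/current/.git/objects/info/alternates"
```

A clone made this way depends on the cache staying around; adding --dissociate to the clone copies the borrowed objects in and drops that dependency.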

Sparse downloads are supported, too, as others noted. But these
are more popular in build setups that are not interested in the
history, only want the current state. A regular clone gets you
the full history, by design. From local disk. Fasten your seat
belt. :) Also lets you switch between branches and revisions
within seconds. (Developers may find bisection a killer feature
that they never want to miss again.)
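A minimal sketch of bisection, assuming a POSIX shell and git: git bisect run checks out midpoints between a known-good and a known-bad commit and lets a test command judge each one. The repository and the BUG marker are hypothetical.

```shell
#!/bin/sh
# Sketch: let git bisect find the commit that introduced a problem.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org

repo=$(mktemp -d)
cd "$repo"
git init -q
for i in 1 2 3 4 5 6; do
    if [ "$i" -ge 4 ]; then echo "step $i BUG" > state; else echo "step $i ok" > state; fi
    git add state
    git commit -q -m "step $i"
done

good=$(git rev-list --max-parents=0 HEAD)   # the root commit, known to be good
git bisect start HEAD "$good" >/dev/null
# The test command exits 0 for "good" and 1 for "bad" (125 would mean "skip").
git bisect run sh -c '! grep -q BUG state' >/dev/null
first_bad=$(git rev-parse refs/bisect/bad)
git log -1 --format=%s "$first_bad"
git bisect reset >/dev/null 2>&1
```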

> 2.  Also, in my exploration of git, it seems like the git log
> command shows all the commits for each tag, rather than the
> comments for a specific file or object in the repository.
> Again, is this correct?

Am finding myself looking up 'git --help' and 'git <command> --help'
a lot. Alternatively use 'man git-<command>'. Recurring
subjects need not be discussed in detail in every individual
command's page. There are generic subjects, the 'git --help'
output refers to t

Re: Slightly off topic, question about git

2022-06-06 Thread Gerhard Sittig
On Mon, 2022-06-06 at 08:33 -0400, Mouse wrote:
>
> > I've recently come to realize a thing with git I really abhor.  It
> > has a very loose view on history immutability.  I've seen branches
> > which claim to come from some point, where the branch is way older
> > than the revision it claims to have been branched off.  Which
> > obviously is impossible.  But history rewriting seems to be a
> > favorite pastime of git users.
>
> That's not a fault of git; that's a fault of how some people use git.

What Mouse said. Rebasing or force pushing what was _published_
before is even frowned upon. What you do to your local tree is up
to you. Go and arrange at will. This results in a better
submission that is easier to maintain at the upstream project's
side by the way, a useful feature and the reason why it exists.
But what was published, and what others are working with, "is
sacred" and must not change.
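Git can enforce the "published is sacred" rule on a shared repository: receive.denyNonFastForwards makes it refuse pushes that would replace existing history, even forced ones, and receive.denyDeletes refuses branch deletion. A minimal sketch, assuming a POSIX shell and git; the bare repository stands in for a project's central server.

```shell
#!/bin/sh
# Sketch: make a shared repository refuse rewrites of published history.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org

work=$(mktemp -d)
git init -q --bare "$work/public.git"
git -C "$work/public.git" config receive.denyNonFastForwards true
git -C "$work/public.git" config receive.denyDeletes true

git clone -q "$work/public.git" "$work/dev" 2>/dev/null   # empty-repo warning
cd "$work/dev"
echo 'one' > f
git add f
git commit -q -m 'one'
git push -q origin HEAD
published=$(git rev-parse HEAD)
branch=$(git rev-parse --abbrev-ref HEAD)

# Rewrite the published commit locally, then try to force it over the original.
git commit -q --amend -m 'one, reworded'
if git push -q --force origin HEAD 2>/dev/null; then rejected=no; else rejected=yes; fi
echo "forced push rejected: $rejected"
```

Local rearranging stays possible; only the shared copy refuses to have its history replaced.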

Rebasing a work in progress upon submission to upstream is a
useful feature, if the project prefers a linear history. The maze
of merges for feature branches is not preferred by everybody
alike. Rebase is not about making yourself look better to others.
It's beyond cosmetics. It's a means to achieve a code base that
is easier to work with at the maintainers' side.

As always: Go try it for yourself. Get a copy of a kernel, work
for weeks on a local change, and see how rebase against the
then-current version of upstream works for you. I liked it a lot.
It's careless force-pushing which makes me mad, and I don't want
to work with such a repo either. It's unfriendly behaviour.

Regarding dates, that's an interesting thing. Git has the concept
of author date and commit date, which reflects the organization
of the project where the tool originated. Work on a change really
can have happened in any order which need not be the order of
acceptance into the common tree. And which repo is considered
_the_ tree of the project is not a git feature, but a convention
among involved persons. Some UIs may be adding to the confusion;
I believe github is one of those which don't present the order
of commits as they are in the repo. That's their implementation
detail, not a limitation or the fault of the git tool.
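The "branch older than its base" picture described in this thread can be reproduced with a single rebase: git keeps each commit's author date but sets a new committer date, and a viewer that labels commits by author date shows exactly that. A minimal sketch, assuming a POSIX shell and git; the dates and commit messages are made up to mirror the diagram further down the thread.

```shell
#!/bin/sh
# Sketch: reproduce a "branch older than its base" picture with one rebase.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org

# Commit with both dates pinned to the given (made-up) day.
commit_on() {
    GIT_AUTHOR_DATE="$1 12:00:00 +0000" GIT_COMMITTER_DATE="$1 12:00:00 +0000" \
        git commit -q -m "$2"
}

repo=$(mktemp -d)
cd "$repo"
git init -q
echo base > a.c; git add a.c; commit_on 2021-12-20 'old base'
main=$(git rev-parse --abbrev-ref HEAD)
git checkout -q -b topic
echo topic > b.c; git add b.c; commit_on 2021-12-29 'Some branch commit'
git checkout -q "$main"
echo main > c.c; git add c.c; commit_on 2021-12-31 'A commit'

# Replay the topic work on top of the newer mainline commit.
git rebase -q "$main" topic
# %ad is the author date (kept by rebase), %cd the committer date (set now).
git log -2 --date=short --format='%s | author %ad | committer %cd' topic
```

The rebased commit sits on top of the 31-Dec commit while still carrying its 29-Dec author date; only the committer date reveals when it was actually replayed.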

This git feature also BTW lets you locally use and explore work
that others are currently at, without going through one central
server. This is the very point of being distributed. No copying
patch files around manually.


virtually yours
Gerhard Sittig
--
 If you don't understand or are scared by any of the above
 ask your parents or an adult to help you.


Re: Slightly off topic, question about git

2022-06-06 Thread Mouse
>>> [H]istory rewriting seems to be a favorite pastime of git users.
>> That's not a fault of git; that's a fault of how some people use
>> git.
> Well, you could argue that it's a fault in git that it allows it.

> If there is a way, then some people will use it that way.

But, if there isn't, some people will add it.  git rebase is very
little more than a loop containing git cherry-pick.
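The claim that rebase is little more than a loop of cherry-picks can be checked directly. A minimal sketch, assuming a POSIX shell and git; it replays a hypothetical topic branch once by hand and once with git rebase, then compares the results.

```shell
#!/bin/sh
# Sketch: "rebase by hand" as a cherry-pick loop, compared with git rebase.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org

repo=$(mktemp -d)
cd "$repo"
git init -q
echo base > a.c; git add a.c; git commit -q -m 'base'
main=$(git rev-parse --abbrev-ref HEAD)
git checkout -q -b topic
echo t1 > t1.c; git add t1.c; git commit -q -m 'topic 1'
echo t2 > t2.c; git add t2.c; git commit -q -m 'topic 2'
git checkout -q "$main"
echo m > m.c; git add m.c; git commit -q -m 'mainline'

# Replay each topic commit, oldest first, onto a new branch at mainline.
git checkout -q -b topic-by-hand "$main"
for c in $(git rev-list --reverse "$main..topic"); do
    git cherry-pick "$c" >/dev/null
done

# The built-in equivalent; this one moves the topic branch itself.
git rebase -q "$main" topic
git log --oneline -3 topic-by-hand
git log --oneline -3 topic
```

Both branches end up with the same tree on top of mainline. git rebase adds, among other things, conflict handling, skipping of changes already upstream, and an interactive mode on top of that loop.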

>> No VCS is ever truly never-change, [...]
> Sure.  You can change history in CVS as well.  But you'll have to go
> in there and muck with the data that is behind it.

And in git, that's significantly harder to do than it is in CVS (well,
as I recall CVS; it's been long enough since I used it that my memory
is fuzzy).  If you just change (say) a commit message in the underlying
data, the resulting repo will be corrupt and will be noticed as corrupt
by certain operations; git is built on a foundation of a
content-addressable data store, in which a data blob's name is its
SHA-1.  Pointers are to that SHA-1, so if you change the contents you
will change the SHA-1 (unless you can second-preimage SHA-1 to give the
content you want).
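The content addressing described above can be checked by hand: a blob's name is the SHA-1 of a short header plus the file's bytes. A minimal sketch, assuming a POSIX shell, git, and GNU coreutils' sha1sum (BSD systems name their SHA-1 tool differently); the file contents are arbitrary.

```shell
#!/bin/sh
# Sketch: compute a blob's name by hand and compare it with git's.
# Assumes GNU sha1sum is available.
set -e
f=$(mktemp)
printf 'hello\n' > "$f"

# git's answer; without -w nothing is written into any repository.
by_git=$(git hash-object "$f")
# By hand: SHA-1 over "blob <size>", a NUL byte, then the file's bytes.
size=$(wc -c < "$f" | tr -d ' ')
by_hand=$( { printf 'blob %s\000' "$size"; cat "$f"; } | sha1sum | cut -d ' ' -f 1)
echo "$by_git"
echo "$by_hand"
```

Commits and trees are named the same way, so editing one byte of a stored commit message in place leaves an object whose content no longer matches its name, which is the kind of inconsistency git fsck reports.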

> It's not like the UI itself allows you to work that way.

If the UI supports cherry-picks, the UI allows it.  As I remarked
above, rebasing is very little more than just a bunch of cherry-picks.

And I submit that a VCS that doesn't support cherry-picks is
significantly crippled.

> And I sortof can see why people want to go that way, since with
> distributed VCSs, it becomes much harder to have a linear history.
> But they still want to kindof/sortof fake it.

Some people do, perhaps.  Personally, I have no problem with merges.
My own repos, even those which have only me working on them, typically
include "Merge work from multiple machines" commits.

>> Not really; history doesn't _have_ to be rewritten.  That's what
>> merge commits are for.  People just choose to rebase work instead of
>> merging.
> It sort of has to.  Since if you've done various work, and others
> have done various work on the same files, and both have done commits,
> it might not be possible to merge as is.

Yes, merging can require manual assistance.  git includes tools to make
it easier to handle manual-assist merges; others exist as addons.

The need for them is one of the prices of the distributed model, just
as needing to manually perform much the same operations before
committing is a price of the centralized model.

> And so you'll have to rewrite parts that you already committed in
> order to get things back to a coherent state.

Merging two changesets that affect the same portions of the same files
inevitably will require that in some cases.

> This is a nasty problem when you have separate VCSs.  Well, it
> becomes nasty because somewhere in the end, you still have a master
> VCS, which holds the source of truth.  Distributed VCSs are not truly
> distributed.  There is still just one master.

Only if the humans involved insist on seeing it that way.  There is no
technical reason that has to be true.  git lends itself very well to
the "sure, fork it and see whose fork the userbase prefers" model.  Is
that a strength or a weakness?  Each use case has to decide that for
itself.

If the repo in question is used to produce a product with a single
distribution channel, then there will inevitably be some kind of master
in the sense of the one used to produce the distribution.  But that's
inevitable in that case; it's an artifact of the use case, nothing
inherent to the underlying VCS.



Re: Slightly off topic, question about git

2022-06-06 Thread Johnny Billquist

On 2022-06-06 15:12, Mouse wrote:

>>>> [H]istory rewriting seems to be a favorite pastime of git users.
>>>
>>> That's not a fault of git; that's a fault of how some people use
>>> git.
>>
>> Well, you could argue that it's a fault in git that it allows it.
>>
>> If there is a way, then some people will use it that way.
>
> But, if there isn't, some people will add it.  git rebase is very
> little more than a loop containing git cherry-pick.


It is more, since this can be done without any hint that this actually 
happened.


Basically, what I've seen in git:

o   Some commit (1-Jan-2022)
|
|   o   Some branch commit (29-Dec-2021)
|   |
|   /
|  /
| /
|/
o   A commit (31-Dec-2021)
|
|

Now, how could "some branch commit" happen on Dec 29, when what it was
based on was committed Dec 31 that same year? That's obviously not
possible, and yet, that is what I see in git.

I assume (pretty sure, actually) that the branch was created from an
earlier commit, and then rebased onto the later one. But it's not
clear, nor is it clear which was the original commit it branched from.
Sure, you could argue that this is not important, as it has been updated
to be fully based on the newer commit, but I still find this disturbing,
and some meta-understanding might be lost here. When the original
branch was made, where was it made from? That might give some hint of
why the branch was made, which is now lost. Not to mention the very
strange view where the branch is older than where it stems from.
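
Mouse's description of rebase as a loop of cherry-picks, and the lost branch point described here, can be sketched with a toy model in Python. This is not git's real object format: the `Commit` class, its `id` property and the `cherry_pick`/`rebase` helpers are hypothetical. Each toy commit records a parent and the file contents it changes; cherry-picking replays those changes on a new parent, and rebasing repeats that for each branch commit, oldest first.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Commit:
    """Toy commit: a parent pointer plus the file contents it changes."""
    parent: Optional[str]                 # id of the parent commit (None for a root)
    changes: Tuple[Tuple[str, str], ...]  # (path, new_content) pairs
    message: str

    @property
    def id(self) -> str:
        # Toy identity: a hash over everything the commit records, parent
        # included, so the same change on a different parent gets a new id.
        data = repr((self.parent, self.changes, self.message)).encode()
        return hashlib.sha1(data).hexdigest()[:12]


def cherry_pick(commit: Commit, onto: str) -> Commit:
    """Replay one commit's changes on top of a different parent."""
    return Commit(parent=onto, changes=commit.changes, message=commit.message)


def rebase(branch: List[Commit], onto: str) -> List[Commit]:
    """Replay each branch commit, oldest first, on top of `onto`."""
    rebased = []
    for commit in branch:
        picked = cherry_pick(commit, onto)
        rebased.append(picked)
        onto = picked.id  # the next commit goes on top of the one just made
    return rebased


if __name__ == "__main__":
    # Hypothetical ids standing in for the old and the new base commit.
    b1 = Commit("old-base", (("if_bge.c", "v2"),), "fix A")
    b2 = Commit(b1.id, (("if_bge.c", "v3"),), "fix B")
    for c in rebase([b1, b2], "new-base"):
        print(c.id, "parent:", c.parent, c.message)
```

The rebased commits get new ids because their parent changed, and none of them records the old base, which is the "no hint that this actually happened" effect. Real 'git rebase' also handles conflicts and skips changes already present upstream, so treat this only as an illustration of the core loop.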


If this had been done through cherry-picking, in the normal sense, there
would have been a commit with the cherry-picked changes.

The same would be true if a merge from another branch had been brought
in to update the code.

>>> No VCS is ever truly never-change, [...]
>>
>> Sure.  You can change history in CVS as well.  But you'll have to go
>> in there and muck with the data that is behind.
>
> And in git, that's significantly harder to do than it is in CVS (well,
> as I recall CVS; it's been long enough since I used it that my memory
> is fuzzy).  If you just change (say) a commit message in the underlying
> data, the resulting repo will be corrupt and will be noticed as corrupt
> by certain operations; git is built on a foundation of a
> content-addressable data store, in which a data blob's name is its
> SHA-1.  Pointers are to that SHA-1, so if you change the contents you
> will change the SHA-1 (unless you can second-preimage SHA-1 to give the
> content you want).
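
The content-addressing point can be shown in a few lines of Python. git names an object by hashing a short header (the object type, a space, the content size in decimal, and a NUL byte) followed by the object's content. This sketch assumes the default SHA-1 object format (repositories can also be created with SHA-256); the `commit_body` helper and the identity string and parent id in the usage are hypothetical, and build only a minimal one-parent commit.

```python
import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    """Object name as git computes it in a SHA-1 repository:
    sha1(b"<type> <size>\\0" + body)."""
    header = f"{obj_type} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree: str, parent: str, ident: str, message: str) -> bytes:
    """Minimal commit object content: one parent, same author and committer."""
    return (
        f"tree {tree}\n"
        f"parent {parent}\n"
        f"author {ident}\n"
        f"committer {ident}\n"
        f"\n"
        f"{message}"
    ).encode()


if __name__ == "__main__":
    # Same value as: printf 'hello\n' | git hash-object --stdin
    print(git_object_id("blob", b"hello\n"))

    empty_tree = git_object_id("tree", b"")
    ident = "A U Thor <author@example.com> 1640908800 +0000"  # hypothetical
    parent = "0" * 40                                        # hypothetical
    original = commit_body(empty_tree, parent, ident, "Fix bug\n")
    edited = commit_body(empty_tree, parent, ident, "Fix bugs\n")
    # Editing only the message yields a different name, so every child
    # commit that points at the old name would also have to change.
    print(git_object_id("commit", original) != git_object_id("commit", edited))
```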


I think you missed my point. Like I said, if you go around and start
mucking with the underlying actual data, be that in a database or
filesystem, or whatever, then sure. You can do pretty much anything.

But with git, you don't have to do that. You can create this through the
normal tools and interfaces. While in CVS you need to go under the radar
of the VCS to do it.

>> It's not like the UI itself allows you to work that way.
>
> If the UI supports cherry-picks, the UI allows it.  As I remarked
> above, rebasing is very little more than just a bunch of cherry-picks.

I disagree. If you cherry-pick something, then you make a new commit,
and things are pretty clear and straightforward. Rebasing doesn't
really do it that way.

> And I submit that a VCS that doesn't support cherry-picks is
> significantly crippled.

I would agree. But in fact, cherry-picking is just a fancy way of saying
you modified a file based on something already existing instead of
writing it from scratch.
A VCS that doesn't support pulling out content from a previous commit is
more than just significantly crippled. I would say it wouldn't be
working as a VCS.

>> And I sort of can see why people want to go that way, since with
>> distributed VCSs, it becomes much harder to have a linear history.
>> But they still want to kind of/sort of fake it.
>
> Some people do, perhaps.  Personally, I have no problem with merges.
> My own repos, even those which have only me working on them, typically
> include "Merge work from multiple machines" commits.
>
>>> Not really; history doesn't _have_ to be rewritten.  That's what
>>> merge commits are for.  People just choose to rebase work instead of
>>> merging.
>>
>> It sort of has to.  Since if you've done various work, and others
>> have done various work on the same files, and both have done commits,
>> it might not be possible to merge as is.
>
> Yes, merging can require manual assistance.  git includes tools to make
> it easier to handle manual-assist merges; others exist as addons.

Merging in itself is not magic, or bad. It's just that your history gets
broken when you have two different histories that need to be merged.
Merging source code isn't the problem. But history that is in conflict
can never be cleanly resolved. It has to be rewritten, and that is what I
find distasteful.

> The need for them is one of the prices of the distributed model, just
> as needing to manually perform much the same operations before
> committing is a price of the centralized model.

True. But forcing it to be resolved before doing a commit in a
centralize

Re: Slightly off topic, question about git

2022-06-06 Thread matthew sporleder
On Mon, Jun 6, 2022 at 8:24 AM Mouse  wrote:

> > [...], I have a question about git, [...]
>
> I'm not an _expert_ on git, but I have been using it for close on a
> decade now and have developed a certain amount of expertise.
>
> > 1.  In CVS, I can do something like:
> > cvs log sys/dev/pci/if_bge.c
> > and be given a complete history of the changes to that file, as well
> > as a list of all the branches that file participates in and which
> > versions apply to each branch.
>
> git log -- sys/dev/pci/if_bge.c
>
> > And, I can do this without having to download all of the history of
> > that file onto my local storage.
>
> That, you will not find with git.  git does, somewhat, support what is
> called a shallow clone of a repo, but it is limited by restricting it
> to recent commits, not by restricting it to only certain portions of
> the tree.  I'm not aware of any way to do the latter.
>


I'm not a git expert or anything but here are some examples of restricting
things and saving disk space/memory:

git clone --single-branch --depth 1 --branch netbsd-9 g...@github.com:NetBSD/src.git
Cloning into 'src'...
remote: Enumerating objects: 182228, done.
remote: Counting objects: 100% (182228/182228), done.
remote: Compressing objects: 100% (146504/146504), done.
remote: Total 182228 (delta 47394), reused 115949 (delta 29844), pack-reused 0
Receiving objects: 100% (182228/182228), 401.74 MiB | 8.93 MiB/s, done.
Resolving deltas: 100% (47394/47394), done.
Updating files: 100% (177533/177533), done.

src $ git branch
* netbsd-9
src $ du -sh .git/
428M    .git/
src $ git log


git clone --single-branch --depth 1 --branch netbsd-9 --no-checkout g...@github.com:NetBSD/src.git

...
cd src
git sparse-checkout init --cone
git sparse-checkout set sys

src $ git checkout
Updating files: 100% (29032/29032), done.
Your branch is up to date with 'origin/netbsd-9'.
src $ ls -l
total 392
-rw-r--r--   1 msporleder  wheel  53770 Jun  6 14:04 BUILDING
-rw-r--r--   1 msporleder  wheel  16543 Jun  6 14:04 Makefile
-rw-r--r--   1 msporleder  wheel    355 Jun  6 14:04 Makefile.inc
-rw-r--r--   1 msporleder  wheel   1751 Jun  6 14:04 README.md
-rw-r--r--   1 msporleder  wheel  34634 Jun  6 14:04 UPDATING
-rwxr-xr-x   1 msporleder  wheel  70768 Jun  6 14:04 build.sh
drwxr-xr-x  36 msporleder  wheel   1024 Jun  6 14:05 sys


Re: Slightly off topic, question about git

2022-06-06 Thread Gerhard Sittig
On Mon, 2022-06-06 at 15:39 +0200, Johnny Billquist wrote:
>
> On 2022-06-06 15:12, Mouse wrote:
> >>>>[H]istory rewriting seems to be a favorite pastime of git users.
> >>>That's not a fault of git; that's a fault of how some people use
> >>>git.
> >>Well, you could argue that it's a fault in git that it allows it.
> >
> >>If there is a way, then some people will use it that way.
> >
> >But, if there isn't, some people will add it.  git rebase is very
> >little more than a loop containing git cherry-pick.
>
> It is more, since this can be done without any hint that this
> actually happened.
>
> Basically, what I've seen in git:
>
>
> o   Some commit (1-Jan-2022)
> |
> |   o   Some branch commit (29-Dec-2021)
> |   |
> |   /
> |  /
> | /
> |/
> o   A commit (31-Dec-2021)
> |
> |
>
>
> Now, how could "some branch commit" happen on Dec 29, when what it
> was based on was committed Dec 31 that same year? That's obviously
> not possible, and yet, that is what I see in git.

This image may be incomplete. Are you "thinking centrally"
perhaps? Or am I missing something else?

It's true that commit "Some branch commit" could never have been
_committed_ before "A commit". But it could perfectly well have been
_authored_ before that commit, pick, merge or whatever the action
was which made the commit appear in the tree. It can help to look
at 'git log --format=fuller', which shows both dates, and that is why
I have an alias that enables this option by default.
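
Without an alias, 'git cat-file -p' on a commit prints the raw commit object, whose 'author' and 'committer' header lines each end in a Unix timestamp and a UTC offset. The Python sketch below parses those two lines. The sample commit text (ids, names, dates) is hypothetical, chosen to mirror the Dec 29 / Dec 31 / Jan 1 situation in the diagram above.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict


def parse_ident_time(value: str) -> datetime:
    """Parse 'Name <email> <unix-seconds> <+hhmm>' into an aware datetime."""
    _who, seconds, offset = value.rsplit(" ", 2)
    sign = -1 if offset.startswith("-") else 1
    delta = timedelta(hours=int(offset[1:3]), minutes=int(offset[3:5]))
    return datetime.fromtimestamp(int(seconds), timezone(sign * delta))


def commit_dates(raw: str) -> Dict[str, datetime]:
    """Author and committer times from a raw commit object's header."""
    dates: Dict[str, datetime] = {}
    for line in raw.splitlines():
        if not line:  # the first blank line ends the header; the message follows
            break
        key, _, value = line.partition(" ")
        if key in ("author", "committer"):
            dates[key] = parse_ident_time(value)
    return dates


# Hypothetical rebased commit: authored Dec 29, re-committed on Jan 1 on top
# of a parent that was itself committed on Dec 31.
SAMPLE = """tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904
parent 1111111111111111111111111111111111111111
author A. Developer <dev@example.org> 1640736000 +0000
committer A. Developer <dev@example.org> 1640995200 +0000

Some branch commit
"""

if __name__ == "__main__":
    d = commit_dates(SAMPLE)
    print("AuthorDate:", d["author"].isoformat())
    print("CommitDate:", d["committer"].isoformat())
```

Neither date is wrong: the author date records when the change was written, the committer date when it was put on top of its current parent.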

From my perspective git is just more honest here, and tries hard to
reflect what happened in the real world to the source code. This
now makes you see roles that just did not exist at all in the
centralized setup. IIUC the CVS or SVN history only has the
concept of a committer doing a commit, and this name and this
date are taken and placed into the history. This results in a
linear sequence based on the server machine's timebase. In git
you have the author (the person who created that content), which
can be different from the committer (a maintainer or integrator
or other user who happened to bring this content into _this_
specific repository). This better reflects the development model
of where git originated. May not translate equally well to the
NetBSD development model which appears to be centralized indeed.
But then git metadata will just have identical names and dates
for authoring and commits, no problem there.

In the past I had occurrences where a maintainer picked up a
submitted change some three years after I submitted it. That's
not unusual AFAICS. The BSD projects probably also have branches
sitting somewhere which may get merged later into more recent
releases, and nobody would suggest that the content creation
would have been at the time of the merge. This could even be seen
as a misattribution of the work that was done. And this metadata
may matter with regard to prior art, licensing, or similar. You
remember AT&T who stalled some BSD development, and SCO who tried
to misrepresent the Linux history and resulting ownership?

Something similar would be seen when you pick fixes and merge
them to a release. The branch may be new but the commit with the
fixes may be newer or older than the branch point. Does that
explain what you outline above in the illustration, or am I
missing something?


virtually yours
Gerhard Sittig
--
 If you don't understand or are scared by any of the above
 ask your parents or an adult to help you.


Re: Slightly off topic, question about git

2022-06-07 Thread Gerhard Sittig
[ incomplete list of git features that a BSD developer may like ]

On Mon, 2022-06-06 at 14:40 +0200, Gerhard Sittig wrote:
>
> [ ... Brian Buhrow asked for useful access to VCS history ... ]

Your question started with a specific command accessing the
information for a single file ('cvs log sys/dev/pci/if_bge.c').
There could be something else that you intend perhaps. Let's see
whether these things help you, too.

There is 'git describe' that lets you see what you currently are
looking at. It typically prints the most recent tag reachable from
that commit, the number of commits since that tag, and an abbreviated
hash (prefixed with 'g') of what you specified as input. This can be
helpful to tell you how far your development has gone since a release.
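
For scripts, that output can be split back into its parts. A minimal Python sketch, assuming the usual '<tag>-<count>-g<hash>' form, or just '<tag>' when the commit is exactly at a tag; the tag names in the tests are hypothetical and a '--dirty' suffix is not handled. Running 'git describe --long' always produces the long form, which avoids ambiguity when a tag name itself looks like '-<n>-g<hex>'.

```python
import re
from typing import NamedTuple, Optional


class Description(NamedTuple):
    tag: str               # most recent tag reachable from the commit
    distance: int          # commits since that tag (0 when exactly on it)
    abbrev: Optional[str]  # abbreviated commit hash, None when exactly on a tag


# The greedy '.+' lets tag names themselves contain '-' (e.g. 'netbsd-9-x').
_DESCRIBE = re.compile(r"^(?P<tag>.+)-(?P<count>\d+)-g(?P<hash>[0-9a-f]+)$")


def parse_describe(text: str) -> Description:
    """Split 'git describe' output into tag, distance and abbreviated hash."""
    text = text.strip()
    match = _DESCRIBE.match(text)
    if match:
        return Description(match["tag"], int(match["count"]), match["hash"])
    return Description(text, 0, None)  # short form: exactly on the tag
```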

There is 'git describe --contains' which tells you which next tag
(read: a released version) contains a specific commit (read: a
fix or new feature or changed behaviour). This is useful for
maintenance, and the reason why some mention "Fixes: " in
their commit messages, so that other people or machinery can tell
whether their stable release needs updating/amending.

That git thinks of the whole content of the tree, and that a
filter is applied to narrow the result set when you specify dirs
or files, was mentioned before. Changes your perspective.
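
A toy illustration of that perspective, in Python: each commit is stored as a snapshot of the whole tree, and "log for a path" is just a filter that keeps the commits where that path's content differs from the previous snapshot. This assumes a linear history and made-up commit ids; real 'git log -- <path>' works on the commit graph and also simplifies history across merges.

```python
from typing import Dict, List, Tuple

# A snapshot maps path -> content for the *whole* tree at that commit.
Snapshot = Dict[str, str]


def log_path(history: List[Tuple[str, Snapshot]], path: str) -> List[str]:
    """Ids of commits that changed `path`, newest first.

    `history` is a linear list of (commit_id, snapshot), oldest first.
    """
    touched: List[str] = []
    previous: Snapshot = {}
    for commit_id, tree in history:
        # A change is any difference in the path's content, including
        # the path appearing (added) or disappearing (deleted).
        if tree.get(path) != previous.get(path):
            touched.append(commit_id)
        previous = tree
    return touched[::-1]
```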

Tinker with 'git log' options, there are tons of them, and your
needs will differ depending on which situation you are in. One
aspect is "the zoom level": --oneline alone for a bird's-eye view, no
option for author/date/message, -p to include the content change.
Others are --reverse, --format=, --name-status, etc.
The --stat option is very useful for reviewers/maintainers.

I don't like the 'git blame' name, because it's additional
information that I'm looking for and not a guilty person. :)
That's why I have an 'ann' alias to invoke 'blame'.

The default 'git status' output is very helpful if you are
looking for help on what to do next. But terrible to digest in
quick iterations. 'git status -s' helps parse that stuff. And
I feel that 'git commit -v' should be the default, to help create
useful commit messages.


Other things that can help the BSD projects regardless of whether
the decentralized development model is considered a fit:

Bisection lets you quickly navigate to a commit of interest. When
HEAD works for you, and an older version is said to have an
issue, then 'git bisect' lets you identify the commit which
probably introduced the change in behaviour. Either driven by a
human and supported by the machine, or fully automated if you
have a test condition which the machine can check for you. This
reduces a set of 10k changes between two releases to some 15
steps that you need to look at, before your attention is where
it needs to be. Of course this can mislead you when a one-liner
enables something that got introduced a few hundred commits
earlier and just did not take effect immediately. But it
helps you narrow in on what you should be looking at. Linear
history helps there. That's again why rebase is useful.

Interactive add ('git add -u -p', and 'git reset -p' before 'git
commit') lets you separate unrelated changes into individual
commits as they should be, without tediously copying files around
or losing part of what you accumulated. Interactive rebase ('git
rebase -i') lets you create a proper patch series that others can
digest during review, and reason about why it's desirable to pick
up. Rebasing is useful in these iterations before submission, and
what ends up in the mainline project is clean and maintainable.

Showing other people your local changes could be done by pushing
to a public place. Or by the 'git format-patch -o $DIR $BASE..'
command. The counterpart is 'git am' (apply a set of changes).
Even if you don't do the "popularity contest" among several
mainlines(?) that others suggested, it's a useful feature to have
between involved developers. The result would be similar to
patches that currently are attached to emails, just easier to
review and test perhaps. There is 'git apply' if you have a patch
that is not git-am formatted.

'git diff' has a few options that can make it more pleasant to
view than mere diff(1) output. But that's cosmetics. Diffstats
are very useful to have once you get familiar with them.


This is not an attempt to cram git down your throat just because
I happen to like it after getting to know it. It's a plea to
consider whether the tool could help you in ways that you haven't
seen before, before assuming that "these people" don't know how
to create a proper tool because it's different from what you used
so far in your daily routine.

The reason why git is different is that no other tool matched the
Linux kernel development model with its thousands of mostly
disconnected participants. The tool still can be applied in a
centralized manner (many companies do), and can be more
convenient than other tools. I often use git to communicate
with a Subversion server. Don't know if there is a similar bridge
for CVS (outside of repo migr

Re: Slightly off topic, question about git

2022-06-08 Thread Mouse
> That git thinks of the whole content of the tree, and that a filter
> is applied to narrow the result set when you specify dirs or files,
> was mentioned before.  Changes your perspective.

Also breaks git for certain uses, though.  Much as I like git, there
are places where I'd like to use it but effectively can't because of
its insistence on owning an entire directory per repo as work tree.
This makes it impossible to, for example, keep one repo for ~/.cshrc, a
different repo for ~/.gdbinit, and a third for ~/.procmailrc, without
creating a directory somewhere for each one and playing games with
links.  But that's very different from its designed-for use case, so
it's not too surprising.

> Bisection lets you quickly navigate to a commit of interest.

Very useful.  I agree.

> When HEAD works for you, and an older version is said to have an
> issue, then 'git bisect' lets you identify the commit which probably
> introduced the change in behaviour.

Minor note: the command line ("git bisect good", "git bisect bad") is
designed for the reverse, where the earlier commit works and the later
commit doesn't.  People are more often interested in when bugs were
introduced than in when they were fixed.

> Linear history helps there.  That's again why rebase is useful.

Linear history is not necessary, though; git bisect can deal with
branches-and-merges just fine, at the cost of slightly more tests.
(Personally, I don't like rebasing.)

> Interactive add ('git add -u -p', and 'git reset -p' before 'git
> commit') lets you separate unrelated changes into individual commits
> as they should be.

Based on the documentation I've seen, though, they're somewhat
crippled, in that they work at diff-hunk granularity, and tend to be
single-pass - I haven't used them myself, because they depend on perl,
so I have to depend on documentation.  I wrote a curses tool that,
while still pre-alpha quality, does philosophically similar things
(without needing perl).

That it was feasible for me to write it is, I would say, another point
in favour of git, albeit a relatively mild one for most use cases.

> And do not mistake github.com the service provider or its many
> unaware users as representative for git-scm.org the tool and concept.

Eh.  The tool is git.  git-scm.org is a domain name (colloquially used
to refer to a website), not the tool.

> Though I can see how github enables those who otherwise would never
> have shared, or used a VCS at all, I strongly disagree with many
> things that are done at this site.

Me too.  I gave up on them partially when I discovered they thought the
MUSTs in the ssh spec didn't apply to them and they thus couldn't
interoperate with my ssh implementation; I gave up on them the rest of
the way when they, ironically enough, stopped supporting git's own
protocol for access to (supposedly) publicly accessible repos.



Re: Slightly off topic, question about git

2022-06-08 Thread Gerhard Sittig
On Wed, 2022-06-08 at 06:45 -0400, Mouse wrote:
>
> > Interactive add ('git add -u -p', and 'git reset -p' before 'git
> > commit') lets you separate unrelated changes into individual commits
> > as they should be.
>
> Based on the documentation I've seen, though, they're somewhat
> crippled, in that they work at diff-hunk granularity, and tend to be
> single-pass - I haven't used them myself, because they depend on perl,
> so I have to depend on documentation.  I wrote a curses tool that,
> while still pre-alpha quality, does philosophically similar things
> (without needing perl).

Not crippled at all, quite the opposite. Interactive add
_defaults_ to diff hunks, and you can pick them or skip them from
the staging which constructs the next commit. But you can also
use 's' to split hunks for finer selection, and 'e' to edit a hunk
to create exactly what you would like to get staged. You can have
hunks revisited later in the selection after seeing other hunks.
Total control, your choice. Just see the legend near the prompt
(assuming the git subcommand is available to you, of course).

In the past I've seen people use some other UI than 'git add -p'
but cannot remember the name. These UIs may differ, can't tell
from first hand experience.

> > And do not mistake github.com the service provider or its many
> > unaware users as representative for git-scm.org the tool and concept.
>
> Eh.  The tool is git.  git-scm.org is a domain name (colloquially used
> to refer to a website), not the tool.

I'd assume that you got what I intended to express. The fact that
'github' has 'git' in its name leads unaware people to think
they'd be the same. They are not. That's what I tried to say.


virtually yours
Gerhard Sittig


Re: Slightly off topic, question about git

2022-06-08 Thread Reinoud Zandijk
On Mon, Jun 06, 2022 at 09:12:58AM -0400, Mouse wrote:
> > It sortof have to.  Since if you've done various work, and others
> > have done various work on the same files, and both have done commits,
> > it might not be possible to merge as is.
> 
> Yes, merging can require manual assistance.  git includes tools to make
> it easier to handle manual-assist merges; others exist as addons.
> 
> The need for them is one of the prices of the distributed model, just
> as needing to manually perform much the same operations before
> committing is a price of the centralized model.

My (little) experience with git is that merging, i.e. applying patches from
others, is a lot better in CVS! I never really have merge conflicts in CVS, but
in the project I used git on it was horrible for no obvious reasons, so I
switched to rebase since that just worked fine. But then I might have missed
some `magic' git incantation *grumbl*

Reinoud



Re: Slightly off topic, question about git

2022-06-08 Thread Taylor R Campbell
> Date: Wed, 8 Jun 2022 22:22:40 +0200
> From: Reinoud Zandijk 
> 
> My (little) experience with git is that merging ie applying patches
> from others is a lot better in CVS! I never really have merge
> conflicts in CVS but in the project I used git on it was horrible
> for no obvious reasons so I switched to rebase since that just
> worked fine. But then I might have missed some `magic' git
> incantation *grumbl*

If you have an actual scenario where applying patches appears to be
more difficult in git than in cvs, let me know and I can help with
that.

Note that a cvs merge (that is, cvs update -j -j) is more or less
squashing the branch and then rebasing it.  git merge, in contrast,
formally records the history with a pointer to both parents of the
merge -- this can be confusing when nearly every commit to a shared
repository that many people are working on is a merge commit with a
complicated branching DAG structure.

It is more likely that the branching DAG structure has led to
confusion -- and, perhaps, the same commit appearing with multiple
identities on different branches -- than that you have actual merge
conflicts.  The cvs merge algorithm is essentially a special case of
the git merge algorithm, restricted to a simpler history structure
than the general structures git supports; if cvs has no trouble with a
merge then git is unlikely to have trouble with it.
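
The common core here is the three-way merge: each side is compared with the common ancestor, and a region conflicts only if both sides changed it differently. A toy Python sketch of that rule, assuming each version of a file has already been cut into named regions (a dict of region name to text, where a missing key means the region was deleted on that side). Real cvs and git merges work on lines and must also align insertions and deletions, which this skips.

```python
from typing import Dict, List, Tuple


def three_way_merge(base: Dict[str, str], ours: Dict[str, str],
                    theirs: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Merge two descendants of `base` region by region.

    Returns the merged regions and the names of conflicting regions.
    A region missing from a dict is treated as deleted on that side.
    """
    merged: Dict[str, str] = {}
    conflicts: List[str] = []
    for name in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(name), ours.get(name), theirs.get(name)
        if o == t:
            result = o           # both sides agree (possibly both unchanged)
        elif o == b:
            result = t           # only their side changed this region
        elif t == b:
            result = o           # only our side changed this region
        else:
            conflicts.append(name)
            result = o           # keep ours; a real tool would add markers
        if result is not None:   # None means the region was deleted
            merged[name] = result
    return merged, conflicts
```

Under this model, cvs's '-j -j' merge and git's merge apply the same rule; git differs mainly in recording both parents and in how it finds the common ancestor in a more general history.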

(hg has a better way to track the same commit appearing with multiple
identities on different branches over rebases, with hg-evolve
(devel/py-hg-evolve), so one tends not to run into this kind of thing
as much with hg.)


Re: Slightly off topic, question about git

2022-06-12 Thread David Holland
On Mon, Jun 06, 2022 at 02:40:29PM +0200, Gerhard Sittig wrote:
 > There is no problem with that I assume. From personal experience
 > I can tell that git takes some getting used to. But once you do
 > you don't want to go back. Seriously.

Every time I have to use the damn thing, I want to go back to hg.

(SCNR)

More seriously,

On Wed, Jun 08, 2022 at 06:45:07AM -0400, Mouse wrote:
 > > That git thinks of the whole content of the tree, and that a filter
 > > is applied to narrow the result set when you specify dirs or files,
 > > was mentioned before.  Changes your perspective.
 > 
 > Also breaks git for certain uses, though.  Much as I like git, there
 > are places where I'd like to use it but effectively can't because of
 > its insistence on owning an entire directory per repo as work tree.
 > This makes it impossible to, for example, keep one repo for ~/.cshrc, a
 > different repo for ~/.gdbinit, and a third for ~/.procmailrc, without
 > creating a directory somewhere for each one and playing games with
 > links.  But that's very different from its designed-for use case, so
 > it's not too surprising.

One of the things Someone(TM) should write is a modern replacement for
RCS.

Versioning single files is actually useful and RCS is... dated.

-- 
David A. Holland
dholl...@netbsd.org