Re: [RFC][GSoC] Project proposal: convert interactive rebase to C

2018-03-24 Thread Christian Couder
Hi,

On Thu, Mar 22, 2018 at 11:03 PM, Alban Gruin  wrote:
> Hi,
>
> here is my second draft of my proposal. As last time, any feedback is
> welcome :)
>
> I did not write my phone number and address here for obvious reasons,
> but they will be in the “about me” section of the final proposal.
>
> Apart from that, do you think there is something to add?

Please take a look at the comments that have been made on other's
proposal. Many proposals could be improved in the same way and it is a
bit annoying for us to repeat the same things many times.

[...]

> The goal of this project is to rewrite git-rebase--interactive in C
> as it has been discussed on the git mailing list[1], for multiple
> reasons :

In general when the project or some issues related to the project have
already been worked on or discussed on the mailing list, it is a good
thing to summarize those discussions and link to them in your
proposal. It shows that you want to take the time to gather existing
information, to understand that information and to take it into
account in your proposal, and it can also make your proposal easier to
read and understand.

More specifically your proposal has some links which is nice, but I
think it would be better if it summarized a bit more what the links
contain.

[...]

> Weeks 1 & 2 -- May 14, 2018 - May 27, 2018
> /From May 14 to 18, I have exams at the university, so I won’t be able
> to work full time./
>
> I would search for edge cases not covered by current tests and write
> some if needed.
>
> Week 3 -- May 28, 2018 - June 3, 2018
> At the same time, I would refactor --preserve-merges in its own
> shell script (as described in Dscho’s email[1]), if it has
> not been deprecated or moved in the meantime.

Here for example it is better if we could get a better idea about how
you plan to do it without having to read Dscho's email.

> This operation is not
> really tricky by itself, as --preserve-merges is about only 50 lines
> of code into git_rebase__interactive().
>
> Weeks 4 to 7 -- May 4, 2018 - July 1, 2018
> Then, I would start to incrementally rewrite
> git-rebase--interactive.sh functions in C, and move them
> git-rebase--helper.c (as in commits 0cce4a2756[2] (rebase -i
> -x: add exec commands via the rebase--helper) and b903674b35[3]
> (bisect--helper: `is_expected_rev` & `check_expected_revs` shell
> function in C)).

I know what you mean but I would still appreciate if you could summarize it.

> There is a lot of functions into git-rebase--interactive.sh to
> rewrite. Most of them are small, and some of them are even wrappers
> for a single command (eg. is_merge_commit()), so they shouldn’t be
> really problematic.
>
> A couple of them are quite long (eg. pick_one()), and will probably
> be even longer once rewritten in C due to the low-level nature of the
> language. They also tend to depend a lot on other smaller functions.
>
> The plan here would be to start rewriting the smaller functions when
> applicable (ie. they’re not a simple command wrapper) before
> working on the biggest of them.
>
> Week 8 -- July 2, 2018 - July 8, 2018
> When all majors functions from git-rebase--interactive.sh have been
> rewritten in C, I would retire the script in favor of a builtin.
>
> Weeks 9 & 10 -- July 9, 2018 - July 22, 2018
> I plan to spend theses two weeks to improve the code coverage where
> needed.
>
> Weeks 11 & 12 -- July 23, 2018 - August 5, 2018
> In the last two weeks, I would polish the code where needed, in order
> to improve its performance or to make it more readable.

We like to have big improvements be split into batches of patches,
also called patch series, and polishing of the first batches happening
as soon as possible so that they can be ready to be merged soon. The
patch series that are sent to the mailing list often need a number of
round of reviews and improvements which can take a long time, so it is
better if this process starts as soon as possible.

Thanks.


Re: [RFC][GSoC] Project proposal: convert interactive rebase to C

2018-03-22 Thread Alban Gruin
Hi,

here is my second draft of my proposal. As last time, any feedback is
welcome :)

I did not write my phone number and address here for obvious reasons,
but they will be in the “about me” section of the final proposal.

Apart from that, do you think there is something to add?

---
ABSTRACT

git is a modular source control management software, and all of its
subcommands are programs on their own. A lot of them are written in C,
but a couple of them are shell or Perl scripts. This is the case of
git-rebase--interactive (or interactive rebase), which is a shell
script. Rewriting it in C would improve its performance, its
portability, and maybe its robustness.


ABOUT `git-rebase` and `git-rebase--interactive`

git-rebase allows to re-apply changes on top of another branch. For
instance, when a local branch and a remote branch have diverged,
git-rebase can re-unify them, applying each change made on the
local branch on top of the remote branch.

git-rebase--interactive is used to reorganize commits by reordering,
rewording, or squashing them. To achieve this purpose, git opens the
list of commits to be modified in a text editor (hence the
interactivity), as well as the actions to be performed for each of
them.


PROJECT GOALS

The goal of this project is to rewrite git-rebase--interactive in C
as it has been discussed on the git mailing list[1], for multiple
reasons :


Performance improvements
Shell scripts are inherently slow. That’s because each command is a
program by itself. So, for each command, the shell interpreter has to
spawn a new process and to load a new program (with fork() and
exec() syscalls), which is an expensive process.

Those commands can be other git commands. Sometimes, they are
wrappers to call internal C functions (eg. git-rebase--helper),
something shell scripts can’t do natively. These wrappers basically
parse the parameters, then start the appropriate function, which is
obviously slower than just calling a function from C.

Other commands can be POSIX utilities (eg. sed, cut, etc.). They
have their own problems (speed aside), namely portability.

Portability improvements
Shell scripts often relies on many of those POSIX utilities, which are
not necessarily natively available on all platforms (most notably,
Windows), or may have more or less features depending on the
implementation.

Although C is not perfect portability-wise, it’s still better than
shell scripts. For instance, the resulting binaries will not
necessarily depend on third-party programs or libraries.


RISKS

Of course, rewriting a piece of software takes time, and can lead to
regressions (ie. new bugs). To mitigate that risk, I should understand
well the functions I want to rewrite, run tests on a regular basis and
write new if needed, and of course discuss about my code with the
community during reviews.


APPROXIMATIVE TIMELINE

Community bonding -- April 23, 2018 - May 14, 2018
During the community bonding, I would like to dive into git’s
codebase, and to understand what git-rebase--interactive does under
the hood. At the same time, I’d communicate with the community and my
mentor, seeking for clarifications, and asking questions about how
things should or should not be done.

Weeks 1 & 2 -- May 14, 2018 - May 27, 2018
/From May 14 to 18, I have exams at the university, so I won’t be able
to work full time./

I would search for edge cases not covered by current tests and write
some if needed.

Week 3 -- May 28, 2018 - June 3, 2018
At the same time, I would refactor --preserve-merges in its own
shell script (as described in Dscho’s email[1]), if it has
not been deprecated or moved in the meantime. This operation is not
really tricky by itself, as --preserve-merges is about only 50 lines
of code into git_rebase__interactive().

Weeks 4 to 7 -- May 4, 2018 - July 1, 2018
Then, I would start to incrementally rewrite
git-rebase--interactive.sh functions in C, and move them
git-rebase--helper.c (as in commits 0cce4a2756[2] (rebase -i
-x: add exec commands via the rebase--helper) and b903674b35[3]
(bisect--helper: `is_expected_rev` & `check_expected_revs` shell
function in C)).

There is a lot of functions into git-rebase--interactive.sh to
rewrite. Most of them are small, and some of them are even wrappers
for a single command (eg. is_merge_commit()), so they shouldn’t be
really problematic.

A couple of them are quite long (eg. pick_one()), and will probably
be even longer once rewritten in C due to the low-level nature of the
language. They also tend to depend a lot on other smaller functions.

The plan here would be to start rewriting the smaller functions when
applicable (ie. they’re not a simple command wrapper) before
working on the biggest of them.

Week 8 -- July 2, 2018 - July 8, 2018
When all majors functions from git-rebase--interactive.sh have been
rewritten in C, I would retire the script in favor of a builtin.

Weeks 9 & 10 -- July 9, 2018 - July 22, 2018
I plan to spend theses two weeks to improve the 

Re: [RFC][GSoC] Project proposal: convert interactive rebase to C

2018-03-21 Thread Johannes Schindelin
Hi Alban,

On Wed, 21 Mar 2018, Alban Gruin wrote:

> Le mardi 20 mars 2018 17:29:28 CET, vous avez écrit :
> 
> > Also, I have a hunch that there is actually almost nothing left to
> > rewrite after my sequencer improvements that made it into Git v2.13.0,
> > together with the upcoming changes (which are on top of the
> > --recreate-merges patch series, hence I did not send them to the
> > mailing list yet) in
> > https://github.com/dscho/git/commit/c261f17a4a3e
> 
> One year ago, you said[2] that converting this script "will fill up 3
> month, very easily". Is this not accurate anymore?

Let me read that mail ;-)

*goes and reads*

Well, I was talking about two different aspects to Ivan and to you. I
should have been clearer. So let me try again:

To convert `git-rebase--interactive.sh`, I think the most important part
is to factor out the preserve-merges code into its own script. After that,
there is little I can think of (apart from support for --root, which a
not-yet-contributed patch in my sequencer-shears branch on
https://github.com/dscho/git addresses) that still needs to be converted.
For somebody familiar with Git's source code, I would estimate one week
(and therefore 3 weeks would be a realistic estimate :-)).

Come to think of it, a better approach might be to leave the
preserve-merges stuff in, and teach `git-rebase.sh` to call the sequencer
directly for --interactive without --preserve-merges, then rename the
script to git-rebase--preserve.sh

The other aspect, the one I thought would take up to 3 months, easily, was
to convert the entirety of rebase -i into C. That would entail also the
option parsing, for which you would have to convert also git-rebase.sh
(and if you do not convert git-rebase--am.sh and git-rebase--merge.sh
first, you would then have to teach builtin/rebase.c to populate the
environment variables expected by those shell scripts while spawning
them).

I still think that the latter is too big a task for a single GSoC.

> I’ll send a new draft as soon as possible (hopefully this afternoon).

I look forward to reading it!

Ciao,
Johannes

Re: [RFC][GSoC] Project proposal: convert interactive rebase to C

2018-03-21 Thread Alban Gruin
Hi Johannes,

Le mardi 20 mars 2018 17:29:28 CET, vous avez écrit :
> > Weeks 1 & 2 — May 14, 2018 – May 28, 2018
> > First, I would refactor --preserve-merges in its own shell script, as
> > described in Dscho’s email.
> 
> Could you go into detail a bit here? Like, describe what parts of the
> git-rebase--interactive.sh script would need to be duplicated, which ones
> would be truly moved, etc
> 

It would lead to duplicate a good chunk of git_rebase__interactive(), 
apparently. The moved parts would be everything in `if test t = 
"$preserve_merges"; then …; fi` statements. That is, about 50 lines of shell 
code.

Judging by that, beginning by that is probably not the right thing to do. 
Also, somebody is already working on that[1]. 

> > Weeks 3 & 4 — May 18, 2018 – June 11, 2018
> > Then, I would start to rewrite git-rebase--interactive, and get rid of
> > git-
> > rebase--helper.
> 
> I think this is a bit premature, as the rebase--helper would not only
> serve the --interactive part, but in later conversions also --am and
> --merge, and only in the very end, when git-rebase.sh gets converted,
> would we be able to simply rename the rebase--helper to rebase.
> 

Yes, Christian Couder told me that it would not be a good methodology too.

> Also, I have a hunch that there is actually almost nothing left to rewrite
> after my sequencer improvements that made it into Git v2.13.0, together
> with the upcoming changes (which are on top of the --recreate-merges patch
> series, hence I did not send them to the mailing list yet) in
> https://github.com/dscho/git/commit/c261f17a4a3e

One year ago, you said[2] that converting this script "will fill up 3 month, 
very easily". Is this not accurate anymore?

> 
> So I would like to see more details here... ;-)

Yep, I’m working on that. 

> > Weeks 5 to 9 — June 11, 2018 – July 15, 2018
> > During this period, I would continue to rewrite git-rebase--interactive.
> 
> It would be good if the proposal said what parts of the conversion are
> tricky, to merit spending a month on them.
> 
> > Weeks 10 & 11 — July 16, 2018 – July 29, 2018
> > In the second half of July, I would look for bugs in the new code, test
> > it,
> > and improve its coverage.
> 
> As I mentioned in a related mail, the test suite coverage would be most
> sensibly extended *before* starting to rewrite code in C, as it helps
> catching bugs early and avoids having to merge buggy code that needs to be
> fixed immediately.

Makes sense.

> 
> > Weeks 12 — July 30, 2018 – August 5, 2018
> > In the last week, I would polish the code where needed, in order to
> > improve for performance or to make the code more readable.
> 
> Thank you for sharing this draft with us!
> Johannes

I’ll send a new draft as soon as possible (hopefully this afternoon).

Thank you for your enthousiasm :)
Alban

[1] https://public-inbox.org/git/20180320204507.12623-1-w...@saville.com/
[2] https://public-inbox.org/git/alpine.DEB.
2.20.1703231827060.3767@virtualbox/




Re: [RFC][GSoC] Project proposal: convert interactive rebase to C

2018-03-20 Thread Johannes Schindelin
Hi Alban,

thank you for your proposal!

I will only comment on the parts that I feel could use improvement, the
rest is fine by me.

On Sat, 17 Mar 2018, Alban Gruin wrote:

> APPROXIMATIVE TIMELINE
> 
> Community bonding — April 23, 2018 – May 14, 2018
> During the community bonding, I would like to dive into git’s codebase,
> and to understand what git-rebase--interactive does under the hood. At
> the same time, I’d communicate with the community and my mentor, seeking
> for clarifications, and asking questions about how things should or
> should not be done.
> 
> Weeks 1 & 2 — May 14, 2018 – May 28, 2018
> First, I would refactor --preserve-merges in its own shell script, as
> described in Dscho’s email.

Could you go into detail a bit here? Like, describe what parts of the
git-rebase--interactive.sh script would need to be duplicated, which ones
would be truly moved, etc

> Weeks 3 & 4 — May 18, 2018 – June 11, 2018
> Then, I would start to rewrite git-rebase--interactive, and get rid of git-
> rebase--helper.

I think this is a bit premature, as the rebase--helper would not only
serve the --interactive part, but in later conversions also --am and
--merge, and only in the very end, when git-rebase.sh gets converted,
would we be able to simply rename the rebase--helper to rebase.

Also, I have a hunch that there is actually almost nothing left to rewrite
after my sequencer improvements that made it into Git v2.13.0, together
with the upcoming changes (which are on top of the --recreate-merges patch
series, hence I did not send them to the mailing list yet) in
https://github.com/dscho/git/commit/c261f17a4a3e

So I would like to see more details here... ;-)

> Weeks 5 to 9 — June 11, 2018 – July 15, 2018
> During this period, I would continue to rewrite git-rebase--interactive.

It would be good if the proposal said what parts of the conversion are
tricky, to merit spending a month on them.

> Weeks 10 & 11 — July 16, 2018 – July 29, 2018
> In the second half of July, I would look for bugs in the new code, test it, 
> and improve its coverage.

As I mentioned in a related mail, the test suite coverage would be most
sensibly extended *before* starting to rewrite code in C, as it helps
catching bugs early and avoids having to merge buggy code that needs to be
fixed immediately.

> Weeks 12 — July 30, 2018 – August 5, 2018
> In the last week, I would polish the code where needed, in order to
> improve for performance or to make the code more readable.

Thank you for sharing this draft with us!
Johannes

Re: [RFC][GSoC] Project proposal: convert interactive rebase to C

2018-03-17 Thread Christian Couder
Hi,

On Sat, Mar 17, 2018 at 8:14 PM, Alban Gruin  wrote:
>
> Weeks 3 & 4 — May 18, 2018 – June 11, 2018
> Then, I would start to rewrite git-rebase--interactive, and get rid of git-
> rebase--helper.

Usually to rewrite a shell script in C, we first rewrite shell
functions into option arguments in a C builtin helper and make the
schell script call the builtin helper (instead of the original shell
functions). Eventually when the shell script is mostly only calling
the builtin helper, we add what is needed into the builtin helper and
we rename it to make it fully replace the shell script.

See for example 0cce4a2756 (rebase -i -x: add exec commands via the
rebase--helper, 2017-12-05) or b903674b35 (bisect--helper:
`is_expected_rev` & `check_expected_revs` shell function in C,
2017-09-29). These examples show that we can do step by step rewrites.

I would suggest planning to use the same approach, and describing in
your proposal which shell functions you would like to rewrite into the
C builtin helper in which order, before planning to fully replace the
current git-rebase--interactive.

Thanks,
Christian.