Re: [RFC][GSoC] Project proposal: convert interactive rebase to C
Hi, On Thu, Mar 22, 2018 at 11:03 PM, Alban Gruin wrote: > Hi, > > here is my second draft of my proposal. As last time, any feedback is > welcome :) > > I did not write my phone number and address here for obvious reasons, > but they will be in the “about me” section of the final proposal. > > Apart from that, do you think there is something to add? Please take a look at the comments that have been made on other's proposal. Many proposals could be improved in the same way and it is a bit annoying for us to repeat the same things many times. [...] > The goal of this project is to rewrite git-rebase--interactive in C > as it has been discussed on the git mailing list[1], for multiple > reasons : In general when the project or some issues related to the project have already been worked on or discussed on the mailing list, it is a good thing to summarize those discussions and link to them in your proposal. It shows that you want to take the time to gather existing information, to understand that information and to take it into account in your proposal, and it can also make your proposal easier to read and understand. More specifically your proposal has some links which is nice, but I think it would be better if it summarized a bit more what the links contain. [...] > Weeks 1 & 2 -- May 14, 2018 - May 27, 2018 > /From May 14 to 18, I have exams at the university, so I won’t be able > to work full time./ > > I would search for edge cases not covered by current tests and write > some if needed. > > Week 3 -- May 28, 2018 - June 3, 2018 > At the same time, I would refactor --preserve-merges in its own > shell script (as described in Dscho’s email[1]), if it has > not been deprecated or moved in the meantime. Here for example it is better if we could get a better idea about how you plan to do it without having to read Dscho's email. > This operation is not > really tricky by itself, as --preserve-merges is about only 50 lines > of code into git_rebase__interactive(). > > Weeks 4 to 7 -- May 4, 2018 - July 1, 2018 > Then, I would start to incrementally rewrite > git-rebase--interactive.sh functions in C, and move them > git-rebase--helper.c (as in commits 0cce4a2756[2] (rebase -i > -x: add exec commands via the rebase--helper) and b903674b35[3] > (bisect--helper: `is_expected_rev` & `check_expected_revs` shell > function in C)). I know what you mean but I would still appreciate if you could summarize it. > There is a lot of functions into git-rebase--interactive.sh to > rewrite. Most of them are small, and some of them are even wrappers > for a single command (eg. is_merge_commit()), so they shouldn’t be > really problematic. > > A couple of them are quite long (eg. pick_one()), and will probably > be even longer once rewritten in C due to the low-level nature of the > language. They also tend to depend a lot on other smaller functions. > > The plan here would be to start rewriting the smaller functions when > applicable (ie. they’re not a simple command wrapper) before > working on the biggest of them. > > Week 8 -- July 2, 2018 - July 8, 2018 > When all majors functions from git-rebase--interactive.sh have been > rewritten in C, I would retire the script in favor of a builtin. > > Weeks 9 & 10 -- July 9, 2018 - July 22, 2018 > I plan to spend theses two weeks to improve the code coverage where > needed. > > Weeks 11 & 12 -- July 23, 2018 - August 5, 2018 > In the last two weeks, I would polish the code where needed, in order > to improve its performance or to make it more readable. We like to have big improvements be split into batches of patches, also called patch series, and polishing of the first batches happening as soon as possible so that they can be ready to be merged soon. The patch series that are sent to the mailing list often need a number of round of reviews and improvements which can take a long time, so it is better if this process starts as soon as possible. Thanks.
Re: [RFC][GSoC] Project proposal: convert interactive rebase to C
Hi, here is my second draft of my proposal. As last time, any feedback is welcome :) I did not write my phone number and address here for obvious reasons, but they will be in the “about me” section of the final proposal. Apart from that, do you think there is something to add? --- ABSTRACT git is a modular source control management software, and all of its subcommands are programs on their own. A lot of them are written in C, but a couple of them are shell or Perl scripts. This is the case of git-rebase--interactive (or interactive rebase), which is a shell script. Rewriting it in C would improve its performance, its portability, and maybe its robustness. ABOUT `git-rebase` and `git-rebase--interactive` git-rebase allows to re-apply changes on top of another branch. For instance, when a local branch and a remote branch have diverged, git-rebase can re-unify them, applying each change made on the local branch on top of the remote branch. git-rebase--interactive is used to reorganize commits by reordering, rewording, or squashing them. To achieve this purpose, git opens the list of commits to be modified in a text editor (hence the interactivity), as well as the actions to be performed for each of them. PROJECT GOALS The goal of this project is to rewrite git-rebase--interactive in C as it has been discussed on the git mailing list[1], for multiple reasons : Performance improvements Shell scripts are inherently slow. That’s because each command is a program by itself. So, for each command, the shell interpreter has to spawn a new process and to load a new program (with fork() and exec() syscalls), which is an expensive process. Those commands can be other git commands. Sometimes, they are wrappers to call internal C functions (eg. git-rebase--helper), something shell scripts can’t do natively. These wrappers basically parse the parameters, then start the appropriate function, which is obviously slower than just calling a function from C. Other commands can be POSIX utilities (eg. sed, cut, etc.). They have their own problems (speed aside), namely portability. Portability improvements Shell scripts often relies on many of those POSIX utilities, which are not necessarily natively available on all platforms (most notably, Windows), or may have more or less features depending on the implementation. Although C is not perfect portability-wise, it’s still better than shell scripts. For instance, the resulting binaries will not necessarily depend on third-party programs or libraries. RISKS Of course, rewriting a piece of software takes time, and can lead to regressions (ie. new bugs). To mitigate that risk, I should understand well the functions I want to rewrite, run tests on a regular basis and write new if needed, and of course discuss about my code with the community during reviews. APPROXIMATIVE TIMELINE Community bonding -- April 23, 2018 - May 14, 2018 During the community bonding, I would like to dive into git’s codebase, and to understand what git-rebase--interactive does under the hood. At the same time, I’d communicate with the community and my mentor, seeking for clarifications, and asking questions about how things should or should not be done. Weeks 1 & 2 -- May 14, 2018 - May 27, 2018 /From May 14 to 18, I have exams at the university, so I won’t be able to work full time./ I would search for edge cases not covered by current tests and write some if needed. Week 3 -- May 28, 2018 - June 3, 2018 At the same time, I would refactor --preserve-merges in its own shell script (as described in Dscho’s email[1]), if it has not been deprecated or moved in the meantime. This operation is not really tricky by itself, as --preserve-merges is about only 50 lines of code into git_rebase__interactive(). Weeks 4 to 7 -- May 4, 2018 - July 1, 2018 Then, I would start to incrementally rewrite git-rebase--interactive.sh functions in C, and move them git-rebase--helper.c (as in commits 0cce4a2756[2] (rebase -i -x: add exec commands via the rebase--helper) and b903674b35[3] (bisect--helper: `is_expected_rev` & `check_expected_revs` shell function in C)). There is a lot of functions into git-rebase--interactive.sh to rewrite. Most of them are small, and some of them are even wrappers for a single command (eg. is_merge_commit()), so they shouldn’t be really problematic. A couple of them are quite long (eg. pick_one()), and will probably be even longer once rewritten in C due to the low-level nature of the language. They also tend to depend a lot on other smaller functions. The plan here would be to start rewriting the smaller functions when applicable (ie. they’re not a simple command wrapper) before working on the biggest of them. Week 8 -- July 2, 2018 - July 8, 2018 When all majors functions from git-rebase--interactive.sh have been rewritten in C, I would retire the script in favor of a builtin. Weeks 9 & 10 -- July 9, 2018 - July 22, 2018 I plan to spend theses two weeks to improve the
Re: [RFC][GSoC] Project proposal: convert interactive rebase to C
Hi Alban, On Wed, 21 Mar 2018, Alban Gruin wrote: > Le mardi 20 mars 2018 17:29:28 CET, vous avez écrit : > > > Also, I have a hunch that there is actually almost nothing left to > > rewrite after my sequencer improvements that made it into Git v2.13.0, > > together with the upcoming changes (which are on top of the > > --recreate-merges patch series, hence I did not send them to the > > mailing list yet) in > > https://github.com/dscho/git/commit/c261f17a4a3e > > One year ago, you said[2] that converting this script "will fill up 3 > month, very easily". Is this not accurate anymore? Let me read that mail ;-) *goes and reads* Well, I was talking about two different aspects to Ivan and to you. I should have been clearer. So let me try again: To convert `git-rebase--interactive.sh`, I think the most important part is to factor out the preserve-merges code into its own script. After that, there is little I can think of (apart from support for --root, which a not-yet-contributed patch in my sequencer-shears branch on https://github.com/dscho/git addresses) that still needs to be converted. For somebody familiar with Git's source code, I would estimate one week (and therefore 3 weeks would be a realistic estimate :-)). Come to think of it, a better approach might be to leave the preserve-merges stuff in, and teach `git-rebase.sh` to call the sequencer directly for --interactive without --preserve-merges, then rename the script to git-rebase--preserve.sh The other aspect, the one I thought would take up to 3 months, easily, was to convert the entirety of rebase -i into C. That would entail also the option parsing, for which you would have to convert also git-rebase.sh (and if you do not convert git-rebase--am.sh and git-rebase--merge.sh first, you would then have to teach builtin/rebase.c to populate the environment variables expected by those shell scripts while spawning them). I still think that the latter is too big a task for a single GSoC. > I’ll send a new draft as soon as possible (hopefully this afternoon). I look forward to reading it! Ciao, Johannes
Re: [RFC][GSoC] Project proposal: convert interactive rebase to C
Hi Johannes, Le mardi 20 mars 2018 17:29:28 CET, vous avez écrit : > > Weeks 1 & 2 — May 14, 2018 – May 28, 2018 > > First, I would refactor --preserve-merges in its own shell script, as > > described in Dscho’s email. > > Could you go into detail a bit here? Like, describe what parts of the > git-rebase--interactive.sh script would need to be duplicated, which ones > would be truly moved, etc > It would lead to duplicate a good chunk of git_rebase__interactive(), apparently. The moved parts would be everything in `if test t = "$preserve_merges"; then …; fi` statements. That is, about 50 lines of shell code. Judging by that, beginning by that is probably not the right thing to do. Also, somebody is already working on that[1]. > > Weeks 3 & 4 — May 18, 2018 – June 11, 2018 > > Then, I would start to rewrite git-rebase--interactive, and get rid of > > git- > > rebase--helper. > > I think this is a bit premature, as the rebase--helper would not only > serve the --interactive part, but in later conversions also --am and > --merge, and only in the very end, when git-rebase.sh gets converted, > would we be able to simply rename the rebase--helper to rebase. > Yes, Christian Couder told me that it would not be a good methodology too. > Also, I have a hunch that there is actually almost nothing left to rewrite > after my sequencer improvements that made it into Git v2.13.0, together > with the upcoming changes (which are on top of the --recreate-merges patch > series, hence I did not send them to the mailing list yet) in > https://github.com/dscho/git/commit/c261f17a4a3e One year ago, you said[2] that converting this script "will fill up 3 month, very easily". Is this not accurate anymore? > > So I would like to see more details here... ;-) Yep, I’m working on that. > > Weeks 5 to 9 — June 11, 2018 – July 15, 2018 > > During this period, I would continue to rewrite git-rebase--interactive. > > It would be good if the proposal said what parts of the conversion are > tricky, to merit spending a month on them. > > > Weeks 10 & 11 — July 16, 2018 – July 29, 2018 > > In the second half of July, I would look for bugs in the new code, test > > it, > > and improve its coverage. > > As I mentioned in a related mail, the test suite coverage would be most > sensibly extended *before* starting to rewrite code in C, as it helps > catching bugs early and avoids having to merge buggy code that needs to be > fixed immediately. Makes sense. > > > Weeks 12 — July 30, 2018 – August 5, 2018 > > In the last week, I would polish the code where needed, in order to > > improve for performance or to make the code more readable. > > Thank you for sharing this draft with us! > Johannes I’ll send a new draft as soon as possible (hopefully this afternoon). Thank you for your enthousiasm :) Alban [1] https://public-inbox.org/git/20180320204507.12623-1-w...@saville.com/ [2] https://public-inbox.org/git/alpine.DEB. 2.20.1703231827060.3767@virtualbox/
Re: [RFC][GSoC] Project proposal: convert interactive rebase to C
Hi Alban, thank you for your proposal! I will only comment on the parts that I feel could use improvement, the rest is fine by me. On Sat, 17 Mar 2018, Alban Gruin wrote: > APPROXIMATIVE TIMELINE > > Community bonding — April 23, 2018 – May 14, 2018 > During the community bonding, I would like to dive into git’s codebase, > and to understand what git-rebase--interactive does under the hood. At > the same time, I’d communicate with the community and my mentor, seeking > for clarifications, and asking questions about how things should or > should not be done. > > Weeks 1 & 2 — May 14, 2018 – May 28, 2018 > First, I would refactor --preserve-merges in its own shell script, as > described in Dscho’s email. Could you go into detail a bit here? Like, describe what parts of the git-rebase--interactive.sh script would need to be duplicated, which ones would be truly moved, etc > Weeks 3 & 4 — May 18, 2018 – June 11, 2018 > Then, I would start to rewrite git-rebase--interactive, and get rid of git- > rebase--helper. I think this is a bit premature, as the rebase--helper would not only serve the --interactive part, but in later conversions also --am and --merge, and only in the very end, when git-rebase.sh gets converted, would we be able to simply rename the rebase--helper to rebase. Also, I have a hunch that there is actually almost nothing left to rewrite after my sequencer improvements that made it into Git v2.13.0, together with the upcoming changes (which are on top of the --recreate-merges patch series, hence I did not send them to the mailing list yet) in https://github.com/dscho/git/commit/c261f17a4a3e So I would like to see more details here... ;-) > Weeks 5 to 9 — June 11, 2018 – July 15, 2018 > During this period, I would continue to rewrite git-rebase--interactive. It would be good if the proposal said what parts of the conversion are tricky, to merit spending a month on them. > Weeks 10 & 11 — July 16, 2018 – July 29, 2018 > In the second half of July, I would look for bugs in the new code, test it, > and improve its coverage. As I mentioned in a related mail, the test suite coverage would be most sensibly extended *before* starting to rewrite code in C, as it helps catching bugs early and avoids having to merge buggy code that needs to be fixed immediately. > Weeks 12 — July 30, 2018 – August 5, 2018 > In the last week, I would polish the code where needed, in order to > improve for performance or to make the code more readable. Thank you for sharing this draft with us! Johannes
Re: [RFC][GSoC] Project proposal: convert interactive rebase to C
Hi, On Sat, Mar 17, 2018 at 8:14 PM, Alban Gruin wrote: > > Weeks 3 & 4 — May 18, 2018 – June 11, 2018 > Then, I would start to rewrite git-rebase--interactive, and get rid of git- > rebase--helper. Usually to rewrite a shell script in C, we first rewrite shell functions into option arguments in a C builtin helper and make the schell script call the builtin helper (instead of the original shell functions). Eventually when the shell script is mostly only calling the builtin helper, we add what is needed into the builtin helper and we rename it to make it fully replace the shell script. See for example 0cce4a2756 (rebase -i -x: add exec commands via the rebase--helper, 2017-12-05) or b903674b35 (bisect--helper: `is_expected_rev` & `check_expected_revs` shell function in C, 2017-09-29). These examples show that we can do step by step rewrites. I would suggest planning to use the same approach, and describing in your proposal which shell functions you would like to rewrite into the C builtin helper in which order, before planning to fully replace the current git-rebase--interactive. Thanks, Christian.
[RFC][GSoC] Project proposal: convert interactive rebase to C
Hi, here is my first draft of my proposal for the GSoC, about the "convert interactive rebase to C" project. Any feedback is welcome :) --- ABSTRACT git is a modular source control management software, and all of its subcommands are programs on their own. A lot of them are written in C, but a couple of them are shell or Perl scripts. This is the case of git-rebase-- interactive (or interactive rebase), which is a shell script. Rewriting it in C would improve its performance, its portability, and maybe its robustness. ABOUT `git-rebase{,--interactive}` git-rebase allows to re-apply changes on top of another branch. For instance, when a local branch and a remote branch have diverged, git-rebase can re-unify them, applying each change made on the local branch on top of the remote branch. git-rebase--interactive is used to reorganize commits by reordering, rewording, or squashing them. To achieve this purpose, git opens the list of commits to be modified in a text editor (hence the interactivity), as well as the actions to be performed for each of them. PROJECT GOALS The goal of this project is to rewrite git-rebase--interactive in C as it has been discussed on the git mailing list[1], for multiple reasons : Performance improvements Shell scripts are inherently slow. That’s because each command is a program by itself. So, for each command, the shell interpreter has to spawn a new process and to load a new program. Those commands can be other git commands. Sometimes, they are wrappers to call internal C functions (eg. git-rebase--helper), something shell scripts can’t do natively. These wrappers basically parse the parameters, then start the appropriate function, which is obviously slower than just calling a function from C. Other commands can be POSIX utilities (eg. sed, cut, etc.). They have their own problems (speed aside), namely portability. Portability improvements Shell scripts often relies on many of those POSIX utilities, which are not necessarily natively available on all platforms (most notably, Windows), or may have more or less features depending on the implementation. APPROXIMATIVE TIMELINE Community bonding — April 23, 2018 – May 14, 2018 During the community bonding, I would like to dive into git’s codebase, and to understand what git-rebase--interactive does under the hood. At the same time, I’d communicate with the community and my mentor, seeking for clarifications, and asking questions about how things should or should not be done. Weeks 1 & 2 — May 14, 2018 – May 28, 2018 First, I would refactor --preserve-merges in its own shell script, as described in Dscho’s email. Weeks 3 & 4 — May 18, 2018 – June 11, 2018 Then, I would start to rewrite git-rebase--interactive, and get rid of git- rebase--helper. Weeks 5 to 9 — June 11, 2018 – July 15, 2018 During this period, I would continue to rewrite git-rebase--interactive. Weeks 10 & 11 — July 16, 2018 – July 29, 2018 In the second half of July, I would look for bugs in the new code, test it, and improve its coverage. Weeks 12 — July 30, 2018 – August 5, 2018 In the last week, I would polish the code where needed, in order to improve for performance or to make the code more readable. ABOUT ME My name is Alban Gruin, I am an undergraduate at the Paul Sabatier University in Toulouse, France, where I have been studying Computer Sciences for the past year and a half. My timezone currently is UTC+01:00, but will be UTC+02:00 starting from March 25th, because of the daylight saving time in Europe. I have been programming in C for the last 5 years. I learned using freely available resources online, and by attending class ever since last year. I am also quite familiar with shell scripts, and I have been using git for the last 3 years. My e-mail address is alban gruin gmail com. My IRC nick is abngrn. My micro-project was "userdiff: add built-in pattern for golang"[2][3]. --- You can find the Google Doc version here[4]. Regards, Alban Gruin [1] https://public-inbox.org/git/alpine.DEB.2.20.1609021432070.129229@virtualbox/ [2] https://public-inbox.org/git/20180228172906.30582-1-alban.gr...@gmail.com/ [3] https://git.kernel.org/pub/scm/git/git.git/commit/?id=1dbf0c0a [4] https://docs.google.com/document/d/1Jx0w867tVAht7QI1_prieiXg_iQ_nTloOyaIIOnm85g/edit?usp=sharing