Re: [RFC v3][GSoC] Project proposal: convert interactive rebase to C

2018-03-24 Thread Alban Gruin
Hi,
this is the first draft of my proposal.

---
ABSTRACT

Git is a modular source control management system, and each of its
subcommands is a program of its own. Many of them are written in C,
but some are shell or Perl scripts. This is the case for
`git-rebase--interactive` (or interactive rebase), which is a shell
script. Rewriting it in C would improve its performance, its
portability, and maybe its robustness.


ABOUT `git-rebase` AND `git-rebase--interactive`

git-rebase re-applies changes on top of another branch. For
instance, when a local branch and a remote branch have diverged,
git-rebase can re-unify them, applying each change made on the
local branch on top of the remote branch.

git-rebase--interactive is used to reorganize commits by reordering,
rewording, or squashing them. To achieve this, git opens the list of
commits to be modified in a text editor (hence the interactivity),
along with the action to be performed on each of them.


PROJECT GOALS

Back in 2016, Johannes Schindelin discussed[1] retiring
git-rebase.sh (described there as a “hacky shell script”) in favor of
a builtin. He explained that, as it’s only a command-line parser for
git-rebase--am.sh, git-rebase--interactive.sh, and
git-rebase--merge.sh, these 3 scripts should be rewritten first.

The goal of this project is to rewrite git-rebase--interactive.sh in
C, for multiple reasons:

Performance improvements
Shell scripts are inherently slow. When Johannes Schindelin began to
rewrite some parts of git-rebase--interactive in C, it became 3 to 5
times faster, depending on the platform[2].

That’s because each command is a program by itself. So, for each
command, the shell interpreter has to spawn a new process and load
a new program (with the fork() and exec() syscalls), which is an
expensive operation.

Those commands can be other git commands. Sometimes, they are
wrappers that call internal C functions (e.g. git-rebase--helper),
something shell scripts can’t do natively. These wrappers basically
parse the parameters, then start the appropriate function, which is
obviously slower than just calling a function from C.

Other commands can be POSIX utilities (e.g. sed, cut, etc.). They
have their own problems (speed aside), namely portability.
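
To make the cost described above concrete, here is a minimal sketch
(not code from git, and assuming a POSIX system with `true` and
`false` on the PATH) of what a shell has to do for every external
command, next to an in-process check. spawn_and_wait() and is_empty()
are hypothetical names used only for illustration.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* What a shell does for every external command: fork, exec, wait. */
static int spawn_and_wait(const char *prog)
{
	pid_t pid = fork();
	int status;

	if (pid < 0)
		return -1;		/* fork() failed */
	if (pid == 0) {
		/* Child: replace the process image with the program. */
		execlp(prog, prog, (char *)NULL);
		_exit(127);		/* only reached if exec failed */
	}
	/* Parent: block until the child terminates. */
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Hypothetical in-process equivalent of a trivial shell test. */
static int is_empty(const char *s)
{
	return s[0] == '\0';
}
```

Each call to spawn_and_wait() creates a process, loads a program, and
waits for it to exit. is_empty() is a plain function call, which is
what the C version of a script would use instead.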

Portability improvements
Shell scripts often rely on many of those POSIX utilities, which are
not necessarily natively available on all platforms (most notably,
Windows), or may have more or fewer features depending on the
implementation.

Although C is not perfect portability-wise, it’s still better than
shell scripts. For instance, the resulting binaries will not
necessarily depend on third-party programs or libraries.


RISKS

Of course, rewriting a piece of software takes time, and can lead to
regressions (i.e. new bugs). To mitigate that risk, I should
understand the functions I want to rewrite well, run tests on a
regular basis and write new ones if needed, and of course discuss my
code with the community during reviews.


APPROXIMATE TIMELINE

Normally, I would be able to work 35 to 40 hours a week. When I have
courses or exams at university, I could work between 20 and 25 hours a
week.

Community bonding --- April 23, 2018 -- May 14, 2018
/I’ll still have courses at the university during this period./

During the community bonding, I would like to dive into git’s
codebase, and to understand what git-rebase--interactive does under
the hood. At the same time, I’d communicate with the community and my
mentor, seeking clarifications, and asking questions about how
things should or should not be done.

Weeks 1 & 2 --- May 14, 2018 -- May 27, 2018
/From May 14 to 18, I have exams at the university, so I won’t be able
to work full time./

I would search for edge cases not covered by current tests and write
some if needed.

Week 3 --- May 28, 2018 -- June 3, 2018
At the same time, I would refactor --preserve-merges into its own
shell script, if it has not been deprecated or moved in the
meantime. Dscho explained that this would be the first step of the
conversion[1]. This operation is not really tricky by itself,
as --preserve-merges is only about 50 lines of code in
git_rebase__interactive(), plus some specific functions
(e.g. pick_one()).

Weeks 4 to 7 --- June 4, 2018 -- July 1, 2018
Then, I would start to incrementally rewrite
git-rebase--interactive.sh functions in C, and move them to
git-rebase--helper.c. Newly-rewritten C functions are then
associated with command-line parameters so that they can be used from
shell scripts.

Examples of such conversion can be found in commits
0cce4a2756[3] (rebase -i -x: add exec commands via the
rebase--helper) and b903674b35[4] (bisect--helper:
`is_expected_rev` & `check_expected_revs` shell function in C).
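
As a rough illustration of that pattern (a sketch under assumptions,
not the actual rebase--helper code), a small shell function is ported
to C and then exposed through a flag that the remaining shell code can
pass to the helper. The function names, the flag name, and the todo
list format used here are hypothetical.

```c
#include <string.h>

/*
 * Hypothetical C port of a small shell function: count the lines of
 * a todo list that are neither empty nor comments.
 */
static int count_todo_commands(const char *todo)
{
	int count = 0;

	while (*todo) {
		const char *eol = strchr(todo, '\n');
		size_t len = eol ? (size_t)(eol - todo) : strlen(todo);

		if (len > 0 && todo[0] != '#')
			count++;
		todo += len;
		if (*todo == '\n')
			todo++;		/* skip the line terminator */
	}
	return count;
}

/*
 * Hypothetical helper entry point: each flag selects one C function,
 * so a script can call it as "helper --count-commands".
 */
static int helper_dispatch(const char *flag, const char *todo)
{
	if (!strcmp(flag, "--count-commands"))
		return count_todo_commands(todo);
	return -1;			/* unknown flag */
}
```

Once every function behind the helper's flags is in C, the shell
caller can be dropped and the functions called directly.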

There are a lot of functions in git-rebase--interactive.sh to
rewrite. Most of them are small, and some of them are even wrappers
for a single command (e.g. do_next()), so they shouldn’t be really

Re: [RFC] [GSoC] Project proposal: convert scripts to builtins

2018-03-24 Thread Christian Couder
Hi,

On Wed, Mar 21, 2018 at 7:16 AM, Pratik Karki wrote:
>
> Thanks for the feedback. Thanks to you, I realized my proposal was
> a bit ambitious. Both git-stash and git-rebase are big
> commitments. After much analysis, I found out I cannot complete
> both in the given time frame. So, I decided to stick to one and
> complete it.

Great.

[...]

> There has been some development in `git-stash`, as seen in
> https://public-inbox.org/git/20171110231314.30711-1-j...@teichroeb.net/
> To maximize productivity, the findings from the submitted patch can
> be reused, since there has already been much discussion regarding
> the rewrite.

In general it would be nice if you summarized what has already been
done, how you can reuse it and what is needed to complete it.

I see that you talk about some of that below, but a more general
overview might be nice too.

It could be interesting also to put the author(s) of the work that you
will reuse in Cc.

[...]

> Timeline and Development Cycle
> --
>
> -   Apr 23: Accepted student proposals announced.
>
> -   Apr 23 onwards: Researching all the test suites. Discussion of
> possible test improvements for `git-stash`.
>
> Firstly, the test suite coverage of every command will be reviewed
> using gcov and kcov.

I don't think it is necessary to spend a lot of time on the test suite coverage.

> The test suite might not be perfect or
> comprehensive but must cover all the major code paths and
> command-line switches of the script. For the tests which seem
> inadequate, minimum required tests are written and developed
> incrementally. The minimum tests must provide a safety net for
> migration of scripts to built-ins. The tests would be sent as a
> separate patch for parallel development and review process so that
> development of built-ins can happen at the same time productively.

Nice.

> Tests will be written for every code change and will be worked on
> throughout the summer.
>
> -   May 1: Rewriting skeleton begins.
>
> The shell scripts are translated on a line-by-line basis into C
> code. The C code will be written in a way that maximizes the use of
> git's internal API. In git-stash, the `parse-options` API can be used
> to parse command-line arguments. This would be much better than
> parsing in the scripts. First, I will start implementing
> `stash--helper` in C from the respective scripts, then extend it
> incrementally. Then I'll start converting git-stash.sh on a
> line-by-line basis.

Not sure what you mean by line-by-line basis.

> Again, for git-stash some work seems to have been done:
> https://public-inbox.org/git/20171110231314.30711-1-j...@teichroeb.net/
> Now, to maximize the output, I'll take the findings from the
> previous patch and use them for my patch. As seen from the comments
> on the patch, some tests checking the branch when `git stash branch`
> fails need to be written.

Nice. Maybe writing those tests can come earlier in your schedule.

> New tests will be written and code
> coverage tools will be used for the written code.

Not sure that code coverage tools need to be used.

> -   May 13: Making minimal `builtin/stash.c` with `stash--helper` ready
> for review process. (This goes on for some time.)
>
> The minimal builtin for git-stash would be ready for initial review.
> The resulting C code at this stage may not necessarily be efficient
> but would be free from obvious bugs and can serve as a baseline for
> the final patch. This is sent for review, which can take some
> time. The code will of course be tested using the test suite with
> some additional tests.

How does that relate to the existing work? Will this be one or
several patch series? What will each patch do?

[...]

> -   June 10 - Jul 20: Start optimizing `builtin/stash.c`. Benchmarking
> and profiling are done. The built-ins are compared exclusively to
> their original shell scripts, to check whether they are more
> performant or not, and the results are published on the mailing list
> for further discussion.

Will the performance tests be added to the t/perf tests?

> The C code will be optimized for speed and efficiency in this stage. The
> built-ins will now be profiled using the new efficient test suites to
> find hot spots. Benchmarking is also done in comparison to the original
> scripts. The performance of stash can be measured by making it stash a
> large number of changes in another working directory and measuring the
> time taken to complete the task. After that, a graphical
> representation of the performance findings will be published to the git
> mailing list and discussions will commence on more 

Re: [RFC][GSoC] Project proposal: convert interactive rebase to C

2018-03-24 Thread Christian Couder
Hi,

On Thu, Mar 22, 2018 at 11:03 PM, Alban Gruin wrote:
> Hi,
>
> here is my second draft of my proposal. As last time, any feedback is
> welcome :)
>
> I did not write my phone number and address here for obvious reasons,
> but they will be in the “about me” section of the final proposal.
>
> Apart from that, do you think there is something to add?

Please take a look at the comments that have been made on others'
proposals. Many proposals could be improved in the same way and it is a
bit annoying for us to repeat the same things many times.

[...]

> The goal of this project is to rewrite git-rebase--interactive in C
> as it has been discussed on the git mailing list[1], for multiple
> reasons:

In general when the project or some issues related to the project have
already been worked on or discussed on the mailing list, it is a good
thing to summarize those discussions and link to them in your
proposal. It shows that you want to take the time to gather existing
information, to understand that information and to take it into
account in your proposal, and it can also make your proposal easier to
read and understand.

More specifically your proposal has some links which is nice, but I
think it would be better if it summarized a bit more what the links
contain.

[...]

> Weeks 1 & 2 -- May 14, 2018 - May 27, 2018
> /From May 14 to 18, I have exams at the university, so I won’t be able
> to work full time./
>
> I would search for edge cases not covered by current tests and write
> some if needed.
>
> Week 3 -- May 28, 2018 - June 3, 2018
> At the same time, I would refactor --preserve-merges into its own
> shell script (as described in Dscho’s email[1]), if it has
> not been deprecated or moved in the meantime.

Here for example it is better if we could get a better idea about how
you plan to do it without having to read Dscho's email.

> This operation is not
> really tricky by itself, as --preserve-merges is only about 50 lines
> of code in git_rebase__interactive().
>
> Weeks 4 to 7 -- June 4, 2018 - July 1, 2018
> Then, I would start to incrementally rewrite
> git-rebase--interactive.sh functions in C, and move them to
> git-rebase--helper.c (as in commits 0cce4a2756[2] (rebase -i
> -x: add exec commands via the rebase--helper) and b903674b35[3]
> (bisect--helper: `is_expected_rev` & `check_expected_revs` shell
> function in C)).

I know what you mean but I would still appreciate if you could summarize it.

> There are a lot of functions in git-rebase--interactive.sh to
> rewrite. Most of them are small, and some of them are even wrappers
> for a single command (e.g. is_merge_commit()), so they shouldn’t be
> really problematic.
>
> A couple of them are quite long (e.g. pick_one()), and will probably
> be even longer once rewritten in C due to the low-level nature of the
> language. They also tend to depend a lot on other smaller functions.
>
> The plan here would be to start rewriting the smaller functions when
> applicable (i.e. they’re not a simple command wrapper) before
> working on the biggest of them.
>
> Week 8 -- July 2, 2018 - July 8, 2018
> When all major functions from git-rebase--interactive.sh have been
> rewritten in C, I would retire the script in favor of a builtin.
>
> Weeks 9 & 10 -- July 9, 2018 - July 22, 2018
> I plan to spend these two weeks improving the code coverage where
> needed.
>
> Weeks 11 & 12 -- July 23, 2018 - August 5, 2018
> In the last two weeks, I would polish the code where needed, in order
> to improve its performance or to make it more readable.

We like to have big improvements split into batches of patches,
also called patch series, with polishing of the first batches happening
as soon as possible so that they can be ready to be merged soon. The
patch series that are sent to the mailing list often need a number of
rounds of review and improvement, which can take a long time, so it is
better if this process starts as soon as possible.

Thanks.


Re: [RFC][GSoC] Project proposal: convert interactive rebase to C

2018-03-22 Thread Alban Gruin
Hi,

here is my second draft of my proposal. As last time, any feedback is
welcome :)

I did not write my phone number and address here for obvious reasons,
but they will be in the “about me” section of the final proposal.

Apart from that, do you think there is something to add?

---
ABSTRACT

Git is a modular source control management system, and each of its
subcommands is a program of its own. Many of them are written in C,
but some are shell or Perl scripts. This is the case for
git-rebase--interactive (or interactive rebase), which is a shell
script. Rewriting it in C would improve its performance, its
portability, and maybe its robustness.


ABOUT `git-rebase` and `git-rebase--interactive`

git-rebase re-applies changes on top of another branch. For
instance, when a local branch and a remote branch have diverged,
git-rebase can re-unify them, applying each change made on the
local branch on top of the remote branch.

git-rebase--interactive is used to reorganize commits by reordering,
rewording, or squashing them. To achieve this, git opens the list of
commits to be modified in a text editor (hence the interactivity),
along with the action to be performed on each of them.


PROJECT GOALS

The goal of this project is to rewrite git-rebase--interactive in C
as it has been discussed on the git mailing list[1], for multiple
reasons:


Performance improvements
Shell scripts are inherently slow. That’s because each command is a
program by itself. So, for each command, the shell interpreter has to
spawn a new process and load a new program (with the fork() and
exec() syscalls), which is an expensive operation.

Those commands can be other git commands. Sometimes, they are
wrappers that call internal C functions (e.g. git-rebase--helper),
something shell scripts can’t do natively. These wrappers basically
parse the parameters, then start the appropriate function, which is
obviously slower than just calling a function from C.

Other commands can be POSIX utilities (e.g. sed, cut, etc.). They
have their own problems (speed aside), namely portability.

Portability improvements
Shell scripts often rely on many of those POSIX utilities, which are
not necessarily natively available on all platforms (most notably,
Windows), or may have more or fewer features depending on the
implementation.

Although C is not perfect portability-wise, it’s still better than
shell scripts. For instance, the resulting binaries will not
necessarily depend on third-party programs or libraries.


RISKS

Of course, rewriting a piece of software takes time, and can lead to
regressions (i.e. new bugs). To mitigate that risk, I should
understand the functions I want to rewrite well, run tests on a
regular basis and write new ones if needed, and of course discuss my
code with the community during reviews.


APPROXIMATE TIMELINE

Community bonding -- April 23, 2018 - May 14, 2018
During the community bonding, I would like to dive into git’s
codebase, and to understand what git-rebase--interactive does under
the hood. At the same time, I’d communicate with the community and my
mentor, seeking clarifications, and asking questions about how
things should or should not be done.

Weeks 1 & 2 -- May 14, 2018 - May 27, 2018
/From May 14 to 18, I have exams at the university, so I won’t be able
to work full time./

I would search for edge cases not covered by current tests and write
some if needed.

Week 3 -- May 28, 2018 - June 3, 2018
At the same time, I would refactor --preserve-merges into its own
shell script (as described in Dscho’s email[1]), if it has
not been deprecated or moved in the meantime. This operation is not
really tricky by itself, as --preserve-merges is only about 50 lines
of code in git_rebase__interactive().

Weeks 4 to 7 -- June 4, 2018 - July 1, 2018
Then, I would start to incrementally rewrite
git-rebase--interactive.sh functions in C, and move them to
git-rebase--helper.c (as in commits 0cce4a2756[2] (rebase -i
-x: add exec commands via the rebase--helper) and b903674b35[3]
(bisect--helper: `is_expected_rev` & `check_expected_revs` shell
function in C)).

There are a lot of functions in git-rebase--interactive.sh to
rewrite. Most of them are small, and some of them are even wrappers
for a single command (e.g. is_merge_commit()), so they shouldn’t be
really problematic.

A couple of them are quite long (e.g. pick_one()), and will probably
be even longer once rewritten in C due to the low-level nature of the
language. They also tend to depend a lot on other smaller functions.

The plan here would be to start rewriting the smaller functions when
applicable (i.e. they’re not a simple command wrapper) before
working on the biggest of them.

Week 8 -- July 2, 2018 - July 8, 2018
When all major functions from git-rebase--interactive.sh have been
rewritten in C, I would retire the script in favor of a builtin.

Weeks 9 & 10 -- July 9, 2018 - July 22, 2018
I plan to spend these two weeks improving the 

Re: [RFC][GSoC] Project proposal: convert interactive rebase to C

2018-03-21 Thread Johannes Schindelin
Hi Alban,

On Wed, 21 Mar 2018, Alban Gruin wrote:

> On Tuesday, 20 March 2018 at 17:29:28 CET, you wrote:
> 
> > Also, I have a hunch that there is actually almost nothing left to
> > rewrite after my sequencer improvements that made it into Git v2.13.0,
> > together with the upcoming changes (which are on top of the
> > --recreate-merges patch series, hence I did not send them to the
> > mailing list yet) in
> > https://github.com/dscho/git/commit/c261f17a4a3e
> 
> One year ago, you said[2] that converting this script "will fill up 3
> month, very easily". Is this not accurate anymore?

Let me read that mail ;-)

*goes and reads*

Well, I was talking about two different aspects to Ivan and to you. I
should have been clearer. So let me try again:

To convert `git-rebase--interactive.sh`, I think the most important part
is to factor out the preserve-merges code into its own script. After that,
there is little I can think of (apart from support for --root, which a
not-yet-contributed patch in my sequencer-shears branch on
https://github.com/dscho/git addresses) that still needs to be converted.
For somebody familiar with Git's source code, I would estimate one week
(and therefore 3 weeks would be a realistic estimate :-)).

Come to think of it, a better approach might be to leave the
preserve-merges stuff in, and teach `git-rebase.sh` to call the sequencer
directly for --interactive without --preserve-merges, then rename the
script to git-rebase--preserve.sh

The other aspect, the one I thought would take up to 3 months, easily, was
to convert the entirety of rebase -i into C. That would entail also the
option parsing, for which you would have to convert also git-rebase.sh
(and if you do not convert git-rebase--am.sh and git-rebase--merge.sh
first, you would then have to teach builtin/rebase.c to populate the
environment variables expected by those shell scripts while spawning
them).

I still think that the latter is too big a task for a single GSoC.

> I’ll send a new draft as soon as possible (hopefully this afternoon).

I look forward to reading it!

Ciao,
Johannes

Re: [RFC][GSoC] Project proposal: convert interactive rebase to C

2018-03-21 Thread Alban Gruin
Hi Johannes,

On Tuesday, 20 March 2018 at 17:29:28 CET, you wrote:
> > Weeks 1 & 2 — May 14, 2018 – May 28, 2018
> > First, I would refactor --preserve-merges into its own shell script, as
> > described in Dscho’s email.
> 
> Could you go into detail a bit here? Like, describe what parts of the
> git-rebase--interactive.sh script would need to be duplicated, which ones
> would be truly moved, etc
> 

Apparently, it would mean duplicating a good chunk of
git_rebase__interactive(). The moved parts would be everything in
`if test t = "$preserve_merges"; then …; fi` statements. That is, about
50 lines of shell code.

Judging by that, starting there is probably not the right thing to do.
Also, somebody is already working on that[1].

> > Weeks 3 & 4 — May 28, 2018 – June 11, 2018
> > Then, I would start to rewrite git-rebase--interactive, and get rid of
> > git-rebase--helper.
> 
> I think this is a bit premature, as the rebase--helper would not only
> serve the --interactive part, but in later conversions also --am and
> --merge, and only in the very end, when git-rebase.sh gets converted,
> would we be able to simply rename the rebase--helper to rebase.
> 

Yes, Christian Couder also told me that it would not be a good approach.

> Also, I have a hunch that there is actually almost nothing left to rewrite
> after my sequencer improvements that made it into Git v2.13.0, together
> with the upcoming changes (which are on top of the --recreate-merges patch
> series, hence I did not send them to the mailing list yet) in
> https://github.com/dscho/git/commit/c261f17a4a3e

One year ago, you said[2] that converting this script "will fill up 3 month, 
very easily". Is this not accurate anymore?

> 
> So I would like to see more details here... ;-)

Yep, I’m working on that. 

> > Weeks 5 to 9 — June 11, 2018 – July 15, 2018
> > During this period, I would continue to rewrite git-rebase--interactive.
> 
> It would be good if the proposal said what parts of the conversion are
> tricky, to merit spending a month on them.
> 
> > Weeks 10 & 11 — July 16, 2018 – July 29, 2018
> > In the second half of July, I would look for bugs in the new code, test
> > it, and improve its coverage.
> 
> As I mentioned in a related mail, the test suite coverage would be most
> sensibly extended *before* starting to rewrite code in C, as it helps
> catching bugs early and avoids having to merge buggy code that needs to be
> fixed immediately.

Makes sense.

> 
> > Week 12 — July 30, 2018 – August 5, 2018
> > In the last week, I would polish the code where needed, in order to
> > improve performance or to make the code more readable.
> 
> Thank you for sharing this draft with us!
> Johannes

I’ll send a new draft as soon as possible (hopefully this afternoon).

Thank you for your enthusiasm :)
Alban

[1] https://public-inbox.org/git/20180320204507.12623-1-w...@saville.com/
[2] https://public-inbox.org/git/alpine.DEB.2.20.1703231827060.3767@virtualbox/




Re: [RFC] [GSoC] Project proposal: convert scripts to builtins

2018-03-21 Thread Pratik Karki
Hi Johannes,

Thanks for the feedback. Thanks to you, I realized my proposal was
a bit ambitious. Both git-stash and git-rebase are big
commitments. After much analysis, I found out I cannot complete
both in the given time frame. So, I decided to stick to one and
complete it. I decided to stick with git-stash. Thank you for directing
me to the un-merged patches. Now I can find the points where the
earlier patch fell short and work towards completing them.

Please provide feedback for this updated proposal.

Cheers,
Pratik Karki


Convert Scripts to builtins
===

Abstract
--------

Many components of Git are still in the form of shell and Perl scripts.
This has the advantage of being extensible but causes problems in
production code on multiple platforms like Windows.\
I propose to rewrite a couple of shell and Perl scripts into portable
and performant C code, making them built-ins. The major advantage of
doing this is improved efficiency and performance.

Many scripts, such as `git-am`, `git-pull`, and `git-branch`, have already
been rewritten in C. Others, such as `git-rebase`, `git-stash`, and
`git-add --interactive`, are still shell or Perl scripts. I
propose to work on `git-stash`.

### Shell Scripts:

Although shell scripts are quicker to write and easy to extend, they
introduce certain disadvantages.

1.  Dependencies:\
 Scripting languages and shell scripts make for more productive
coding, but they carry an overhead of dependencies. Shell scripts are
lighter and simpler, and call other executables to perform
non-trivial tasks. Take the `git-stash` shell script, for example:
`sed`, `rm`, `echo`, and `test` are used throughout `git-stash`.
These are common on POSIX platforms, but on non-POSIX platforms
some extra work is needed to port these commands. For example,
Git for Windows works around this by shipping extra utilities and
MSYS2 shell commands, and it needs to bundle a language runtime for
Perl. This increases the installation size and requires more disk
space. Moreover, adding more batteries requires support in all of
the dependency libraries and executables.

2.  Inefficiency:\
 Git has internal caches for configuration values, the repository
index and repository objects. Scripted porcelain commands do not have
access to git's internal API, so they spawn git processes to
perform git operations. On every git invocation, git re-reads
the user's configuration files and the repository index, repopulates
the filesystem cache, etc. This leads to overhead and unnecessary I/O.
Windows is known to have worse I/O performance than Linux, and
HDDs have slower I/O than SSDs. These unnecessary I/O operations
cause runtime overhead, which gets worse on setups with poor I/O
performance. Rewriting the porcelain as C built-ins leverages the
git API: there is no need to spawn separate git processes, and
caching can be used to reduce unnecessary I/O.

3.  Spawning processes is less performant:\
 Shell scripts usually spawn a lot of processes. Shell scripts are
very light and hence have limited functionality. To work,
`git-stash.sh` needs to perform lots of git operations
like `git rev-parse` and `git config`, and thus spawns git executable
processes to perform them. Likewise, each invocation of
`git config` to provide a configuration value spawns a new
process. Shells implement spawning with `fork()` and
`exec()`. On systems that do not support
copy-on-write semantics for `fork()`, the memory of the parent
process is duplicated on every `fork()` call, which turns out
to be an expensive operation. On Windows, Git uses MSYS2
exclusively to emulate `fork()`, but since Windows does not support
forking semantics natively, the workaround provided by MSYS2
emulates `fork()` without [copy-on-write
semantics](https://www.cygwin.com/faq.html#faq.api.fork). Doing this
creates another layer over Windows processes and thus slows git.
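
The caching argument in item 2 can be sketched as follows. This is
only an illustration under stated assumptions, not git's config code:
the table stands in for parsed configuration files, and the names
`config_get_cached()`, `config_get_uncached()` and `load_count` are
hypothetical. The counter shows how often the "files" would be parsed.

```c
#include <string.h>

struct config_entry {
	const char *key;
	const char *value;
};

/* Stand-in for parsed configuration; hypothetical keys and values. */
static const struct config_entry entries[] = {
	{ "user.name", "A U Thor" },
	{ "core.editor", "vi" },
};

static int load_count;		/* how many times the "files" were parsed */
static int cache_loaded;

static void load_config(void)
{
	load_count++;		/* a real loader would read and parse here */
	cache_loaded = 1;
}

/* In-process lookup: parse once, then answer from memory. */
static const char *config_get_cached(const char *key)
{
	size_t i;

	if (!cache_loaded)
		load_config();
	for (i = 0; i < sizeof(entries) / sizeof(entries[0]); i++)
		if (!strcmp(entries[i].key, key))
			return entries[i].value;
	return NULL;
}

/*
 * Per-process lookup, as when a script runs a new process for each
 * value: every call starts with an empty cache and parses again.
 */
static const char *config_get_uncached(const char *key)
{
	cache_loaded = 0;
	return config_get_cached(key);
}
```

Looking up two keys with `config_get_cached()` parses once, while the
same two lookups through `config_get_uncached()` parse twice, which is
the repeated work a built-in avoids.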

Rewriting C built-ins
-

The problems mentioned above need to be fixed. The only fix for these
problems would be to write built-ins in C for all these shell scripts,
leveraging the git API. Writing built-ins reduces the dependencies
required by shell scripts. Since Git is a native executable on Windows,
doing this can make MSYS2 POSIX emulation obsolete. Then, using git's
internal API and C data types, the built-in `git_config_get_value()` can
be used to get a configuration value rather than spawning another
git-config process. This removes the need to re-read the git
configuration every time and reduces I/O. Furthermore, git-stash will be
faster and show 

Re: [RFC][GSoC] Project proposal: convert interactive rebase to C

2018-03-20 Thread Johannes Schindelin
Hi Alban,

thank you for your proposal!

I will only comment on the parts that I feel could use improvement, the
rest is fine by me.

On Sat, 17 Mar 2018, Alban Gruin wrote:

> APPROXIMATE TIMELINE
> 
> Community bonding — April 23, 2018 – May 14, 2018
> During the community bonding, I would like to dive into git’s codebase,
> and to understand what git-rebase--interactive does under the hood. At
> the same time, I’d communicate with the community and my mentor, seeking
> clarifications, and asking questions about how things should or
> should not be done.
> 
> Weeks 1 & 2 — May 14, 2018 – May 28, 2018
> First, I would refactor --preserve-merges into its own shell script, as
> described in Dscho’s email.

Could you go into detail a bit here? Like, describe what parts of the
git-rebase--interactive.sh script would need to be duplicated, which ones
would be truly moved, etc

> Weeks 3 & 4 — May 28, 2018 – June 11, 2018
> Then, I would start to rewrite git-rebase--interactive, and get rid of
> git-rebase--helper.

I think this is a bit premature, as the rebase--helper would not only
serve the --interactive part, but in later conversions also --am and
--merge, and only in the very end, when git-rebase.sh gets converted,
would we be able to simply rename the rebase--helper to rebase.

Also, I have a hunch that there is actually almost nothing left to rewrite
after my sequencer improvements that made it into Git v2.13.0, together
with the upcoming changes (which are on top of the --recreate-merges patch
series, hence I did not send them to the mailing list yet) in
https://github.com/dscho/git/commit/c261f17a4a3e

So I would like to see more details here... ;-)

> Weeks 5 to 9 — June 11, 2018 – July 15, 2018
> During this period, I would continue to rewrite git-rebase--interactive.

It would be good if the proposal said what parts of the conversion are
tricky, to merit spending a month on them.

> Weeks 10 & 11 — July 16, 2018 – July 29, 2018
> In the second half of July, I would look for bugs in the new code, test it, 
> and improve its coverage.

As I mentioned in a related mail, the test suite coverage would be most
sensibly extended *before* starting to rewrite code in C, as it helps
catching bugs early and avoids having to merge buggy code that needs to be
fixed immediately.

> Week 12 — July 30, 2018 – August 5, 2018
> In the last week, I would polish the code where needed, in order to
> improve performance or to make the code more readable.

Thank you for sharing this draft with us!
Johannes

Re: [RFC] [GSoC] Project proposal: convert scripts to builtins

2018-03-20 Thread Johannes Schindelin
Hi Pratik,

thank you so much for posting this inline, to make it easier to review.

I will comment only on specific parts below; please just assume that I like
the other parts and have nothing to add.

On Tue, 20 Mar 2018, Pratik Karki wrote:

> 
> Timeline and Development Cycle
> --
> 
> -   Apr 23: Accepted student proposals announced.
> -   Apr 23 onwards: Researching of all the test suites. Discussion of
> possible test improvements for `git-stash` and `git-rebase`.
> -   May 1: Rewriting skeleton begins.

I would have liked more detail here. Like, maybe even a rudimentary
initial version identifying, say, a part of `git stash` and/or `git
rebase` that could be put into a builtin (stash--helper and
rebase--helper, respectively).

It is my experience from several GSoCs working on this huge overarching
project to convert the scripts (which are good prototypes, but lack
stringency as well as performance) to C that even the individual
scripts are too much to handle in a single GSoC.

> -   May 13: Making `builtin/stash.c` ready for review process. (This
> goes on for some time.)

There have been two past efforts to turn stash into a builtin:

https://github.com/git-for-windows/git/pull/508

and

https://public-inbox.org/git/20171110231314.30711-1-j...@teichroeb.net/

It would be good to read up on those and incorporate the learnings into
the proposal.

> -   May 26: Making `builtin/rebase.c` ready for review process. (This
> goes on for some time.)

The `git-rebase.sh` script is itself not terribly interesting, as it hands
off to `git-rebase--am.sh`, `git-rebase--interactive.sh` and
`git-rebase--merge.sh`, respectively.

Converting `git-rebase` into a builtin without first converting all of
those scripts would make little sense.

It would probably be better to choose one of those latter scripts and move
its functionality into a builtin, in an incremental fashion.

By doing it incrementally, you can also avoid...

> -   June 10: Make second versions with more improvements and more
> batteries ready for next review cycle.

... leaving two weeks between checkpoints. Also, doing it incrementally
lets you avoid sitting on your hands while waiting for the first patches
to be reviewed.

> -   June 20: Writing new tests and using more code-coverage tools to
> squash bugs present.

Typically it helps a lot to have those tests *during* the conversion.
That's how I found most of the bugs when converting difftool, for example.

> -   June 25 - Jul 20: Start optimizing `builtin/stash.c` and
> `builtin/rebase.c`. Benchmarking and profiling is done. They are
> exclusively compared to their original shell scripts, to check
> whether they are more performant or not and the results are
> published in the mailing list for further discussion.

Could you add details on how you would perform benchmarking and profiling?

> -   Jul 20 - Aug 5: More optimizing and polishing of `builtin/stash.c`
> and `builtin/rebase.c`, running the newly written test series, and
> sending them for code review.
> -   Aug 14: Submit final patches.
> -   Aug 22: Results announced.
> -   Apr 24 - Aug 14: Documentation is written. "What I'm working on" is
> written and posted in my blog regarding GSoC with Git.

The timeline is a bit ambitious. I would like to caution you that these
are all big tasks, and maybe you want to cut down on the deliverables, and
add more detail about what exactly you want to deliver (such as: what part
of stash/rebase do you find under-tested in our test suite and would
therefore want to augment, what parts of stash/rebase do you think you
would handle first, and how?).

Ciao,
Johannes


Re: [RFC] [GSoC] Project proposal: convert scripts to builtins

2018-03-20 Thread Pratik Karki
Hi,
This is my draft for my proposal on "Convert Scripts to builtins" for GSoC.
Please review and provide feedback.


Cheers,
Pratik Karki


Convert Scripts to builtins
===

Abstract


Many components of Git are still in the form of shell and Perl scripts.
This has the advantage of being extensible but causes problems in
production code on platforms like Windows.
I propose to rewrite a couple of shell and Perl scripts into portable
and performant C code, making them built-ins. The major advantage of
doing this is improved efficiency and performance.

Many scripts like `git-am`, `git-pull`, `git-branch` have already
been rewritten in C. Many more, like `git-rebase`, `git-stash`,
`git-add --interactive`, are still shell and Perl scripts.
I propose to work on `git-rebase` and `git-stash`.

Shell Scripts:
--

Although shell scripts are quicker and easier to write and can be
extended in functionality, they introduce certain
disadvantages.

1.  Dependencies:
 Scripting languages and shell scripts make for productive coding,
but they carry an overhead of dependencies. Shell scripts are
light and simple and call other executables to perform
non-trivial tasks. Take the `git-stash` shell script, for example:
`sed`, `rm`, `echo`, `test` are used throughout `git-stash`.
These are common on POSIX platforms, but on non-POSIX platforms
some extra work is needed to port these commands. For example,
Git for Windows works around this by shipping extra utilities and
MSYS2 shell commands, and it needs to pack a language runtime for
Perl. This increases the installation size and requires more disk
space. Moreover, adding more batteries requires them to be
implemented in all of the dependency libraries and executables.

2.  Inefficiency:
 Git has internal caches for configuration values, the repository
index and repository objects. Porcelain scripts do not have
access to git's internal API, so they spawn git processes to
perform git operations. On every git invocation, git re-reads
the user's configuration files and the repository index, repopulates
the filesystem cache, etc. This leads to overhead and unnecessary I/O.
Windows is known to have worse I/O performance than Linux, and
HDDs have slower I/O performance than SSDs. These
unnecessary I/O operations cause runtime overhead, which is worse
on setups with poor I/O performance. Rewriting the porcelain
scripts as C built-ins lets them use the git API directly: there is
no need to spawn separate git processes, and caching can be used to
reduce unnecessary I/O.

3.  Spawning processes is less performant:
 Shell scripts usually spawn a lot of processes. Shell scripts are
very light and hence have limited functionality. For
`git-stash.sh` to work, it needs to perform lots of git operations
like `git rev-parse` and `git config`, and thus spawns git executable
processes to perform these operations. Each invocation of
`git config` to obtain a configuration value, for instance, spawns a
new process. Shells implement spawning with `fork()` and
`exec()`. On systems that do not support
copy-on-write semantics for `fork()`, the memory of the parent
process is duplicated on every `fork()` call, which turns out
to be expensive. On Windows, Git relies on MSYS2
to emulate `fork()`, but since Windows does not support
forking semantics natively, the workaround provided by MSYS2
emulates `fork()` without [copy-on-write
semantics](https://www.cygwin.com/faq.html#faq.api.fork). This
adds another layer over Windows processes and thus slows git down.
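
To make the cost described in item 3 concrete, here is a minimal C sketch
(POSIX only) contrasting the `fork()`/`exec()`/wait round trip that a shell
pays for every external command with a plain in-process function call. The
function names are hypothetical and not part of git; `/bin/sh -c 'exit 0'`
merely stands in for any external command.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run "/bin/sh -c 'exit 0'" once as a child process,
 * the way a shell script runs every external command. Returns the child's
 * exit status, or -1 if it could not be spawned or did not exit normally.
 */
int run_trivial_command(void)
{
	int status;
	pid_t pid = fork();	/* duplicate the current process */

	if (pid < 0)
		return -1;
	if (pid == 0) {
		/* child: replace ourselves with a new program */
		execl("/bin/sh", "sh", "-c", "exit 0", (char *)NULL);
		_exit(127);	/* only reached if exec failed */
	}
	/* parent: block until the child is done */
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* The same "work" done in-process: no fork, no exec, no wait. */
int trivial_function(void)
{
	return 0;
}

/* Spawn the trivial command n times and count how many runs succeeded. */
int count_successful_spawns(int n)
{
	int ok = 0;

	assert(n >= 0);
	for (int i = 0; i < n; i++)
		if (run_trivial_command() == 0)
			ok++;
	return ok;
}
```

Timing `count_successful_spawns(1000)` against 1000 calls to
`trivial_function()` would be one way to observe the difference; the actual
numbers depend on the platform.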

Rewriting C built-ins
-

The problems mentioned above need to be fixed. The only fix for these
problems would be to write built-ins in C for all these shell scripts,
leveraging the git API. Built-ins reduce the dependencies
required by shell scripts. Since Git is a native executable on Windows,
doing this can make the MSYS2 POSIX emulation unnecessary. Using git's
internal API and C data types, `git_config_get_value()` can be
used to get a configuration value rather than spawning another git-config
process. This removes the need to re-read the git configuration
every time and reduces I/O. Furthermore, git-stash and git-rebase will be
faster and behave more consistently because, instead of spawning another
process and parsing command-line arguments manually, they can be
built in and use git's internal APIs such as `parse-options`.
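
As a rough illustration of why an in-process lookup saves I/O, the sketch
below uses a toy, self-contained stand-in for a cached configuration. It is
not git's implementation: every `toy_*` name and the sample entries are
hypothetical. Only the calling convention (return 0 and set `*value` when the
key is found, return 1 otherwise) is modelled on `git_config_get_value()` as
I understand it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for parsed configuration; NOT git's real data structure. */
struct toy_config_entry {
	const char *key;
	const char *value;
};

static const struct toy_config_entry toy_entries[] = {
	{ "user.name", "A U Thor" },
	{ "rebase.autosquash", "true" },
};

static int toy_config_loaded;
int toy_config_load_count;	/* how many times the "files" were read */

/* Pretend to read and parse the config files; happens at most once. */
static void toy_config_load_once(void)
{
	if (toy_config_loaded)
		return;
	toy_config_loaded = 1;
	toy_config_load_count++;
}

/*
 * Look up key in the cached configuration. Returns 0 and stores the
 * value in *value if the key exists, returns 1 without touching *value
 * otherwise.
 */
int toy_config_get_value(const char *key, const char **value)
{
	size_t i;

	assert(key && value);
	toy_config_load_once();
	for (i = 0; i < sizeof(toy_entries) / sizeof(toy_entries[0]); i++) {
		if (!strcmp(toy_entries[i].key, key)) {
			*value = toy_entries[i].value;
			return 0;
		}
	}
	return 1;
}
```

However many keys are looked up, the load counter stays at 1, whereas a
script calling `git config` once per key pays for a new process and a fresh
read of the configuration files each time.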

To implement git-stash and git-rebase in C, I propose to avoid spawning
lots of external git processes and reduce redundant I/O by taking
advantage of the internal 

Re: [RFC] [GSoC] Project proposal: convert scripts to builtins

2018-03-20 Thread Christian Couder
Hi,

On Tue, Mar 20, 2018 at 10:00 AM, Pratik Karki wrote:
> Hi,
> This is my draft for my proposal on "Convert Scripts to builtin" for GSoC.
> Please review and provide feedback.
>
> https://gist.github.com/prertik/daaa73a39d3ce30811d9a208043dc235

It would be easier for us to comment if the markdown was sent inline.

Thanks,
Christian.


[RFC] [GSoC] Project proposal: convert scripts to builtins

2018-03-20 Thread Pratik Karki
Hi,
This is my draft for my proposal on "Convert Scripts to builtin" for GSoC.
Please review and provide feedback.

https://gist.github.com/prertik/daaa73a39d3ce30811d9a208043dc235

Cheers,
Pratik Karki


Re: [RFC][GSoC] Project proposal: convert interactive rebase to C

2018-03-17 Thread Christian Couder
Hi,

On Sat, Mar 17, 2018 at 8:14 PM, Alban Gruin wrote:
>
> Weeks 3 & 4 — May 18, 2018 – June 11, 2018
> Then, I would start to rewrite git-rebase--interactive, and get rid of
> git-rebase--helper.

Usually to rewrite a shell script in C, we first rewrite shell
functions into option arguments in a C builtin helper and make the
shell script call the builtin helper (instead of the original shell
functions). Eventually when the shell script is mostly only calling
the builtin helper, we add what is needed into the builtin helper and
we rename it to make it fully replace the shell script.

See for example 0cce4a2756 (rebase -i -x: add exec commands via the
rebase--helper, 2017-12-05) or b903674b35 (bisect--helper:
`is_expected_rev` & `check_expected_revs` shell function in C,
2017-09-29). These examples show that we can do step by step rewrites.
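
A minimal sketch of that pattern, under loud assumptions: all names here
(`cmd_toy_helper`, `toy--helper`, the two `toy_*` functions) are hypothetical,
the option names are only illustrative, and a real helper in git would use
parse-options rather than bare strcmp() calls. Each former shell function
becomes one option of the helper, so the script can switch over one function
at a time.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical C replacements for two former shell functions. */
int toy_check_todo_list(void)
{
	return 0;
}

int toy_skip_unnecessary_picks(void)
{
	return 0;
}

/*
 * Toy "builtin helper": one option per converted shell function.
 * The shell script would then run, for example,
 *     git toy--helper --check-todo-list
 * instead of calling its own shell function. Returns the exit code;
 * 129 is used here for usage errors.
 */
int cmd_toy_helper(int argc, const char **argv)
{
	assert(argc >= 1 && argv);
	if (argc != 2) {
		fprintf(stderr, "usage: toy--helper <option>\n");
		return 129;
	}
	if (!strcmp(argv[1], "--check-todo-list"))
		return toy_check_todo_list();
	if (!strcmp(argv[1], "--skip-unnecessary-picks"))
		return toy_skip_unnecessary_picks();
	fprintf(stderr, "unknown option: %s\n", argv[1]);
	return 129;
}
```

Once every shell function has an option like this, the script is little more
than a dispatcher, and the helper can be extended and renamed to replace it.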

I would suggest planning to use the same approach, and describing in
your proposal which shell functions you would like to rewrite into the
C builtin helper in which order, before planning to fully replace the
current git-rebase--interactive.

Thanks,
Christian.


[RFC][GSoC] Project proposal: convert interactive rebase to C

2018-03-17 Thread Alban Gruin
Hi,

here is my first draft of my proposal for the GSoC, about the "convert 
interactive rebase to C" project. Any feedback is welcome :)

---
ABSTRACT
git is a modular source control management software, and all of its
subcommands are programs on their own. A lot of them are written in C, but a
couple of them are shell or Perl scripts. This is the case of
git-rebase--interactive (or interactive rebase), which is a shell script.
Rewriting it in C would improve its performance, its portability, and maybe
its robustness.


ABOUT `git-rebase{,--interactive}`

git-rebase allows re-applying changes on top of another branch. For instance,
when a local branch and a remote branch have diverged, git-rebase can re-unify 
them, applying each change made on the local branch on top of the remote 
branch.

git-rebase--interactive is used to reorganize commits by reordering, 
rewording, or squashing them. To achieve this purpose, git opens the list of 
commits to be modified in a text editor (hence the interactivity), as well as 
the actions to be performed for each of them.


PROJECT GOALS

The goal of this project is to rewrite git-rebase--interactive in C as it has 
been discussed on the git mailing list[1], for multiple reasons:

Performance improvements
Shell scripts are inherently slow. That’s because each command is a program by 
itself. So, for each command, the shell interpreter has to spawn a new process 
and to load a new program.

Those commands can be other git commands. Sometimes, they are wrappers to call
internal C functions (e.g. git-rebase--helper), something shell scripts can’t
do natively. These wrappers basically parse the parameters, then start the
appropriate function, which is obviously slower than just calling a function
from C.

Other commands can be POSIX utilities (e.g. sed, cut, etc.). They have their
own problems (speed aside), namely portability.

Portability improvements
Shell scripts often rely on many of those POSIX utilities, which are not
necessarily natively available on all platforms (most notably, Windows), or 
may have more or less features depending on the implementation.


APPROXIMATE TIMELINE

Community bonding — April 23, 2018 – May 14, 2018
During the community bonding, I would like to dive into git’s codebase, and to 
understand what git-rebase--interactive does under the hood. At the same time, 
I’d communicate with the community and my mentor, seeking clarifications,
and asking questions about how things should or should not be done.

Weeks 1 & 2 — May 14, 2018 – May 28, 2018
First, I would refactor --preserve-merges in its own shell script, as 
described in Dscho’s email.

Weeks 3 & 4 — May 28, 2018 – June 11, 2018
Then, I would start to rewrite git-rebase--interactive, and get rid of
git-rebase--helper.

Weeks 5 to 9 — June 11, 2018 – July 15, 2018
During this period, I would continue to rewrite git-rebase--interactive.

Weeks 10 & 11 — July 16, 2018 – July 29, 2018
In the second half of July, I would look for bugs in the new code, test it, 
and improve its coverage.

Week 12 — July 30, 2018 – August 5, 2018
In the last week, I would polish the code where needed, in order to improve
performance or to make the code more readable.


ABOUT ME

My name is Alban Gruin, I am an undergraduate at the Paul Sabatier University 
in Toulouse, France, where I have been studying Computer Science for the past
year and a half. My timezone currently is UTC+01:00, but will be UTC+02:00 
starting from March 25th, because of the daylight saving time in Europe.

I have been programming in C for the last 5 years. I learned using freely 
available resources online, and by attending class ever since last year.

I am also quite familiar with shell scripts, and I have been using git for the 
last 3 years. 

My e-mail address is alban  gruin  gmail  com. My IRC nick is 
abngrn.

My micro-project was "userdiff: add built-in pattern for golang"[2][3].

---

You can find the Google Doc version here[4].

Regards,
Alban Gruin

[1] https://public-inbox.org/git/alpine.DEB.2.20.1609021432070.129229@virtualbox/
[2] https://public-inbox.org/git/20180228172906.30582-1-alban.gr...@gmail.com/
[3] https://git.kernel.org/pub/scm/git/git.git/commit/?id=1dbf0c0a
[4] https://docs.google.com/document/d/1Jx0w867tVAht7QI1_prieiXg_iQ_nTloOyaIIOnm85g/edit?usp=sharing




[GSoC] Choosing a Project Proposal

2014-03-19 Thread Brian Bourn
Hi all,

I'm currently trying to decide on a project to work on for Google
Summer of Code. I'm stuck choosing between three which I find really
interesting, and I was wondering if any of them is more
pressing than the others. I would also love some comments expanding on
each of these three, if possible. The three projects I'm
considering are:

1.  Unifying git branch -l, git tag -l, and git for-each-ref

2.  Refactor tempfile handling

3.  Improve triangular workflow support


Once again, I would appreciate all feedback on which of these are most
important.

Thanks for the Help,
Brian Bourn