Re: GSoC 2016 | Proposal | Incremental Rewrite of git bisect

2016-03-25 Thread Christian Couder
On Fri, Mar 25, 2016 at 11:15 AM, Pranit Bauva  wrote:
>> - you will add an option to "git bisect--helper" to perform what the
>> git-bisect.sh function did, and
>> - you will create a test script for "git bisect--helper" in which you
>> will test each option?
>
> I had initially planned to do this, but Matthieu pointed out that it
> would be much better to use the existing test suite rather than create
> a new one, which could result in less coverage.

Ok, then perhaps:

- you will add tests to existing test scripts, so that each "git
bisect--helper" option is (indirectly) tested.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: GSoC 2016 | Proposal | Incremental Rewrite of git bisect

2016-03-25 Thread Pranit Bauva
> - you will add an option to "git bisect--helper" to perform what the
> git-bisect.sh function did, and
> - you will create a test script for "git bisect--helper" in which you
> will test each option?

I had initially planned to do this, but Matthieu pointed out that it
would be much better to use the existing test suite rather than create
a new one, which could result in less coverage.

Thanks,
Pranit Bauva


Re: GSoC 2016 | Proposal | Incremental Rewrite of git bisect

2016-03-25 Thread Pranit Bauva
On Fri, Mar 25, 2016 at 2:45 PM, Matthieu Moy
<matthieu@grenoble-inp.fr> wrote:
> Christian Couder <christian.cou...@gmail.com> writes:
>
>> On Thu, Mar 24, 2016 at 12:27 AM, Pranit Bauva <pranit.ba...@gmail.com> 
>> wrote:
>>
>>> Unification of bisect.c and bisect--helper.c
>>>
>>> This will unify the algorithmic and non-algorithmic parts of bisect
>>> bringing them under one heading to make the code clean.
>>
>> I am not sure this is needed and a good idea. Maybe you will rename
>> "builtin/bisect--helper.c" to "builtin/bisect.c" and remove
>> git-bisect.sh at the same time to complete the shell to C move. But
>> the actual bisect.{c,h} might be useful as they are for other
>> purposes.
>
> Yes. My view on this is that builtin/*.c should be just user-interface,
> and actual stuff should be outside builtin, ideally in a well-designed
> and reusable library (typically re-usable by libgit2 or others to
> provide another UI for the same feature). Not all commands work this
> way, but I think this is a good direction to take.

Okay, I didn't know about this. Thanks for elaborating on Christian's point.

>> When you have sent one patch series, even a small one, then your main
>> goal should be to have this patch series merged.
>
> I'd add: to get a patch series merged, two things take time:
>
> 1) latency: allow other people time to read and comment on your code.
>
> 2) extra work required by reviewers.
>
> You want to send series early because of 1) (then you can work on the
> next series while waiting for reviews on the current one), and you need
> to prioritize 2) over working on the next series to minimize in-flight
> topics.

I had planned to work this way, and I will include this in the proposal.
Though it creates some confusion for me and I tend to mix things up, I
will keep a hard copy to jot down the discussions and my thoughts.


Re: GSoC 2016 | Proposal | Incremental Rewrite of git bisect

2016-03-25 Thread Pranit Bauva
On Fri, Mar 25, 2016 at 2:32 PM, Christian Couder
<christian.cou...@gmail.com> wrote:
> On Thu, Mar 24, 2016 at 12:27 AM, Pranit Bauva <pranit.ba...@gmail.com> wrote:
>> Hey!
>>
>> I have prepared a proposal for Google Summer of Code 2016. I know this
>> is a bit late, but please try to give your comments and suggestions.
>> My proposal could greatly improve from this. Some questions:
>>
>> 1. Should I include more ways in which it can help windows?
>
> I don't think it is necessary.
>
>> 2. Should I include the function names I intend to convert?
>
> I don't think it is necessary, but if you want, you can take a look at
> some big ones (or perhaps just one big) and explain how you plan to
> convert it (using which C functions or apis).

I will try to do it for one big one if there is some time left.

>> 3. Is my timeline (a bit different) going to affect me in any way?
>
> What is important with the timeline is just that it looks realistic.
> So each task should have a realistic amount of time and the order in
> which tasks are listed should be logical.
> I commented below about how I think you could improve your timeline.

Your suggestions seem nice to me. I have thought about changing some
parts. I have described some changes below.

>> Here is a Google doc for my proposal.
>> https://docs.google.com/document/d/1stnDPA5Hs3u0a8sqoWZicTFpCz1wHP9bkifcKY13Ocw/edit?usp=sharing
>>
>> For the people who prefer the text only version :
>>
>> ---
>>
>> Incremental rewrite of Git bisect
>>
>> About Me
>>
>> Basic Information
>>
>>
>> Name: Pranit Bauva
>>
>> University: IIT Kharagpur
>>
>> Major: Mining Engineering
>>
>> Email: pranit.ba...@gmail.com
>>
>> IRC: pungi-man
>>
>> Blog: http://bauva.in
>>
>> Timezone: IST (UTC +5:30)
>>
>> Background
>>
>> I am a first-year undergraduate in the Department of Mining
>> Engineering at the Indian Institute of Technology, Kharagpur. I am an
>> open-source enthusiast and a part of the Kharagpur Linux Users Group,
>> a group of open-source enthusiasts. I am quite familiar with C, and I
>> have been using the shell for some time now, still finding new things
>> about it every day. I used SVN when I was on Windows and switched to
>> Git when I moved to Linux. Git seems like magic. I have always wanted
>> to be involved in its development, and Google Summer of Code is an
>> awesome way to achieve that.
>>
>>
>> Abstract
>>
>> Git bisect is a frequently used command that helps developers find
>> the commit which introduced a bug. Part of it is written in shell
>> script. I intend to convert that part to C, making it a builtin and
>> thus increasing Git's portability. The efficiency of git bisect will
>> also increase, though this matters less, since most of the time in
>> bisection mode is consumed by compiling or testing; still, reducing
>> the IO overhead makes the heavy process of compiling relatively
>> lighter.
>>
>>
>> Problems Shell creates
>>
>> System Dependencies
>>
>> Using shell code introduces various dependencies, even though it
>> allows quick prototyping. Shell scripts often use POSIX utilities
>> like cat, grep, ls and mkdir, which are not included by default on
>> non-POSIX systems. These scripts do not have access to Git's internal
>> low-level API, so even trivial tasks have to be performed by spawning
>> a new process every time. Consequently, when Git is ported to
>> Windows, it has to bundle all these utilities (namely a shell
>> interpreter, Perl bindings and much more).
>>
>> Scripts introduce extra overheads
>>
>> Shell scripts do not have access to Git's internal API, which makes
>> excellent use of caching and thus avoids unnecessary IO on user
>> configuration files, the repository index and the filesystem. By
>> using a builtin we could exploit this caching and reduce the
>> overhead. As compiling and testing already consume quite a lot of
>> resources, it would be good to make as many resources as possible
>> available for them.
>>
>> Potential Problems
>>
>> Rewriting may introduce bugs
>>
>> Rewriting the shell script in C might introduce some bugs. This
>> problem will be properly addressed by my method of approach
>> (described below).

Re: GSoC 2016 | Proposal | Incremental Rewrite of git bisect

2016-03-25 Thread Matthieu Moy
Christian Couder  writes:

> On Thu, Mar 24, 2016 at 12:27 AM, Pranit Bauva  wrote:
>
>> Unification of bisect.c and bisect--helper.c
>>
>> This will unify the algorithmic and non-algorithmic parts of bisect
>> bringing them under one heading to make the code clean.
>
> I am not sure this is needed and a good idea. Maybe you will rename
> "builtin/bisect--helper.c" to "builtin/bisect.c" and remove
> git-bisect.sh at the same time to complete the shell to C move. But
> the actual bisect.{c,h} might be useful as they are for other
> purposes.

Yes. My view on this is that builtin/*.c should be just user-interface,
and actual stuff should be outside builtin, ideally in a well-designed
and reusable library (typically re-usable by libgit2 or others to
provide another UI for the same feature). Not all commands work this
way, but I think this is a good direction to take.

> When you have sent one patch series, even a small one, then your main
> goal should be to have this patch series merged.

I'd add: to get a patch series merged, two things take time:

1) latency: allow other people time to read and comment on your code.

2) extra work required by reviewers.

You want to send series early because of 1) (then you can work on the
next series while waiting for reviews on the current one), and you need
to prioritize 2) over working on the next series to minimize in-flight
topics.

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/


Re: GSoC 2016 | Proposal | Incremental Rewrite of git bisect

2016-03-25 Thread Christian Couder
On Thu, Mar 24, 2016 at 12:27 AM, Pranit Bauva <pranit.ba...@gmail.com> wrote:
> Hey!
>
> I have prepared a proposal for Google Summer of Code 2016. I know this
> is a bit late, but please try to give your comments and suggestions.
> My proposal could greatly improve from this. Some questions:
>
> 1. Should I include more ways in which it can help windows?

I don't think it is necessary.

> 2. Should I include the function names I intend to convert?

I don't think it is necessary, but if you want, you can take a look at
some big ones (or perhaps just one big) and explain how you plan to
convert it (using which C functions or apis).

> 3. Is my timeline (a bit different) going to affect me in any way?

What is important with the timeline is just that it looks realistic.
So each task should have a realistic amount of time and the order in
which tasks are listed should be logical.
I commented below about how I think you could improve your timeline.

> Here is a Google doc for my proposal.
> https://docs.google.com/document/d/1stnDPA5Hs3u0a8sqoWZicTFpCz1wHP9bkifcKY13Ocw/edit?usp=sharing
>
> For the people who prefer the text only version :
>
> ---
>
> Incremental rewrite of Git bisect
>
> About Me
>
> Basic Information
>
>
> Name: Pranit Bauva
>
> University: IIT Kharagpur
>
> Major: Mining Engineering
>
> Email: pranit.ba...@gmail.com
>
> IRC: pungi-man
>
> Blog: http://bauva.in
>
> Timezone: IST (UTC +5:30)
>
> Background
>
> I am a first-year undergraduate in the Department of Mining
> Engineering at the Indian Institute of Technology, Kharagpur. I am an
> open-source enthusiast and a part of the Kharagpur Linux Users Group,
> a group of open-source enthusiasts. I am quite familiar with C, and I
> have been using the shell for some time now, still finding new things
> about it every day. I used SVN when I was on Windows and switched to
> Git when I moved to Linux. Git seems like magic. I have always wanted
> to be involved in its development, and Google Summer of Code is an
> awesome way to achieve that.
>
>
> Abstract
>
> Git bisect is a frequently used command that helps developers find
> the commit which introduced a bug. Part of it is written in shell
> script. I intend to convert that part to C, making it a builtin and
> thus increasing Git's portability. The efficiency of git bisect will
> also increase, though this matters less, since most of the time in
> bisection mode is consumed by compiling or testing; still, reducing
> the IO overhead makes the heavy process of compiling relatively
> lighter.
>
>
> Problems Shell creates
>
> System Dependencies
>
> Using shell code introduces various dependencies, even though it
> allows quick prototyping. Shell scripts often use POSIX utilities
> like cat, grep, ls and mkdir, which are not included by default on
> non-POSIX systems. These scripts do not have access to Git's internal
> low-level API, so even trivial tasks have to be performed by spawning
> a new process every time. Consequently, when Git is ported to
> Windows, it has to bundle all these utilities (namely a shell
> interpreter, Perl bindings and much more).
>
> Scripts introduce extra overheads
>
> Shell scripts do not have access to Git's internal API, which makes
> excellent use of caching and thus avoids unnecessary IO on user
> configuration files, the repository index and the filesystem. By
> using a builtin we could exploit this caching and reduce the
> overhead. As compiling and testing already consume quite a lot of
> resources, it would be good to make as many resources as possible
> available for them.
>
> Potential Problems
>
> Rewriting may introduce bugs
>
> Rewriting the shell script in C might introduce some bugs. This
> problem will be properly addressed by my method of approach (described
> below). Even so, this approach cannot guarantee that the new
> functionality will be exactly the same as the old one, though it will
> greatly reduce the chance of divergence. Reviews from seniors in the
> Git community will also help a lot in reducing bugs, since they know
> the common pitfalls and how to work around them. Git's test suite is
> quite nice and has awesome coverage.
>
> Rewritten can be hard to understand
>
> Git does not like having many external dependencies, libraries or
> executables other than what is provided by git itself and the
> rewritten code should follow this. C does not p

Re: [GSoC] Proposal

2016-03-25 Thread 惠轶群
Well, I should have done some searching before asking.


Re: [GSoC] Proposal

2016-03-24 Thread Pranit Bauva
Some developers are already working on that [1].

[1]: http://thread.gmane.org/gmane.comp.version-control.git/288306

On Fri, Mar 25, 2016 at 10:12 AM, 惠轶群  wrote:
> There is an interesting idea on the GSoC 2008 ideas page; can it still
> be proposed?
>
> https://git.wiki.kernel.org/index.php/SoC2008Ideas#Restartable_Clone
>
> 2016-03-25 11:45 GMT+08:00 惠轶群 :
>> Hi,
>>
>> I'm proposing to take part in GSoC as a developer of git.
>>
>> Here is my 
>> [Draft](https://docs.google.com/document/d/1zqOVb_cnYcaix48ep1KNPeLpRHvNKA26kNXc78yjhMg/edit?usp=sharing).
>>
>> I'm planning to refactor some parts of Git. The following is what I'm
>> interested in:
>>
>> - port parts of “git rebase” to a C helper
>> - “git status” during non-interactive rebase
>> - other interesting things that come up during development
>>
>> If time allows, I'd also like to improve git-bisect, for example:
>>
>> - convert “git-bisect.sh” to a builtin
>> - etc.
>>
>> Sorry for the late submission; I have been very busy these days. Sorry again.


Re: [GSoC] Proposal

2016-03-24 Thread 惠轶群
There is an interesting idea on the GSoC 2008 ideas page; can it still
be proposed?

https://git.wiki.kernel.org/index.php/SoC2008Ideas#Restartable_Clone

2016-03-25 11:45 GMT+08:00 惠轶群 :
> Hi,
>
> I'm proposing to take part in GSoC as a developer of git.
>
> Here is my 
> [Draft](https://docs.google.com/document/d/1zqOVb_cnYcaix48ep1KNPeLpRHvNKA26kNXc78yjhMg/edit?usp=sharing).
>
> I'm planning to refactor some parts of Git. The following is what I'm
> interested in:
>
> - port parts of “git rebase” to a C helper
> - “git status” during non-interactive rebase
> - other interesting things that come up during development
>
> If time allows, I'd also like to improve git-bisect, for example:
>
> - convert “git-bisect.sh” to a builtin
> - etc.
>
> Sorry for the late submission; I have been very busy these days. Sorry again.


[GSoC] Proposal

2016-03-24 Thread 惠轶群
Hi,

I'm proposing to take part in GSoC as a developer of git.

Here is my 
[Draft](https://docs.google.com/document/d/1zqOVb_cnYcaix48ep1KNPeLpRHvNKA26kNXc78yjhMg/edit?usp=sharing).

I'm planning to refactor some parts of Git. The following is what I'm interested in:

- port parts of “git rebase” to a C helper
- “git status” during non-interactive rebase
- other interesting things that come up during development

If time allows, I'd also like to improve git-bisect, for example:

- convert “git-bisect.sh” to a builtin
- etc.

Sorry for the late submission; I have been very busy these days. Sorry again.


[GSoC] Proposal

2016-03-24 Thread XZS
Greetings,

I hope it is not yet too late to jump on the Summer of Code bandwagon.

I would appreciate comments on my application [1] and on my microproject
contribution, which will follow this mail as a reply.

My proposal mostly stems from what is noted under "convert scripts to
builtins" and "git rebase improvements" on the ideas page. Both list no
mentor, so please let me know if you know anyone who should be CC'd.

Regards,
XZS.


[1]: 
https://docs.google.com/document/d/1-BV-s5VUGTvBlcVDeo6tVqQO5D1hqeQDqaf37iYuIfU/edit?usp=sharing
-- 
2.7.4



GSoC proposal

2016-03-24 Thread work
As I was strongly encouraged to submit my GSoC proposal, I'll post it
here and CC my possible mentor.
Please provide your feedback on my draft. You can also comment on it
directly in the Google Doc. Thanks in advance.


Proposal: 
https://docs.google.com/document/d/1Hpu9FfD3wb7qgWgTiKtIAie41OXK3ufgnhnNuRaEH4E





GSoC 2016 | Proposal | Incremental Rewrite of git bisect

2016-03-23 Thread Pranit Bauva
Hey!

I have prepared a proposal for Google Summer of Code 2016. I know this
is a bit late, but please try to give your comments and suggestions.
My proposal could greatly improve from this. Some questions:

1. Should I include more ways in which it can help windows?
2. Should I include the function names I intend to convert?
3. Is my timeline (a bit different) going to affect me in any way?

Here is a Google doc for my proposal.
https://docs.google.com/document/d/1stnDPA5Hs3u0a8sqoWZicTFpCz1wHP9bkifcKY13Ocw/edit?usp=sharing

For the people who prefer the text only version :

---

Incremental rewrite of Git bisect

About Me

Basic Information


Name: Pranit Bauva

University: IIT Kharagpur

Major: Mining Engineering

Email: pranit.ba...@gmail.com

IRC: pungi-man

Blog: http://bauva.in

Timezone: IST (UTC +5:30)

Background

I am a first-year undergraduate in the Department of Mining Engineering
at the Indian Institute of Technology, Kharagpur. I am an open-source
enthusiast and a part of the Kharagpur Linux Users Group, a group of
open-source enthusiasts. I am quite familiar with C, and I have been
using the shell for some time now, still finding new things about it
every day. I used SVN when I was on Windows and switched to Git when I
moved to Linux. Git seems like magic. I have always wanted to be
involved in its development, and Google Summer of Code is an awesome
way to achieve that.


Abstract

Git bisect is a frequently used command that helps developers find the
commit which introduced a bug. Part of it is written in shell script. I
intend to convert that part to C, making it a builtin and thus
increasing Git's portability. The efficiency of git bisect will also
increase, though this matters less, since most of the time in bisection
mode is consumed by compiling or testing; still, reducing the IO
overhead makes the heavy process of compiling relatively lighter.


Problems Shell creates

System Dependencies

Using shell code introduces various dependencies, even though it allows
quick prototyping. Shell scripts often use POSIX utilities like cat,
grep, ls and mkdir, which are not included by default on non-POSIX
systems. These scripts do not have access to Git's internal low-level
API, so even trivial tasks have to be performed by spawning a new
process every time. Consequently, when Git is ported to Windows, it has
to bundle all these utilities (namely a shell interpreter, Perl
bindings and much more).

Scripts introduce extra overheads

Shell scripts do not have access to Git's internal API, which makes
excellent use of caching and thus avoids unnecessary IO on user
configuration files, the repository index and the filesystem. By using
a builtin we could exploit this caching and reduce the overhead. As
compiling and testing already consume quite a lot of resources, it
would be good to make as many resources as possible available for them.
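The overhead described above is easy to demonstrate from the shell itself. The following toy comparison (an illustration of process-spawn cost in general, not a measurement of git) counts to 1000 twice: once by forking the external `expr` utility on every step, and once with the shell's builtin arithmetic, which spawns no extra processes:

```shell
#!/bin/sh
# Each $(expr ...) forks and execs a new process; $((...)) stays inside
# the shell. Both loops compute the same value, but the first one pays
# for 1000 process spawns -- the kind of overhead a C builtin avoids.
i=0
while [ "$i" -lt 1000 ]; do
    i=$(expr "$i" + 1)      # external POSIX utility, one fork per step
done

j=0
while [ "$j" -lt 1000 ]; do
    j=$((j + 1))            # shell builtin arithmetic, no extra process
done

echo "$i $j"
```

Wrapping each loop in `time` shows the forking version losing badly on most systems; this per-process cost is what a script pays whenever it shells out to `cat` or `grep` instead of calling an internal C function.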

Potential Problems

Rewriting may introduce bugs

Rewriting the shell script in C might introduce some bugs. This problem
will be properly addressed by my method of approach (described below).
Even so, this approach cannot guarantee that the new functionality will
be exactly the same as the old one, though it will greatly reduce the
chance of divergence. Reviews from seniors in the Git community will
also help a lot in reducing bugs, since they know the common pitfalls
and how to work around them. Git's test suite is quite nice and has
awesome coverage.

Rewritten can be hard to understand

Git does not like having many external dependencies, libraries or
executables other than what is provided by git itself and the
rewritten code should follow this. C does not provide with a lot of
other facilities like text processing which shell does whose C
implementation often spans to multiple lines. C is also notorious for
being a bit “cryptic”. This problem can be compensated by having well
written documentation with well defined inputs, outputs and behavior.

A peek into git bisect

How does it help?

Git bisect helps software developers find the commit that introduced a
regression. Developers are interested in knowing this because a commit
usually changes only a small set of code. It is much easier to
understand and fix a problem when you only need to check a very small
set of changes than when you do not know where to look. The problem may
not lie exactly in that commit, but it will be related to the behavior
the commit introduced. Software bugs can be a nightmare when the code
base is very large; many sleepless nights can go into figuring out
which part causes the error. This is where git bisect helps; it is one
of the most sought-after tools

Re: [GSOC/RFC] GSoC Proposal Draft | Git Beginner

2016-03-22 Thread Sidhant Sharma
Updated examples with better description for force push and reset HEAD, as
suggested by Lars [11].

Thanks and regards,
Sidhant Sharma

[11]: http://thread.gmane.org/gmane.comp.version-control.git/289365/focus=289495

---

Implement a beginner mode for Git.

Abstract

Git is a very powerful version control system, with an array of features
that give the user great capabilities. However, it often happens that
beginners are overwhelmed by its complexity and are unable to fully
understand and thus utilize Git. Moreover, beginners often do not fully
understand the command they are using and end up making destructive (and
occasionally irreversible) changes to the repository.

The beginner mode will assist such users in using Git by warning them
before they make possibly destructive changes. It will also display tips
and short snippets of documentation for a better understanding of the
Git model.

Google summer of code Idea suggested here:
http://git.github.io/SoC-2016-Ideas/#git-beginner

About Me

Name : Sidhant Sharma
Email [1] : Sidhant.Sharma1208  gmail.com
Email [2] : Tigerkid001   gmail.com
College : Delhi Technological University
Studying : Software Engineering
IRC : tk001 (or _tk_)
Phone : 91-9990-606-081
Country : India
Interests : Computers, Books, Photography
Github : Tigerkid001
LinkedIn : https://in.linkedin.com/in/sidhantsharma12

Technical Experience

Authored several Mozilla Firefox and Google Chrome extensions:
Firefox: Owl [1], Blink [2], Spoiler Jedi [3]
Chrome: Blink [4]

Developed a robust Plugin framework for Android [5] for a startup.
Learning Linux kernel programming via the Eudyptula Challenge [6]
(currently level 6).
Developed natural language processor for sarcasm detection [7] in tweets.
Developed hand gesture detection module [8] as a college minor project.
Active Firefox Add-ons Editor at AMO [9].
Currently working on a restaurant image classification project as a
second college minor project.

Why I chose Git

I have been using Git for about two years now, and it has become an
indispensable daily-use tool for me. Getting a chance to participate in GSoC
for the first time under Git is very exciting. It will give me an opportunity
to intimately know the system and a chance to help in making it better and more
powerful.

Proposal

Ideas Page: Git Beginner [10]

The following tasks summarize the project:

Implement a wrapper around Git

A wrapper (currently called 'ggit') is to be implemented around Git,
which will provide the following user interface:
`ggit <subcommand> <options>`
For example, `ggit add --all`
The wrapper will assess the arguments passed to it, and if they are
detected to be safe, it will simply pass them through to 'git'. This
approach is favorable because existing users of git will not be
affected by the wrapper.

Warning for potentially destructive commands

For every command that is entered, the wrapper will assess the
subcommand and its options. It will first check whether the subcommand
(e.g. add, commit, rebase) is present in a predefined list of
'potentially destructive' commands. This can be done by searching a
radix tree for the subcommand. If it is found, the arguments to the
subcommand will be checked for specific flags. The graylisted flags for
the destructive commands will be stored as an array of regular
expressions, and the current command's arguments will be checked
against them. If matches are found, a warning is displayed. The 'ggit'
prompt for the warning would be
"You are about to do X, which will permanently destroy Y. Are you sure
you wish to continue? [Y/n] "
If the user enters Y[es], the command will be executed as is (by
passing it unaltered to git). In that case, 'ggit' will also give tips
for undoing the changes made by the command (by referring the user to
the correct commands and the reflog), if the command can be undone. If
the command cannot be undone, 'ggit' will display an additional line in
the warning, like
"The changes made by this command cannot be undone. Please proceed
cautiously".
In the case of n[o], 'ggit' will exit without executing the command.

Currently, the list consists of commands like:

$ git rebase
$ git reset --hard
$ git clean -f
$ git gc --prune=now --aggressive
$ git push -f 
$ git push remote [+/:]
$ git branch -D

The list will be updated after some more discussion on the list.
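The flow above — look up the subcommand in a table of potentially destructive commands, match the arguments against graylisted flag patterns, then prompt before passing through — could be sketched in shell roughly as follows. This is only a sketch: the `ggit` function, the patterns, and the plain case statement standing in for the radix tree are all illustrative, not the proposed implementation:

```shell
#!/bin/sh
# Minimal sketch of the 'ggit' idea: warn before potentially destructive
# commands, pass everything else straight through to git.
ggit() {
    sub=$1
    # Stand-in for the radix-tree lookup: map a subcommand to the
    # graylisted-flag pattern that makes it dangerous ('' = always safe).
    case "$sub" in
        reset)  danger='--hard' ;;
        clean)  danger='(^| )-f' ;;
        push)   danger='(^| )(-f|--force)' ;;
        rebase) danger='.' ;;              # any rebase gets a warning
        *)      danger='' ;;
    esac

    if [ -n "$danger" ] && printf '%s' "$*" | grep -Eq -- "$danger"; then
        printf 'You are about to run "git %s", which may permanently destroy data.\n' "$*"
        printf 'Are you sure you wish to continue? [Y/n] '
        read -r answer || answer=n
        case "$answer" in
            Y|y|Yes|yes) ;;                # confirmed: fall through to git
            *) echo "Aborted."; return 1 ;;
        esac
    fi
    git "$@"
}

# A safe command passes through untouched...
ggit --version
# ...while a graylisted one with "n" on stdin is refused.
echo n | ggit reset --hard HEAD || true
```

A real implementation could keep the command table and patterns in a configuration file, so the list can evolve without changing the wrapper itself.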

Usage tips and documentation

The wrapper will also be responsible for showing a short description of
every command that is entered through 'ggit'. This shall be done for
every command unconditionally. The description will be derived from the
actual documentation, but will primarily aim to help the beginner
understand the Git workflow and the Git model.

A few examples to illustrate the working of the wrapper are:

$ ggit add --all
Staging all changes and untracked files. Use `[g]git commit` to commit the
changes.

$ ggit commit -m “Second commit”
Committing staged changes…
[master 0be3142] Second commit
 4 files changed, 6 insertions(+)

Re: [GSOC/RFC] GSoC Proposal Draft | Git Beginner

2016-03-22 Thread Sidhant Sharma

On Tuesday 22 March 2016 02:08 PM, Lars Schneider wrote:
> On 21 Mar 2016, at 11:19, Sidhant Sharma <tigerkid...@gmail.com> wrote:
>
>> Hi,
>> I updated the draft with links, ggit usage examples and some changes to the
>> timeline. I placed the links with reference here, but in the Google Doc, 
>> they're
>> inline.
>>
>> Thanks and regards,
>> Sidhant Sharma
>>
>> ---
>>
>> Implement a beginner mode for Git.
>>
>> Abstract
>>
>> Git is a very powerful version control system, with an array of features
>> that give the user great capabilities. However, it often happens that some
>> beginners are overwhelmed by its complexity and are unable to fully 
>> understand
>> and thus, utilize Git. Moreover, often beginners do not fully understand
>> the command they are using and end up making destructive (and occasionally,
>> irreversible) changes to the repository.
>>
>> The beginner mode will assist such users in using Git by warning them
>> before making possibly destructive changes. It will also display tips and
>> short snippets of documentation for better understanding the Git model.
>>
>> Google summer of code Idea suggested here:
>> http://git.github.io/SoC-2016-Ideas/#git-beginner
>>
>> About Me
>>
>> Name : Sidhant Sharma
>> Email [1] : Sidhant.Sharma1208  gmail.com
>> Email [2] : Tigerkid001   gmail.com
>> College : Delhi Technological University
>> Studying : Software Engineering
>> IRC : tk001 (or _tk_)
>> Phone : 91-9990-606-081
>> Country : India
>> Interests : Computers, Books, Photography
>> Github : Tigerkid001
>> LinkedIn : https://in.linkedin.com/in/sidhantsharma12
>>
>> Technical Experience
>>
>> Authored several Mozilla Firefox and Google Chrome extensions:
>> Firefox: Owl [1], Blink [2], Spoiler Jedi [3]
>> Chrome: Blink [4]
>>
>> Developed a robust Plugin framework for Android [5] for a startup.
>> Learning Linux kernel programming via the Eudyptula Challenge [6]
>> (currently level 6).
>> Developed natural language processor for sarcasm detection [7] in tweets.
>> Developed hand gesture detection module [8] as a college minor project.
>> Active Firefox Add-ons Editor at AMO [9].
>> Currently working on a restaurant image classification project as second 
>> college
>> minor project.
>>
>> Why I chose Git
>>
>> I have been using Git for about two years now, and it has become an
>> indispensable daily-use tool for me. Getting a chance to participate in GSoC
>> for the first time under Git is very exciting. It will give me an opportunity
>> to intimately know the system and a chance to help in making it better and 
>> more
>> powerful.
>>
>> Proposal
>>
>> Ideas Page: Git Beginner [10]
>>
>> The following tasks summarize the project:
>>
>> Implement a wrapper around Git
>>
>> A wrapper is to be implemented around (currently called 'ggit'), which will
>> provide the following user interface:
>> `ggit  `
>> For example, `ggit add --all`
>> The wrapper will assess the arguments passed to it, and if they are detected 
>> to
>> be safe, it will simply pass them through to 'git'. This approach is 
>> favorable as the existing
>> users of git will not be affected by the wrapper.
>>
>> Warning for potentially destructive commands
>>
>> For every command that is entered, the wrapper will assess the subcommand and
>> its options. In that, it will first check if the subcommand (eg. add,
>> commit, rebase) is present in a list of predefined 'potentially destructive'
>> commands. This can be done by searching through a radix tree for the 
>> subcommand.
>> If found, then the arguments to the subcommand will be checked for specific
>> flags. The graylisted flags for the destructive commands will be stored as an
>> array of regular expressions, and the current command's arguments will be
>> checked against them. If matches are found, a warning is displayed. 'ggit'
>> for the warning would be
>> "You are about to do X, which will permanently destroy Y. Are you sure you 
>> wish
>> to continue? [Y/n] "
>> If the user enters Y[es], the command will be executed as is (by passing it
>> unaltered to git). In the case of Y[es], 'ggit' will also give tips for 
>> undoing
>> the changes made by this command (by referring the user to correct commands 
>> and
>> reflog),  if the command can be undone. In case the command cannot be undone,
>> 'ggit'

Re: [GSOC/RFC] GSoC Proposal Draft | Git Beginner

2016-03-22 Thread Lars Schneider

On 21 Mar 2016, at 11:19, Sidhant Sharma <tigerkid...@gmail.com> wrote:

> Hi,
> I updated the draft with links, ggit usage examples and some changes to the
> timeline. I placed the links with reference here, but in the Google Doc, 
> they're
> inline.
> 
> Thanks and regards,
> Sidhant Sharma
> 

Re: [GSOC/RFC] GSoC Proposal Draft | Git Beginner

2016-03-21 Thread Sidhant Sharma
Hi,
I updated the draft with links, ggit usage examples and some changes to the
timeline. I placed the links with reference here, but in the Google Doc, they're
inline.

Thanks and regards,
Sidhant Sharma

---

Implement a beginner mode for Git.

Abstract

Git is a very powerful version control system with an array of features
that give the user great capabilities. It often happens, though, that
beginners are overwhelmed by its complexity and are unable to fully understand,
and thus utilize, Git. Moreover, beginners often do not fully understand
the command they are using and end up making destructive (and occasionally
irreversible) changes to the repository.

The beginner mode will assist such users by warning them
before they make possibly destructive changes. It will also display tips and
short snippets of documentation for better understanding the Git model.

Google Summer of Code idea suggested here:
http://git.github.io/SoC-2016-Ideas/#git-beginner

About Me

Name : Sidhant Sharma
Email [1] : Sidhant.Sharma1208  gmail.com
Email [2] : Tigerkid001   gmail.com
College : Delhi Technological University
Studying : Software Engineering
IRC : tk001 (or _tk_)
Phone : 91-9990-606-081
Country : India
Interests : Computers, Books, Photography
Github : Tigerkid001
LinkedIn : https://in.linkedin.com/in/sidhantsharma12

Technical Experience

Authored several Mozilla Firefox and Google Chrome extensions:
Firefox: Owl [1], Blink [2], Spoiler Jedi [3]
Chrome: Blink [4]

Developed a robust plugin framework for Android [5] for a startup.
Learning Linux kernel programming via the Eudyptula Challenge [6]
(currently level 6).
Developed a natural language processor for sarcasm detection [7] in tweets.
Developed a hand gesture detection module [8] as a college minor project.
Active Firefox Add-ons Editor at AMO [9].
Currently working on a restaurant image classification project as a second
college minor project.

Why I chose Git

I have been using Git for about two years now, and it has become an
indispensable daily-use tool for me. Getting a chance to participate in GSoC
for the first time under Git is very exciting. It will give me an opportunity
to intimately know the system and a chance to help in making it better and more
powerful.

Proposal

Ideas Page: Git Beginner [10]

The following tasks summarize the project:

Implement a wrapper around Git

A wrapper (tentatively called 'ggit') is to be implemented around Git, providing
the following user interface:
`ggit <command> <arguments>`
For example, `ggit add --all`
The wrapper will assess the arguments passed to it, and if they are detected to
be safe, it will simply pass them through to 'git'. This approach is favorable
because existing users of git will not be affected by the wrapper.

Warning for potentially destructive commands

For every command that is entered, the wrapper will assess the subcommand and
its options. It will first check whether the subcommand (e.g. add,
commit, rebase) is present in a list of predefined 'potentially destructive'
commands. This can be done by searching a radix tree for the subcommand.
If found, the arguments to the subcommand will be checked for specific
flags. The graylisted flags for the destructive commands will be stored as an
array of regular expressions, and the current command's arguments will be
checked against them. If matches are found, a warning is displayed. The warning
displayed by 'ggit' would be of the form
"You are about to do X, which will permanently destroy Y. Are you sure you wish
to continue? [Y/n] "
If the user enters Y[es], the command will be executed as is (by passing it
unaltered to git). In that case, 'ggit' will also give tips for undoing
the changes made by the command (by referring the user to the correct commands
and to the reflog), if the command can be undone. If the command cannot be
undone, 'ggit' will display an additional line in the warning like
"The changes made by this command cannot be undone. Please proceed cautiously."
In the case of n[o], 'ggit' will exit without executing the command.

Currently, the list consists of commands like:

$ git rebase
$ git reset --hard
$ git clean -f
$ git gc --prune=now --aggressive
$ git push -f 
$ git push remote [+/:]
$ git branch -D

The list will be updated after some more discussion on the list.
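The detection step described above can be sketched as follows. This is a minimal illustration assuming a dict-based graylist; the names GRAYLIST, ALWAYS_WARN, and is_destructive are made up for the sketch, and a radix tree as proposed would only change the lookup structure, not the behavior:

```python
import re

# Hypothetical graylist: subcommand -> regexes for destructive flags.
# The real list would come out of the on-list discussion mentioned above.
GRAYLIST = {
    "reset":  [re.compile(r"^--hard$")],
    "clean":  [re.compile(r"^(-f|--force)$")],
    "push":   [re.compile(r"^(-f|--force)$")],
    "branch": [re.compile(r"^-D$")],
}

# Commands that warrant a warning regardless of their flags.
ALWAYS_WARN = {"rebase"}

def is_destructive(subcommand, args):
    """Return True if the command should trigger the ggit warning."""
    if subcommand in ALWAYS_WARN:
        return True
    patterns = GRAYLIST.get(subcommand, [])
    return any(p.match(arg) for p in patterns for arg in args)
```

On a match, the wrapper would print the [Y/n] warning described above before passing the command through to git.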

Usage tips and documentation

The wrapper will also be responsible for showing a short description of every
command that is entered through 'ggit'. This shall be done for every command,
unconditionally. The description will be derived from the actual documentation,
but will primarily aim to help the beginner understand the Git workflow and the
Git model.

A few examples to illustrate the working of the wrapper are:

$ ggit add --all
Staging all changes and untracked files. Use `ggit commit` to commit the changes.

$ ggit commit -m “Second commit”
Committing staged changes…
[master 0be3142] Second commit
 4 files changed, 6 insertions(+), 2 

Re: [GSOC/RFC] GSoC Proposal Draft | Git Beginner

2016-03-21 Thread Sidhant Sharma


On Monday 21 March 2016 01:59 PM, Matthieu Moy wrote:
> Sidhant Sharma  writes:
>
>> On Monday 21 March 2016 12:22 AM, Matthieu Moy wrote:
>>
>>> Note that it implies writing an almost full-blown option parser to
>>> recognize commands like
>>>
>>> ggit --work-tree git --namespace reset --git-dir --hard git log
>>>
>>> (just looking for "git", "reset" and "--hard" in the command-line would
>>> not work here).
>> Could you please elaborate on the above command, I'm unable to
>> understand its syntax. I thought all git commands follow the
>> `git <command> <arguments>` syntax, so using simple string
>> manipulations and regexes would work. Am I missing something?
> The full syntax is
>
> git [global options] <command> [options and arguments for the command]
>
> For example:
>
> git -p log => -p is the option for "git" itself, which means "paginate"
> git log -p => -p is the option for "git log", which means "patch"
>
> Options can have stuck or non-stuck form, for example
>
> git --work-tree=foo <=> git --work-tree foo
>
> git --work-tree git --namespace reset --git-dir --hard git log
> <=>
> git --work-tree=git --namespace=reset --git-dir=--hard git log
>
> (This is probably a stupid command to type, but it's legal)
>
> The latter is a source of issues for a parser, since you can't just iterate
> through argv[] and search for problematic commands/options, since you
> have to distinguish options themselves (--work-tree above) and option
> arguments (foo above).
Thanks for the explanation; I knew of the global options but didn't know
that the last command would be syntactically legal. For commands like that,
iterating over argv[] wouldn't work (not in all cases). Though a beginner
may not enter commands of this sort, I agree we shouldn't rely on that. If
it were only for stuck-form options, regexes would have worked.
I can now see why a parser is needed here, one which can recognize global
options and the above command syntax. But for this example,
> In my example above, I played with global options (before "git" in the
> command-line), but I could also have done that with per-command options
> taking arguments, like
>
> git push --repo --force
>
> Here, --force is the name of the repo (again, probably a stupid name,
> but why not), not the --force option.
would the parser also be required to understand all options and arguments for
all git commands? Although --force cannot be a branch name (git denies it),
that may not be true for other commands.
>> I wasn't sure if we are allowed to code before the actual coding period 
>> begins
>> so I kept it that way. I'll update it now.
> You're not "forced" to, but you can write code whenever you like. We've
> already seen code written before the application!
>
Nice! I too would like to get started early :)
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [GSOC/RFC] GSoC Proposal Draft | Git Beginner

2016-03-21 Thread Matthieu Moy
Sidhant Sharma  writes:

> On Monday 21 March 2016 12:22 AM, Matthieu Moy wrote:
>
>> Note that it implies writing an almost full-blown option parser to
>> recognize commands like
>>
>> ggit --work-tree git --namespace reset --git-dir --hard git log
>>
>> (just looking for "git", "reset" and "--hard" in the command-line would
>> not work here).
>
> Could you please elaborate on the above command, I'm unable to
> understand its syntax. I thought all git commands follow the
> `git <command> <arguments>` syntax, so using simple string
> manipulations and regexes would work. Am I missing something?

The full syntax is

git [global options] <command> [options and arguments for the command]

For example:

git -p log => -p is the option for "git" itself, which means "paginate"
git log -p => -p is the option for "git log", which means "patch"

Options can have stuck or non-stuck form, for example

git --work-tree=foo <=> git --work-tree foo

git --work-tree git --namespace reset --git-dir --hard git log
<=>
git --work-tree=git --namespace=reset --git-dir=--hard git log

(This is probably a stupid command to type, but it's legal)

The latter is a source of issues for a parser, since you can't just iterate
through argv[] and search for problematic commands/options, since you
have to distinguish options themselves (--work-tree above) and option
arguments (foo above).
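To make that concrete, here is a rough sketch of the argv[] walk such a wrapper would need. The set of global options taking a separate argument is an illustrative subset, not git's full list, and find_subcommand is a name invented for the sketch:

```python
# Illustrative subset of git's global options that consume the next
# word when written in non-stuck form (e.g. "--work-tree foo").
OPTS_WITH_ARG = {"--work-tree", "--git-dir", "--namespace", "-C", "-c"}

def find_subcommand(argv):
    """Skip global options (stuck or non-stuck) and return (subcommand, rest)."""
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg in OPTS_WITH_ARG:
            i += 2  # non-stuck form: the option consumes the next word
        elif arg.startswith("-"):
            i += 1  # stuck form ("--work-tree=foo") or an argument-less flag
        else:
            return arg, argv[i + 1:]
    return None, []
```

On the example above, the walk correctly treats "reset" and "--hard" as option arguments rather than as the subcommand, which a naive substring search would get wrong.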

In my example above, I played with global options (before "git" in the
command-line), but I could also have done that with per-command options
taking arguments, like

git push --repo --force

Here, --force is the name of the repo (again, probably a stupid name,
but why not), not the --force option.

> I wasn't sure if we are allowed to code before the actual coding period begins
> so I kept it that way. I'll update it now.

You're not "forced" to, but you can write code whenever you like. We've
already seen code written before the application!

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/


Re: [GSOC/RFC] GSoC Proposal Draft | Git Beginner

2016-03-21 Thread Sidhant Sharma

On Monday 21 March 2016 12:22 AM, Matthieu Moy wrote:
> Sidhant Sharma <tigerkid...@gmail.com> writes:
>
>> A wrapper is to be implemented around (currently called 'ggit'), which will
>> provide the following user interface:
>> `ggit  `
> There's actually already a tool doing this:
>
>   https://people.gnome.org/~newren/eg/
>
> I'm Cc-ing the author.
>
> I heard good feedback about the tool in the early days of Git, when git
> itself was rather clearly not ready for bare mortals. The tool seems
> abandonned since 2013 (last release), my guess is that git became usable
> enough and eg is not needed as much as it was. For example, eg defaulted
> to push.default=tracking before we did the change to push.default=simple
> in git.
Nice! I'll take a look at its source and see how it works.
>
> I think the "wrapper" approach is sound. It avoids touching git itself
> and breaking things that depend on git (for example, adding
> core.denyHardReset to let "git reset --hard" error out would be
> unacceptable because it would mean that any script using "git reset
> --hard" would break when a user has the option set in ~/.gitconfig).
>
> Note that it implies writing an almost full-blown option parser to
> recognize commands like
>
> ggit --work-tree git --namespace reset --git-dir --hard git log
>
> (just looking for "git", "reset" and "--hard" in the command-line would
> not work here).

Could you please elaborate on the above command, I'm unable to
understand its syntax. I thought all git commands follow the
`git <command> <arguments>` syntax, so using simple string
manipulations and regexes would work. Am I missing something?

>> The wrapper will assess the arguments passed to it, and if they are detected 
>> to
>> be safe, it will simply pass them through to 'git'.
>>
>> Warning for potentially destructive commands
>>
>> For every command that is entered, the wrapper will assess the subcommand and
>> its options. It will first check whether the subcommand (e.g. add,
>> commit, rebase) is present in a list of predefined 'potentially destructive'
>> commands. This can be done by searching a radix tree for the subcommand.
>> If found, the arguments to the subcommand will be checked for specific
>> flags. The graylisted flags for the destructive commands will be stored as an
>> array of regular expressions, and the current command's arguments will be
>> checked against them. If matches are found, a warning is displayed. The warning
>> displayed by 'ggit' would be of the form
>> "You are about to do X, which will permanently destroy Y. Are you sure you 
>> wish
>> to continue? [Y/n] "
>> If the user enters Y[es], the command will be executed as is (by passing it
>> unaltered to git). In the case of Y[es], 'ggit' will also give tips for 
>> undoing
>> the changes made by this command (by referring the user to correct commands 
>> and
>> reflog),  if the command can be undone. In case the command cannot be undone,
>> 'ggit' will display an additional line in the warning like
>> "The changes made by this command cannot be undone. Please proceed 
>> cautiously".
>> In the case of n[o], 'ggit' will exit without executing the command.
>> Usage tips and documentation
>>
>> The wrapper will also be responsible for showing a short description of every
>> command that is entered through 'ggit'. This shall be done for every command
>> unconditionally.
> I'm not 100% convinced that this is a good idea: it'd be tempting for
> the user to run a command just to know what it does. Perhaps it's better
> to let the user run "git <command> -h" instead. But it could indeed help
> for commands doing very different things depending on the options, like
>
> $ git checkout foo
> Checks-out branch foo
> $ git checkout -b bar
> Creating a new branch bar and checking it out
> $ git checkout HEAD -- .
> Reverting directory . to its last committed state

Yes, I did consider that, and came up with this: we could have an
option like --intro or --doc that just prints the intro snippet
for the command without actually running it. Though
"git <command> -h" is an option, I wasn't inclined towards it, as I
think the output from -h may sometimes not make sense to a new user.
Plus, -h only gives an elaborate list of syntax and options/arguments
but does not say what the command does.

> ...
>
> (I think a list of examples would be an important addition to your
> proposal to clarify the plans)

Will do that.

>> The description will be derived from the actual documentation, but
>> will primarily aim to help the begi

Re: [GSOC/RFC] GSoC Proposal Draft | Git Beginner

2016-03-20 Thread Matthieu Moy
Sidhant Sharma <tigerkid...@gmail.com> writes:

> Implement a beginner mode for Git.
> 
> Abstract
> 
> Git is a very powerful version control system with an array of features
> that give the user great capabilities. It often happens, though, that
> beginners are overwhelmed by its complexity and are unable to fully understand,
> and thus utilize, Git. Moreover, beginners often do not fully understand
> the command they are using and end up making destructive (and occasionally
> irreversible) changes to the repository.
> 
> The beginner mode will assist such users by warning them
> before they make possibly destructive changes. It will also display tips and
> short snippets of documentation for better understanding the Git model.
[...]

(Google summer of code Idea suggested here:
http://git.github.io/SoC-2016-Ideas/#git-beginner )

> A wrapper (tentatively called 'ggit') is to be implemented around Git, which will
> provide the following user interface:
> `ggit <command> <arguments>`

There's actually already a tool doing this:

  https://people.gnome.org/~newren/eg/

I'm Cc-ing the author.

I heard good feedback about the tool in the early days of Git, when git
itself was rather clearly not ready for mere mortals. The tool seems
abandoned since 2013 (last release); my guess is that git became usable
enough and eg is not needed as much as it was. For example, eg defaulted
to push.default=tracking before we did the change to push.default=simple
in git.

I think the "wrapper" approach is sound. It avoids touching git itself
and breaking things that depend on git (for example, adding
core.denyHardReset to let "git reset --hard" error out would be
unacceptable because it would mean that any script using "git reset
--hard" would break when a user has the option set in ~/.gitconfig).

Note that it implies writing an almost full-blown option parser to
recognize commands like

ggit --work-tree git --namespace reset --git-dir --hard git log

(just looking for "git", "reset" and "--hard" in the command-line would
not work here).

Another option would be to have a C implementation of ggit that would
reuse the whole git source code, but set a flag "beginner_mode" to true
before starting, and then introduce "if (beginner_mode)" within Git's
source code. I think the wrapper approach is better since it avoids
"polluting" Git's source code itself.

> The wrapper will assess the arguments passed to it, and if they are detected 
> to
> be safe, it will simply pass them through to 'git'.
>
> Warning for potentially destructive commands
>
> For every command that is entered, the wrapper will assess the subcommand and
> its options. It will first check whether the subcommand (e.g. add,
> commit, rebase) is present in a list of predefined 'potentially destructive'
> commands. This can be done by searching a radix tree for the subcommand.
> If found, the arguments to the subcommand will be checked for specific
> flags. The graylisted flags for the destructive commands will be stored as an
> array of regular expressions, and the current command's arguments will be
> checked against them. If matches are found, a warning is displayed. The warning
> displayed by 'ggit' would be of the form
> "You are about to do X, which will permanently destroy Y. Are you sure you 
> wish
> to continue? [Y/n] "
> If the user enters Y[es], the command will be executed as is (by passing it
> unaltered to git). In the case of Y[es], 'ggit' will also give tips for 
> undoing
> the changes made by this command (by referring the user to correct commands 
> and
> reflog),  if the command can be undone. In case the command cannot be undone,
> 'ggit' will display an additional line in the warning like
> "The changes made by this command cannot be undone. Please proceed 
> cautiously".
> In the case of n[o], 'ggit' will exit without executing the command.
> Usage tips and documentation
>
> The wrapper will also be responsible for showing a short description of every
> command that is entered through 'ggit'. This shall be done for every command
> unconditionally.

I'm not 100% convinced that this is a good idea: it'd be tempting for
the user to run a command just to know what it does. Perhaps it's better
to let the user run "git <command> -h" instead. But it could indeed help
for commands doing very different things depending on the options, like

$ git checkout foo
Checks-out branch foo
$ git checkout -b bar
Creating a new branch bar and checking it out
$ git checkout HEAD -- .
Reverting directory . to its last committed state

...

(I think a list of examples would be an important addition to your
proposal to clarify the plans)

> The description will be derived from the actual document

[GSOC/RFC] GSoC Proposal Draft | Git Beginner

2016-03-20 Thread Sidhant Sharma
Hi,

I have drafted my proposal for the project 'Git Beginner', and would
like to request your suggestions on improving it. I'm also reading the Git
documentation and the Pro Git book (again) to make notes for the beginner
documentation. It would be great to hear your comments on it.

Thanks and regards,
Sidhant Sharma

---

Implement a beginner mode for Git.

Abstract

Git is a very powerful version control system with an array of features
that give the user great capabilities. It often happens, though, that
beginners are overwhelmed by its complexity and are unable to fully understand,
and thus utilize, Git. Moreover, beginners often do not fully understand
the command they are using and end up making destructive (and occasionally
irreversible) changes to the repository.

The beginner mode will assist such users by warning them
before they make possibly destructive changes. It will also display tips and
short snippets of documentation for better understanding the Git model.

About Me

Name : Sidhant Sharma
Email [1] : Sidhant.Sharma1208  gmail.com
Email [2] : Tigerkid001   gmail.com
College : Delhi Technological University
Studying : Software Engineering
IRC : tk001 (or _tk_)
Phone : 91-9990-606-081
Country : India
Interests : Computers, Books, Photography
Github : Tigerkid001
LinkedIn : https://in.linkedin.com/in/sidhantsharma12

Technical Experience

Authored several Mozilla Firefox and Google Chrome extensions.
Developed a robust plugin framework for Android for a startup. Learning Linux
kernel programming via the Eudyptula Challenge.
Developed a natural language processor for sarcasm detection in tweets.
Developed a gesture detection module as a college minor project.
Active Firefox Add-ons Editor at AMO (addons  mozilla  org).
Currently working on a restaurant image classification project as a second
college minor project.



Why I chose Git

I have been using Git for about two years now, and it has become an
indispensable daily-use tool for me. Getting a chance to participate in GSoC
for the first time under Git is very exciting. It will give me an opportunity
to intimately know the system and a chance to help in making it better and more
powerful.

Proposal

Ideas Page: Git Beginner

The following tasks summarize the project:

Implement a wrapper around Git

A wrapper (tentatively called 'ggit') is to be implemented around Git, providing
the following user interface:
`ggit <command> <arguments>`
For example, `ggit add --all`
The wrapper will assess the arguments passed to it, and if they are detected to
be safe, it will simply pass them through to 'git'.

Warning for potentially destructive commands

For every command that is entered, the wrapper will assess the subcommand and
its options. It will first check whether the subcommand (e.g. add,
commit, rebase) is present in a list of predefined 'potentially destructive'
commands. This can be done by searching a radix tree for the subcommand.
If found, the arguments to the subcommand will be checked for specific
flags. The graylisted flags for the destructive commands will be stored as an
array of regular expressions, and the current command's arguments will be
checked against them. If matches are found, a warning is displayed. The warning
displayed by 'ggit' would be of the form
"You are about to do X, which will permanently destroy Y. Are you sure you wish
to continue? [Y/n] "
If the user enters Y[es], the command will be executed as is (by passing it
unaltered to git). In that case, 'ggit' will also give tips for undoing
the changes made by the command (by referring the user to the correct commands
and to the reflog), if the command can be undone. If the command cannot be
undone, 'ggit' will display an additional line in the warning like
"The changes made by this command cannot be undone. Please proceed cautiously."
In the case of n[o], 'ggit' will exit without executing the command.
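A rough sketch of this Y[es]/n[o] flow, with the graylist check itself left out. The names confirm and ggit_run are invented for the sketch, and the reader parameter exists only to keep it testable:

```python
import subprocess

def confirm(prompt, reader=input):
    """Default to yes, as in the proposed [Y/n] prompt."""
    return reader(prompt).strip().lower() in ("", "y", "yes")

def ggit_run(argv, destructive, reader=input):
    # 'destructive' is the result of the graylist check described earlier.
    if destructive and not confirm(
            "This may permanently destroy data. Continue? [Y/n] ", reader):
        return 0  # n[o]: exit without executing the command
    # Y[es] or safe command: pass it through to git unaltered.
    return subprocess.call(["git"] + argv)
```

An answer of n[o] returns before git is ever invoked, so the repository is left untouched.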
Usage tips and documentation

The wrapper will also be responsible for showing a short description of every
command that is entered through 'ggit'. This shall be done for every command,
unconditionally. The description will be derived from the actual documentation,
but will primarily aim to help the beginner understand the Git workflow and the
Git model.
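As a sketch, the per-command description could start as a simple lookup keyed by subcommand. The summary strings below are invented for illustration; the real text would be distilled from the official documentation:

```python
# Hypothetical one-line summaries shown when a command is entered.
SUMMARIES = {
    "add":    "Staging the named changes; use `ggit commit` to record them.",
    "commit": "Recording the staged changes as a new commit.",
    "rebase": "Replaying commits on top of another base; this rewrites history.",
}

def describe(subcommand):
    """Return the beginner-oriented summary for a subcommand."""
    return SUMMARIES.get(subcommand, "No summary available for this command.")
```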

Timeline

Community Bonding Period

Week 1 : Discuss the overall flow of the project with the mentor. Discuss
adequate data structures and search techniques to be used.

Week 2-3 : Discuss an extensive list of commands that should be classified
as destructive. Discuss appropriate short descriptions for commands.

Week 4 : Discuss code structure, tests, optimization for least overhead, and
other details.

Coding Starts

Week 1-2 : Submit code for a basic wrapper that will warn for a subset of the
potentially destructive commands, and pass the command through if it is safe.

Week 3-6 : Extend the wrapper to warn for all commands in the list, along with

Re: "Medium" log format: change proposal for author != committer

2015-09-16 Thread Jacob Keller
On Tue, Sep 15, 2015 at 6:52 PM, Junio C Hamano  wrote:
>
>  * Enhance the "--pretty=format:" thing so that the current set of
>hardcoded --pretty=medium,short,... formats and your modified
>"medium" can be expressed as a custom format string.
>
>  * Introduce a configuration mechanism to allow users to define new
>short-hand, e.g. if you have this in your $HOME/.gitconfig:
>
> [pretty "robin"]
> format = "commit %H%nAuthor: %an <%ae>%n..."
>

AFAIK there is already support for this; from "git help config":

pretty.<name>
Alias for a --pretty= format string, as specified in git-log(1). Any
aliases defined here can be used just as the built-in pretty formats
could. For example, running git config pretty.changelog "format:* %H
%s" would cause the invocation git log --pretty=changelog to be
equivalent to running git log "--pretty=format:* %H %s". Note that an
alias with the same name as a built-in format will be silently
ignored.
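The alias described in that config documentation is easy to see in action in a throwaway repository (assumes git is on PATH):

```shell
# Demonstrate pretty.<name>: a configured alias usable like a
# built-in pretty format.
tmp=$(mktemp -d)
cd "$tmp"
git -c init.defaultBranch=master init -q
git -c user.name=Jake -c user.email=jake@example.com \
    commit -q --allow-empty -m "initial commit"
git config pretty.changelog "format:* %H %s"
# These two invocations now print the same line:
git log --pretty=changelog
git log "--pretty=format:* %H %s"
```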

>and run "git log --pretty=robin", it would behave as if you said
>"git log --pretty="format:commit %H%nAuthor: %an <%ae>%n...".
>

So this should already be supported... but to support "robinsformat"
we'd need to be able to "show committer only if different from
author"... Not sure how that would work.

>  * (optional) Replace the hardcoded implementations of pretty
>formats with short-hand names like "medium", "short", etc. with a
>built-in set of pretty.$name.format using the configuration
>mechanism.  But we need to make sure this does not hurt
>performance for common cases.
>

This part obviously hasn't been done; I don't know whether any particular
format is inexpressible with today's pretty syntax.

But at least configuration does work. I use it as part of displaying the

Fixes: <sha1> ("name")

used by the upstream kernel for marking bug fixes of known commits.

Thus the only real thing would be implementing a % modifier which
allows showing the committer if it's not the same as the author (or vice
versa). Ideally we could take work from the ref-filter library and the
proposed "%if" stuff, but I don't think this was actually implemented
yet, and I don't know if that would even work in the pretty modifiers.

Regards,
Jake
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: "Medium" log format: change proposal for author != committer

2015-09-15 Thread Junio C Hamano
"Robin H. Johnson"  writes:

> Specifically, if the author is NOT the same as the committer, then
> display both in the header. Otherwise continue to display only the
> author.

I too found myself wanting to see both of the names sometimes, and
the "fuller" format was added explicitly for that purpose.

Even though I agree "show only one, and both only when they are
different" is a reasonable and possibly useful format, it is out of
question to change what "--pretty=medium" does.  It has been with us
forever and people and their scripts do rely on it.

It would be good if we can say

$ git log --pretty=robinsformat

but with a better name to show such an output.


Having said that, I'm moderately negative about adding it as yet
another hard-coded format.  We simply have too many, and we do not
need one more.  What we need instead is a flexible framework to let
users get what they want.

I think what needs to happen is:

 * Enhance the "--pretty=format:" thing so that the current set of
   hardcoded --pretty=medium,short,... formats and your modified
   "medium" can be expressed as a custom format string.

 * Introduce a configuration mechanism to allow users to define new
   short-hand, e.g. if you have this in your $HOME/.gitconfig:

[pretty "robin"]
format = "commit %H%nAuthor: %an <%ae>%n..."

   and run "git log --pretty=robin", it would behave as if you said
   "git log --pretty="format:commit %H%nAuthor: %an <%ae>%n...".

 * (optional) Replace the hardcoded implementations of pretty
   formats with short-hand names like "medium", "short", etc. with a
   built-in set of pretty.$name.format using the configuration
   mechanism.  But we need to make sure this does not hurt
   performance for common cases.



"Medium" log format: change proposal for author != committer

2015-09-15 Thread Robin H. Johnson
Hi,

I want to propose a change to the 'medium' log output format, to improve
readability.

Specifically, if the author is NOT the same as the committer, then
display both in the header. Otherwise continue to display only the
author.

This would aid quick review of changes in git-log & git-show output.

-- 
Robin Hugh Johnson
Gentoo Linux: Developer, Infrastructure Lead
E-Mail : robb...@gentoo.org
GnuPG FP   : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85


Re: Proposal for git stash : add --staged option

2015-06-03 Thread edgar . hipp

Hi again,

just wanted to tell that I have created a solution by doing a few lines 
of scripting:


git-cstash
```
#!/bin/sh

git commit -m 'temporary, will be stashed soon'  # commit the staged changes
git stash --include-untracked                    # stash everything else
git reset HEAD^1                                 # undo the temporary commit
git stash                                        # stash the formerly-staged changes
git stash pop stash@{1}                          # restore the unstaged changes
```

Le 2015-04-22 11:25, Johannes Schindelin a écrit :

Hi Edgar,

On 2015-04-22 10:30, edgar.h...@netapsys.fr wrote:


When you have a lot of unstaged files, and would like to test what
happens if you undo some of the changes that you think are unnecessary,
you would rather keep a copy of those changes somewhere.

For example

Changed but not updated:
M config_test.xml
M config_real.xml

I have changed both config_test.xml and config_real.xml, but I think
the changes made in config_test.xml are unnecessary. However, I would
still like to keep them somewhere in case it breaks something.

In this case for example, I would like to be able to stash only the
file config_test.xml

Eg:

git add config_test.xml
git stash --staged

So that after this, my git looks like this:

Changed but not updated:
M config_real.xml

and my stash contains only the changes introduced in config_test.xml

`git stash --keep-index` doesn't give the necessary control, because
it will still stash everything (and create unnecessary merge
complications if I change the files and apply the stash)


I often have the same problem. How about doing this:

```sh
git add config_real.xml
git stash -k
git reset
```

The difference between our approaches is that I keep thinking of the
staging area as the place to put changes I want to *keep*, not that I
want to forget for a moment.

Having said that, I am sympathetic to your cause, although I would
rather have `git stash [--patch] -- [file...]` that would be used
like `git add -p` except that the selected changes are *not* staged,
but stashed instead.

Ciao,
Johannes




Please Acknowledge My Proposal!!

2015-05-23 Thread Gva Abogados



Please Acknowledge My Proposal!!

My name is Mr. Juan Martin Domingo a lawyer resident in Spain. I am
writing to let you know I have some FUNDS I want to transfer and am
seeking if you can be a beneficiary...Do not hesitate to Contact me for
more information if interested: gva.abogad...@aim.com).

Sincerely

Juan Martin Domingo.





Re: Proposal for git stash : add --staged option

2015-04-23 Thread edgar . hipp

Hi,


The suggested


```sh
git add config_real.xml
git stash -k
git reset
```


is not very well suited: the -k option keeps the index in the working
tree, but the staged changes are still put inside the stash as well.


So what you propose is equivalent to:

```sh
git stash
git stash apply stash@{0}
git checkout -- config_test.xml
```

`git stash --patch` can do the job (and I think that's what I'm going to 
use from now), but it's still a bit cumbersome in some situations.


Best,

Edgar

Le 2015-04-22 11:25, Johannes Schindelin a écrit :

Hi Edgar,

On 2015-04-22 10:30, edgar.h...@netapsys.fr wrote:


When you have a lot of unstaged files, and would like to test what
happens if you undo some of the changes that you think are unnecessary,
you would rather keep a copy of those changes somewhere.

For example

Changed but not updated:
M config_test.xml
M config_real.xml

I have changed both config_test.xml and config_real.xml, but I think
the changes made in config_test.xml are unnecessary. However, I would
still like to keep them somewhere in case it breaks something.

In this case for example, I would like to be able to stash only the
file config_test.xml

Eg:

git add config_test.xml
git stash --staged

So that after this, my git looks like this:

Changed but not updated:
M config_real.xml

and my stash contains only the changes introduced in config_test.xml

`git stash --keep-index` doesn't give the necessary control, because
it will still stash everything (and create unnecessary merge
complications if I change the files and apply the stash)


I often have the same problem. How about doing this:

```sh
git add config_real.xml
git stash -k
git reset
```

The difference between our approaches is that I keep thinking of the
staging area as the place to put changes I want to *keep*, not that I
want to forget for a moment.

Having said that, I am sympathetic to your cause, although I would
rather have `git stash [--patch] -- [file...]` that would be used
like `git add -p` except that the selected changes are *not* staged,
but stashed instead.

Ciao,
Johannes




Proposal for git stash : add --staged option

2015-04-22 Thread edgar . hipp

Hello,

There's some feature of git that I have been missing.
When you have a lot of unstaged files, and would like to test what 
happens if you undo some of the changes that you think are unnecessary,
you would rather keep a copy of those changes somewhere.


For example

Changed but not updated:
M config_test.xml
M config_real.xml

I have changed both config_test.xml and config_real.xml, but I think the 
changes made in config_test.xml are unnecessary. However, I would still 
like to keep them somewhere in case it breaks something.


In this case for example, I would like to be able to stash only the file 
config_test.xml


Eg:

git add config_test.xml
git stash --staged

So that after this, my git looks like this:

Changed but not updated:
M config_real.xml

and my stash contains only the changes introduced in config_test.xml

`git stash --keep-index` doesn't give the necessary control, because it 
will still stash everything (and create unnecessary merge complications 
if I change the files and apply the stash)
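For reference, later git releases did grow support for this: `git stash push -- <pathspec>` stashes only the given paths, and a `--staged` option was eventually added too. A throwaway-repo sketch of the pathspec form, mirroring the example above:

```shell
# Stash only config_test.xml's changes; config_real.xml stays modified.
tmp=$(mktemp -d)
cd "$tmp"
git -c init.defaultBranch=master init -q
printf 'test v1\n' > config_test.xml
printf 'real v1\n' > config_real.xml
git add .
git -c user.name=E -c user.email=e@example.com commit -q -m "base"
printf 'test v2\n' > config_test.xml
printf 'real v2\n' > config_real.xml
git stash push -q -- config_test.xml
cat config_test.xml   # back to "test v1": its change is in the stash
cat config_real.xml   # still "real v2": left alone in the worktree
```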


Best,

Edgar



Re: Proposal for git stash : add --staged option

2015-04-22 Thread Johannes Schindelin
Hi Edgar,

On 2015-04-22 10:30, edgar.h...@netapsys.fr wrote:

 When you have a lot of unstaged files, and would like to test what
 happens if you undo some of the changes that you think are unnecessary,
 you would rather keep a copy of those changes somewhere.
 
 For example
 
 Changed but not updated:
 M config_test.xml
 M config_real.xml
 
 I have changed both config_test.xml and config_real.xml, but I think
 the changes made in config_test.xml are unnecessary. However, I would
 still like to keep them somewhere in case it breaks something.
 
 In this case for example, I would like to be able to stash only the
 file config_test.xml
 
 Eg:
 
 git add config_test.xml
 git stash --staged
 
 So that after this, my git looks like this:
 
 Changed but not updated:
 M config_real.xml
 
 and my stash contains only the changes introduced in config_test.xml
 
 `git stash --keep-index` doesn't give the necessary control, because
 it will still stash everything (and create unnecessary merge
 complications if I change the files and apply the stash)

I often have the same problem. How about doing this:

```sh
git add config_real.xml
git stash -k
git reset
```

The difference between our approaches is that I keep thinking of the staging 
area as the place to put changes I want to *keep*, not that I want to forget 
for a moment.

Having said that, I am sympathetic to your cause, although I would rather have 
`git stash [--patch] -- [file...]` that would be used like `git add -p` 
except that the selected changes are *not* staged, but stashed instead.

Ciao,
Johannes


Re: [RFC/GSoC] Proposal Draft: Unifying git branch -l, git tag -l, and git for-each-ref

2015-03-28 Thread karthik nayak


On 03/26/2015 10:07 PM, Jeff King wrote:

On Mon, Mar 23, 2015 at 06:39:20PM +0530, karthik nayak wrote:

 All three commands select a subset of the repository’s refs and print the
 result. There has been an attempt to unify these commands by Jeff King[3]. I
 plan on continuing his work[4] and using his approach to tackle this
 project.

I would be cautious about the work in my for-each-ref-contains-wip
branch. At one point it was reasonably solid, but it's now a year and a
half old, and I've been rebasing it without paying _too_ much attention
to correctness. I think some subtle bugs have been introduced as it has
been carried forward.

Also, the very first patch (factoring out the contains traversal) is
probably better served by this series:

   http://thread.gmane.org/gmane.comp.version-control.git/252472

I don't remember all of the issues offhand that need to be addressed in
it, but there were plenty of review comments.

Thanks for the link, will go through that.


 For extended selection behaviour such as ‘--contains’ or ‘--merged’ we could
 implement these within
 the library by providing functions which closely mimic the current methods
 used individually by ‘branch -l’ and ‘tag -l’. For eg to implement
 ‘--merged’ we implement a ‘compute_merge()’ function, which with the help of
 the revision API’s will be able to perform the same function as ‘branch -l
 --merged’.

One trick with making a library-like interface is that some of the
selection routines can work on a streaming list of refs (i.e., as we
see each ref we can say yes or no) and some must wait until the end
(e.g., --merged does a single big merge traversal). It's probably not
the end of the world to just always collect all the refs, then filter
them, then sort them, then print them. It may delay the start of output
in some minor cases, but I doubt that's a big deal (and anyway, the
packed-refs code will load them all into an array anyway, so collecting
them in a second array is probably not a big deal).

I think I noted this down while going through your implementation also.
You even mentioned this on the mailing list if I'm not wrong.
Will have to work out a design around this and think about it more.


 For formatting functionality provided by ‘for-each-ref’ we replicate the
 ‘show_ref’ function in ‘for-each-ref.c’ where the format is given to the
 function and the function uses the format to obtain atom values and prints
 the corresponding atom values to the screen. This feature would allow us to
 provide format functionality which could act as a base for the ‘-v’ option
 also.

Yeah, I'd really like to see --format for git branch, and have -v
just feed that a hard-coded format string (or even a configurable one).

 Although Jeff has built a really good base to build upon, I shall use
 his work as more of a reference and work on unification of the three
 commands from scratch.

Good. :)

-Peff


Thanks for the Review/Tips.

Regards
-Karthik


Re: [RFC/GSoC] Proposal Draft: Unifying git branch -l, git tag -l, and git for-each-ref

2015-03-26 Thread Jeff King
On Mon, Mar 23, 2015 at 06:39:20PM +0530, karthik nayak wrote:

 All three commands select a subset of the repository’s refs and print the
 result. There has been an attempt to unify these commands by Jeff King[3]. I
 plan on continuing his work[4] and using his approach to tackle this
 project.

I would be cautious about the work in my for-each-ref-contains-wip
branch. At one point it was reasonably solid, but it's now a year and a
half old, and I've been rebasing it without paying _too_ much attention
to correctness. I think some subtle bugs have been introduced as it has
been carried forward.

Also, the very first patch (factoring out the contains traversal) is
probably better served by this series:

  http://thread.gmane.org/gmane.comp.version-control.git/252472

I don't remember all of the issues offhand that need to be addressed in
it, but there were plenty of review comments.

 For extended selection behaviour such as ‘--contains’ or ‘--merged’ we could
 implement these within
 the library by providing functions which closely mimic the current methods
 used individually by ‘branch -l’ and ‘tag -l’. For eg to implement
 ‘--merged’ we implement a ‘compute_merge()’ function, which with the help of
 the revision API’s will be able to perform the same function as ‘branch -l
 --merged’.

One trick with making a library-like interface is that some of the
selection routines can work on a streaming list of refs (i.e., as we
see each ref we can say yes or no) and some must wait until the end
(e.g., --merged does a single big merge traversal). It's probably not
the end of the world to just always collect all the refs, then filter
them, then sort them, then print them. It may delay the start of output
in some minor cases, but I doubt that's a big deal (and anyway, the
packed-refs code will load them all into an array anyway, so collecting
them in a second array is probably not a big deal).
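The collect-then-filter-then-sort-then-print flow described here can be sketched on toy data (fake ref names and a fake "merged" flag standing in for the expensive whole-history predicate; not git's actual ref machinery):

```shell
# Column 1: ref name; column 2: stand-in for a predicate such as
# --merged that needs a full traversal (1 = merged).
refs='refs/heads/topic 0
refs/heads/master 1
refs/tags/v1.0 1'

# 1. collect (the list above), 2. filter, 3. sort, 4. print:
printf '%s\n' "$refs" | awk '$2 == 1 { print $1 }' | sort
# prints refs/heads/master, then refs/tags/v1.0
```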

 For formatting functionality provided by ‘for-each-ref’ we replicate the
 ‘show_ref’ function in ‘for-each-ref.c’ where the format is given to the
 function and the function uses the format to obtain atom values and prints
 the corresponding atom values to the screen. This feature would allow us to
 provide format functionality which could act as a base for the ‘-v’ option
 also.

Yeah, I'd really like to see --format for git branch, and have -v
just feed that a hard-coded format string (or even a configurable one).

 Although Jeff has built a really good base to build upon, I shall use
 his work as more of a reference and work on unification of the three
 commands from scratch.

Good. :)

-Peff


[RFC/GSoC v2] Proposal: Make git-pull and git-am builtins

2015-03-25 Thread Paul Tan
Since the deadline is fast approaching, and I've read that
google-melange usually becomes overwhelmed near the deadline, I'll try
to iterate on the proposal as much as possible. Below is v2, mostly
small changes in response to Matthieu's and Junio's reviews.

The changes are as follows:

* Make it clear that zero spawning of processes is an ideal -- it
  doesn't have to be so in practice.

* Swap rewrite of git-pull and git-am in timeline. It is better to push
  the first patch to the mailing list as soon as possible.

* Make it clear that as part of refactoring, commonly recurring patterns
  can be codified and implemented in the internal git API.

* Add microproject v5.

* Make it clear that Windows is not the only one that has poor IO
  performance. Poor IO performance can stem from the choice of operating
  system, filesystem and underlying storage performance.

* Cite filesystem cache feature in git for windows.

* Remove section-numbering directive, github messes it up.

* State what the end-product of the final stage is.

The updated version is also in the gist[1].

[1] https://gist.github.com/pyokagan/1b7b0d1f4dab6ba3cef1

-- >8 --


Make `git-pull` and `git-am` builtins


:Abstract: `git-pull` and `git-am` are frequently used git subcommands.
   However, they are porcelain commands and implemented as shell
   scripts, which have limitations that can cause poor
   performance, especially in non-POSIX environments like Windows.
   I propose to rewrite these scripts into low level C code and make
   them builtins.  This will increase git's portability, and may
   improve the efficiency and performance of these commands.

.. section-numbering::

Limitations of shell scripts
============================

`git-pull` is a commonly executed command to check for new changes in the
upstream repository and, if there are, fetch and integrate them into the
current branch. `git-am` is another commonly executed command for applying a
series of patches from a mailbox to the current branch. They are both git
porcelain commands -- they have no access to git's low level internal API.
Currently, they are implemented by the shell scripts ``git-pull.sh`` and
``git-am.sh`` respectively. These shell scripts require a fully-functioning
POSIX shell and utilities. As a result, these commands are difficult to port to
non-POSIX environments like Windows.

Since porcelain commands do not have access to git's internal API, performing
any git-related function, no matter how trivial, requires git to be spawned in
a separate process. This limitation leads to these git commands being
relatively inefficient, and can cause long run times on certain platforms that
do not have copy-on-write ``fork()`` semantics.

Spawning processes can be slow
------------------------------

Shell scripting, by itself, is severely limited in what it can do.  Performing
most operations in shell scripts require external executables to be called. For
example, ``git-pull.sh`` spawns the git executable not only to perform complex
git operations like `git-fetch` and `git-merge`, but it also spawns the git
executable for trivial tasks such as retrieving configuration values with
`git-config` and even quoting of command-line arguments with ``git rev-parse
--sq-quote``. As a result, these shell scripts usually end up spawning a lot of
processes.
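The quoting example is easy to see in isolation — even this trivial task costs a whole git process (assumes git is installed):

```shell
# Quote two tricky arguments with git, then eval the result back into
# positional parameters to show the quoting round-trips.
quoted=$(git rev-parse --sq-quote "a b" "c'd")
eval "set -- $quoted"
echo "$#: $1 / $2"
# prints: 2: a b / c'd
```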

Process spawning is usually implemented as a ``fork()`` followed by an
``exec()`` by shells. This can be slow on systems that do not support
copy-on-write semantics for ``fork()``, and thus needs to duplicate the memory
of the parent process for every ``fork()`` call -- an expensive process.

Furthermore, starting up processes on Windows is generally expensive as it
performs `several extra steps`_ such as using an inter-process call to
notify the Windows Client/Server Runtime Subsystem (CSRSS) about the process
creation and checking for App Compatibility requirements.

.. _`several extra steps`:
http://www.microsoft.com/mspress/books/sampchap/4354a.aspx

The official Windows port of git, Git for Windows, uses MSYS2 [#]_ to emulate
``fork()``. Since Windows does not support forking semantics natively, MSYS2
can only emulate ``fork()`` `without copy-on-write semantics`_. Coupled with
Windows's heavy process creation, this causes huge slowdowns of git on Windows.

.. _`without copy-on-write semantics`:
https://www.cygwin.com/faq.html#faq.api.fork

A no-updates `git-pull`, for example, takes an average of 5.1s [#]_, as
compared to Linux which only takes an average of 0.08s. 5 seconds,
while seemingly short, would seem like an eternity to a user who just wants to
quickly fetch and merge changes from upstream.

`git-am`'s implementation reads each patch from the mailbox in a while loop,
spawning many processes for each patch. Considering the cost of spawning each
process, as well

Re: [RFC/GSoC] Proposal: Make git-pull and git-am builtins

2015-03-25 Thread Junio C Hamano
Paul Tan pyoka...@gmail.com writes:

 I think it's still good to have the ideal in mind though (and whoops I
 forgot to put in the word ideal in the text).

Using or not using fork is merely one of the trade-offs we can make.

"If all other things are equal, no fork is better than a fork" is a
meaningless statement, as all other things are never equal in real
life---doing things internally will have a cost of having to clean
up and a risk to get that part wrong, for example.  Engineering is a
fine balancing act and setting an absolute goal is not a healthy
attitude.


Re: [RFC/GSoC] Proposal: Make git-pull and git-am builtins

2015-03-25 Thread Sebastian Schuberth

On 24.03.2015 17:37, Paul Tan wrote:


I'm applying for git in the Google Summer of Code this year. For my
project, I propose to rewrite git-pull.sh and git-am.sh into fast
optimized C builtins. I've already hacked up a prototype of a builtin
git-pull in [1], and it showed a promising 8x improvement in execution
time on Windows.


I cannot thank you enough for starting this effort. As one of the 
project owners of Git for Windows I can confirm the (shell) script Git 
commands to be a major source of pain.


I really hope your proposal gets accepted and you'll be able to 
successfully complete this task.


All the best!

--
Sebastian Schuberth


Re: [RFC/GSoC] Proposal: Make git-pull and git-am builtins

2015-03-25 Thread Paul Tan
On Thu, Mar 26, 2015 at 1:54 AM, Junio C Hamano gits...@pobox.com wrote:
 Paul Tan pyoka...@gmail.com writes:

 I think it's still good to have the ideal in mind though (and whoops I
 forgot to put in the word ideal in the text).

 Using or not using fork is merely one of the trade-offs we can make.

 "If all other things are equal, no fork is better than a fork" is a
 meaningless statement, as all other things are never equal in real
 life---doing things internally will have a cost of having to clean
 up and a risk to get that part wrong, for example.  Engineering is a
 fine balancing act and setting an absolute goal is not a healthy
 attitude.

No, I do not mean "all other things are equal", I meant "all other
things are ideal", meaning that human factors are not involved.

I thought we were in agreement that calling functions in the internal
API is technically superior to forking, assuming that there are no
bugs or errors. Isn't this one of the reasons why libgit2 exists? If
for whatever reason spawning an external git process is chosen, it
would be because rewriting all the code paths without committing any
errors would take too much effort.

I will switch the word "requirements" to the word "guidelines" to make
it sound less strict. However, my above point still stands.


Re: [RFC/GSoC] Proposal: Make git-pull and git-am builtins

2015-03-24 Thread Paul Tan
Hi,

On Wed, Mar 25, 2015 at 2:37 AM, Junio C Hamano gits...@pobox.com wrote:
 Paul Tan pyoka...@gmail.com writes:

 ..., I propose the following requirements for the rewritten code:

 1. No spawning of external git processes. This is to support systems with 
 high
``fork()`` or process creation overhead, and to reduce redundant IO by
taking advantage of the internal object, index and configuration cache.

 I suspect this may probably be too strict in practice.

 True, we should never say run_command_capture() just to read
 from git rev-parse---we should just call get_sha1() instead.

 But for a complex command whose execution itself far outweighs the
 cost of forking, I do not think it is fair to say your project
 failed if you chose to run_command() it.  For example, it may be
 perfectly OK to invoke git merge via run_command().

Yes, which is why I proposed writing a baseline using only the
run-command APIs first. Any other optimizations can then be done
selectively after that.

I think it's still good to have the ideal in mind though (and whoops I
forgot to put in the word ideal in the text).


 3. The resulting builtin should not have wildly different behavior or bugs
compared to the shell script.

 This on the other hand is way too loose.

 The original and the port must behave identically, unless the
 difference is fixing bugs in the original.


I was considering that there may be slight behavioral changes when the
rewritten code is modified to take greater advantage of the internal
API, especially since some builtins due to historical issues, may have
duplicated code from the internal API[1].

[1] I'm suspecting that the implementation of --merge-base in
show-branch.c re-implements get_octopus_merge_bases().

 Potential difficulties
 ===

 Rewriting code may introduce bugs
 ...

 Yes, but that is a reasonable risk you need to manage to gain the
 benefit from this project.

 Of course, the downside of following this too strictly is that if there were
 any logical bugs in the original code, or if the original code is unclear, 
 the
 rewritten code would inherit these problems too.

 I'd repeat my comment on the 3. above.  Identifying and fixing bugs
 is great, but otherwise don't worry about this too much.

 Being bug-to-bug compatible with the original is way better than
 introducing new bugs of an unknown nature.

Well yes, but I was thinking that if there are any glaring errors in
the original source then it would be better to fix these errors during
the rewrite than wasting time writing code that replicates these
errors.

 Rewritten code may become harder to understand
 ...

 And also it may become harder to modify.

 That is the largest problem with any rewrite, and we should spend
 the most effort to avoid it.

 A new bugs introduced we can later fix as long as the result is
 understandable and maintainable.

 For the purpose of reducing git's dependencies, the rewritten C code should 
 not
 depend on other libraries or executables other than what is already available
 to git builtins.

 Perhaps misphrased; see below.

In this case I was thinking of making git depend on another project.
(e.g., using an external regular expression library). Of course a
balance has to be made in this aspect (thus the use of "should not"),
but git-pull and git-am are relatively simple, so there should be no
need for that.


 We can see that the C version requires much more lines compared to the shell
 pipeline,...

 That is something you would solve by introducing reusable code in
 run_command API, isn't it?  That is how various rewrites in the past
 did, and this project should do so too.  You should aim to do this
 project by not just using what is already available, but adding
 what you discover is a useful reusable pattern into a set of new
 functions in the already available API set.

Whoops, forgot to mention that here. (A brief mention was made on this
kind of refactoring in the Development Approach).

Thank you for your review.


[RFC/GSoC] Proposal: Make git-pull and git-am builtins

2015-03-24 Thread Paul Tan
Hi all,

I'm applying for git in the Google Summer of Code this year. For my
project, I propose to rewrite git-pull.sh and git-am.sh into fast
optimized C builtins. I've already hacked up a prototype of a builtin
git-pull in [1], and it showed a promising 8x improvement in execution
time on Windows.

Below is the full text of the proposal as submitted to google-melange
for your review and feedback. It is marked up in reStructuredText. The
latest (and rendered) version can be found at [2].

Regards,
Paul.

[1] http://thread.gmane.org/gmane.comp.version-control.git/265628
[2] https://gist.github.com/pyokagan/1b7b0d1f4dab6ba3cef1

(Thanks Matthieu for suggesting to post this on the mailing list. Will
reply to your comments in a separate email).

-- >8 --


=====================================
Make `git-pull` and `git-am` builtins
=====================================

:Abstract: `git-pull` and `git-am` are frequently used git subcommands.
   However, they are porcelain commands and implemented as shell
   scripts, which have limitations that can cause poor
   performance, especially in non-POSIX environments like Windows.
   I propose to rewrite these scripts into low level C code and make
   them builtins.  This will increase git's portability, and may
   improve the efficiency and performance of these commands.

.. section-numbering::

Limitations of shell scripts
============================

`git-pull` is a commonly executed command to check for new changes in the
upstream repository and, if there are, fetch and integrate them into the
current branch. `git-am` is another commonly executed command for applying a
series of patches from a mailbox to the current branch. They are both git
porcelain commands -- with no access to git's low level internal API.
Currently, they are implemented by the shell scripts ``git-pull.sh`` and
``git-am.sh`` respectively. These shell scripts require a fully-functioning
POSIX shell and utilities. As a result, these commands are difficult to port to
non-POSIX environments like Windows.

Since porcelain commands do not have access to git's internal API, performing
any git-related function, no matter how trivial, requires git to be spawned in
a separate process. This limitation leads to these git commands being
relatively inefficient, and can cause long run times on certain platforms that
do not have copy-on-write ``fork()`` semantics.

Spawning processes can be slow
------------------------------

Shell scripting, by itself, is severely limited in what it can do.
Performing most operations in shell scripts requires external executables to be
called. For example, ``git-pull.sh`` spawns the git executable not only to
perform git operations like `git-fetch` and `git-merge`, but it also spawns the
git executable for trivial tasks such as retrieving configuration values with
`git-config` and even quoting of command-line arguments with ``git rev-parse
--sq-quote``. As a result, these shell scripts usually end up spawning a lot of
processes.

Process spawning is usually implemented by shells as a ``fork()`` followed by
an ``exec()``. This can be slow on systems that do not support copy-on-write
semantics for ``fork()`` and thus must duplicate the memory of the parent
process on every ``fork()`` call -- an expensive operation.

Furthermore, starting up processes on Windows is generally expensive as it
performs `several extra steps`_ such as using an inter-process call to
notify the Windows Client/Server Runtime Subsystem (CSRSS) about the process
creation and checking for App Compatibility requirements.

.. _`several extra steps`:
http://www.microsoft.com/mspress/books/sampchap/4354a.aspx

The official Windows port of git, Git for Windows, uses MSYS2 [#]_ to emulate
``fork()``. Since Windows does not support forking semantics natively, MSYS2
can only emulate ``fork()`` `without copy-on-write semantics`_. Coupled with
Windows' heavy process creation, this causes huge slowdowns of git on Windows.

.. _`without copy-on-write semantics`:
https://www.cygwin.com/faq.html#faq.api.fork

A no-updates `git-pull`, for example, takes an average of 5.1s [#]_, as
compared to Linux, which only takes an average of 0.08s. 5 seconds,
while seemingly short, would seem like an eternity to a user who just wants to
quickly fetch and merge changes from upstream.

`git-am`'s implementation reads each patch from the mailbox in a while loop,
spawning many processes for each patch. Considering the cost of spawning each
process, as well as the fact that runtime grows linearly with the number of
patches, git-am takes a long time to process a seemingly small number of
patches on Windows as compared to Linux. A quick benchmark shows that `git-am`
takes 7m 20.39s to apply 100 patches on Windows, compared to Linux, which took
only 0.08s.

Commands which call `git-am` are affected as well. ``git-rebase--am.sh``,
which implements the default

Re: [RFC/GSoC] Proposal: Make git-pull and git-am builtins

2015-03-24 Thread Paul Tan
On Tue, Mar 24, 2015 at 6:19 PM, Matthieu Moy
<matthieu@grenoble-inp.fr> wrote:
> A few minor details:
>
> "on operating systems with poor file system performance (i.e. Windows)"
> => that's not only Windows; I also commonly use a slow filesystem on
> Linux, just because it's NFS. Mentioning other cases of poor filesystem
> performance would show that the benefit is not limited to Windows users,
> and would give less of a taste of Windows-bashing.

Ah right, I didn't think of network file systems. Thanks for the suggestion.

> About the timeline: I'd avoid too much parallelism. Usually, it's best
> to try to send a first patch to the mailing list as soon as possible,
> hence focus on one point first (I'd do that with pull, since that's the
> one which is already started). Then, you can parallelize coding on git
> am and the discussion on the pull patches. Whatever you plan, review and
> polishing takes more than that ;-). The risk is to end up with an almost
> good but not good enough to be mergeable code. That said, your timeline
> does plan patches and review early, so I'm not too worried.


Well, I was thinking that after the full rewrite (2nd stage, halfway
through the project), any optimizations made to the code will be done
iteratively (and in separate small patches) so as to keep the patch
series in an always almost mergeable state. This will hopefully make
it much easier and shorter to do any final polishing and review for
merging.

> A general advice: if time allows, try to contribute to discussions and
> review other than your own patches. It's nice to feel integrated in the
> community and not the GSoC student working alone at home ;-).

Yeah I apologize for not participating in the list so actively because
writing the git-pull prototype and the proposal took a fair chunk of
my time. Also, my expertise with the code base is not that great yet
so it takes quite a bit more effort for me to contribute
constructively, but I expect that will improve in the future. Now that
the proposal is more or less complete I can spend more time on
discussions.

Thanks,
Paul


Re: [RFC/GSoC] Proposal: Make git-pull and git-am builtins

2015-03-24 Thread Junio C Hamano
Paul Tan <pyoka...@gmail.com> writes:

> ..., I propose the following requirements for the rewritten code:
>
> 1. No spawning of external git processes. This is to support systems with high
>    ``fork()`` or process creation overhead, and to reduce redundant IO by
>    taking advantage of the internal object, index and configuration cache.

I suspect this may probably be too strict in practice.

True, we should never say run_command_capture() just to read
from "git rev-parse"---we should just call get_sha1() instead.

But for a complex command whose execution itself far outweighs the
cost of forking, I do not think it is fair to say your project
failed if you chose to run_command() it.  For example, it may be
perfectly OK to invoke git merge via run_command().

> 3. The resulting builtin should not have wildly different behavior or bugs
>    compared to the shell script.

This on the other hand is way too loose.

The original and the port must behave identically, unless the
difference is fixing bugs in the original.

> Potential difficulties
> ======================
>
> Rewriting code may introduce bugs
> ...

Yes, but that is a reasonable risk you need to manage to gain the
benefit from this project.

> Of course, the downside of following this too strictly is that if there were
> any logical bugs in the original code, or if the original code is unclear, the
> rewritten code would inherit these problems too.

I'd repeat my comment on the 3. above.  Identifying and fixing bugs
is great, but otherwise don't worry about this too much.

Being bug-to-bug compatible with the original is way better than
introducing new bugs of an unknown nature.

> Rewritten code may become harder to understand
> ...

And also it may become harder to modify.

That is the largest problem with any rewrite, and we should spend
the most effort to avoid it.

New bugs that are introduced we can fix later as long as the result is
understandable and maintainable.

> For the purpose of reducing git's dependencies, the rewritten C code
> should not depend on other libraries or executables other than what is
> already available to git builtins.

Perhaps misphrased; see below.

> We can see that the C version requires many more lines compared to the
> shell pipeline,...

That is something you would solve by introducing reusable code in
the run_command API, isn't it?  That is how various rewrites in the past
did it, and this project should do so too.  You should aim to do this
project by not just using what is already available, but adding
what you discover is a useful reusable pattern into a set of new
functions in the already available API set.


Re: [RFC/GSoC] Proposal: Make git-pull and git-am builtins

2015-03-24 Thread Matthieu Moy
Paul Tan <pyoka...@gmail.com> writes:

> On Tue, Mar 24, 2015 at 6:19 PM, Matthieu Moy
> <matthieu@grenoble-inp.fr> wrote:
>
>> About the timeline: I'd avoid too much parallelism. Usually, it's best
>> to try to send a first patch to the mailing list as soon as possible,
>> hence focus on one point first (I'd do that with pull, since that's the
>> one which is already started). Then, you can parallelize coding on git
>> am and the discussion on the pull patches. Whatever you plan, review and
>> polishing takes more than that ;-). The risk is to end up with an almost
>> good but not good enough to be mergeable code. That said, your timeline
>> does plan patches and review early, so I'm not too worried.


> Well, I was thinking that after the full rewrite (2nd stage, halfway
> through the project), any optimizations made to the code will be done
> iteratively (and in separate small patches)

Yes, that's why I'm not too worried. But being able to say "this part is
done, it won't disturb me anymore" ASAP is still good IMHO, even if
this part is not so big.

But again, I'm thinking out loudly, feel free to ignore.

>> A general advice: if time allows, try to contribute to discussions and
>> review other than your own patches. It's nice to feel integrated in the
>> community and not the GSoC student working alone at home ;-).
>
> Yeah I apologize for not participating in the list so actively because
> writing the git-pull prototype and the proposal took a fair chunk of
> my time.

Don't apologize, you're doing great. I'm only pointing out things that
could be even better, but certainly not blaming you!

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/


[RFC/GSoC] Proposal Draft: Unifying git branch -l, git tag -l, and git for-each-ref

2015-03-23 Thread karthik nayak

Hello,
I have completed the micro project[1] and have also been working on 
adding a --literally option
for cat-file[2]. I have left out the personal information part of the 
proposal here; I will fill that in while submitting my final proposal.


Currently, I have been reading about how "branch -l", "tag -l" and 
"for-each-ref" work and how they implement the selection and formatting 
options.


Since this is a draft of my final proposal, I would love to hear from 
you all about:


* Suggestions on my take of this idea and how I could improve it or 
modify it.

* Anything more I might have missed out on, in the proposal.

GSoC Proposal: Unifying git branch -l, git tag -l, and git for-each-ref

# Main objectives of the project:

* Build a common library which can handle both selection and 
formatting of refs.


* Use this library throughout ‘branch -l’, ‘tag -l’ and ‘for-each-ref’.

* Implement options available in some of these commands onto others.



# Amongst 'branch -l', 'tag -l' and 'for-each-ref':

* 'git branch -l' and 'git tag -l' share the '--contains' option.

* 'git tag' and 'git branch' could use a formatting option (this could 
also be used to implement the verbose options).

For example, git branch -v could be implemented using:
	git for-each-ref refs/heads --format='%(refname:short) 
%(objectname:short) %(upstream:track) %(contents:subject)'
	This shows that having a formatting option for these two would mean 
that the verbose options could be implemented using the formatting 
option itself.


 * 'git for-each-ref' could use all the selection options. This would 
enhance the uses of 'for-each-ref' itself. Users can then view only refs 
based on what they may be looking for.


* Formatting options for 'git branch -l' and 'git tag -l'. This would 
enable the user to view information as per the user's requirements and 
format.
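The `branch -v` equivalence mentioned in the list above can be tried directly. A hedged throwaway demo (the repository path and commit message are made up; it only assumes git on PATH — `%(upstream:track)` is omitted since the scratch repo has no upstream):

```shell
# Throwaway demo: emulate the informational part of `git branch -v`
# with `git for-each-ref --format` in a scratch repository.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"
# One line per branch: short ref name, abbreviated object name, subject.
git for-each-ref refs/heads \
    --format='%(refname:short) %(objectname:short) %(contents:subject)'
```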



# Approach

All three commands select a subset of the repository’s refs and print 
the result. There has been an attempt to unify these commands by Jeff 
King[3]. I plan on continuing his work[4] and using his approach to 
tackle this project.


As for the common library for 'branch -l', 'tag -l' and 'for-each-ref', I 
plan on creating a file (mostly as ref-filter.c, in line with what Jeff 
has already done) which will provide APIs to add refs and to get a list of 
all refs. This will be used along with 'for_each_*_ref' for obtaining 
the refs required. This gives us the basic functionality of obtaining 
the refs required by the command.


Here we could have a basic data structure (struct ref_filter_item) which 
would denote a particular ref, and have another data structure to hold a 
list of these refs (struct ref_filter). Then, after getting the required 
refs, we could print the information.


For extended selection behaviour such as '--contains' or '--merged', we 
could implement these within the library by providing functions which 
closely mimic the current methods used individually by 'branch -l' and 
'tag -l'. For example, to implement '--merged' we implement a 
'compute_merge()' function which, with the help of the revision APIs, 
will be able to perform the same function as 'branch -l --merged'.


For the formatting functionality provided by 'for-each-ref', we replicate 
the 'show_ref' function in 'for-each-ref.c', where the format is given to 
the function and the function uses it to obtain atom values and print 
them to the screen. This feature would allow us to provide format 
functionality which could act as a base for the '-v' option also.


As Jeff has already done, we could also add parse options. Although Jeff 
has built a really good base to build upon, I shall use his work more 
as a reference and work on unification of the three commands from 
scratch. I plan on coding for this project using test-driven 
development, where I will write tests (initially failing) which will be 
based on the objectives of the project, and then write code to pass 
those tests.




# Timeline

This is a rough plan of how I will spend the summer working on this project.

Community bonding period:
	Work on understanding how all three commands work in total detail, and 
build up the design of unification of the three commands. Read 
through Jeff's attempt at unification and get a grasp of what to do.


Week 1 :
	Write tests and documentation which will define the goal of this 
project. This will set

Re: About the proposal format of GSoc 2015

2015-03-23 Thread Matthieu Moy
Shanti Swarup Tunga <b112...@iiit-bh.ac.in> writes:

> Hey,
> I am Shanti Swarup Tunga. I want to know if there is any proposal
> format for Git. If not, what should we focus on in the proposal?

You probably already found http://git.github.io/SoC-2015-Ideas.html

There's no particular requirement on the format other than the ones
there.

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/


About the proposal format of GSoc 2015

2015-03-23 Thread Shanti Swarup Tunga
Hey,
I am Shanti Swarup Tunga. I want to know if there is any proposal
format for Git. If not, what should we focus on in the proposal?


Fwd: [RFC] [GSoC Proposal Draft] Unifying git branch -l, git tag -l and git for-each-ref

2015-03-23 Thread Sundararajan R
Hi all,

I have attempted a microproject [1][2] and this is my first draft of
the proposal. I have included only the matter regarding my approach
to solving the problem and shall add my personal details later.

Please be kind enough to go through my proposal and suggest modifications
or detailing wherever required. Also give me feedback on whether my approach to
solving the problem is correct.

In the meantime I am reading up the code of Jeff's attempt at
unification, here [3], for preparing my final proposal.



Title
-----

Unifying git branch -l, git tag -l and git for-each-ref

Abstract
--------

git for-each-ref and the list modes of git branch and git tag involve selecting
a subset of the refs and printing out the result. Currently the implementations
are not shared, in the sense that:

SELECTION
1. git branch knows --contains, --merged and --no-merged
2. git tag only knows --contains

FORMATTING
1. git for-each-ref knows formatting which none of the other two commands know.

SORTING
1. git tag knows sorting only on the basis of refnames
2. git for-each-ref knows sorting on the basis of all the fieldnames
which can be
used in its --format option

The idea is to unify the computations for these processes in a common library
and teach these commands all these options uniformly.

Why do we need unification?
These commands try to accomplish more or less the same thing. So, new
features would most likely be applicable to all three of them, and
unification will allow us to build new features for all these commands
in one go instead of doing it separately for each of the three commands.

Jeff has already worked quite a bit on unifying the selection part. I shall use
that work as a starting point when I start off building the library
and its API calls.

Deliverables
------------

1. The unified library will borrow the --contains implementation from git tag
(due to the speed up it had received), the --merged/--no-merged implementation
from git branch and the --format implementation from git for-each-ref.

2. The commands will then be taught these options by making calls to
this library
functions and structures.

3. Add documentation and tests for these new features.

Optionals
---------

1. Implement the --sort option for these commands in the unified library.
2. Add documentation and tests for this feature

Approach
--------

The common library will contain a structure which will store the
present state of the list of refs, in the sense that after we perform a
computation (e.g. --contains commit) on the list of refs, the new list
will store the result of that computation.

The structure will also have other attributes which the options
structure will take in as its (void *)value attribute's value before
parsing the different options. This is to communicate to the structure
the various options (e.g. --merged, --format, --sort) we want to use.
The list of refs shall be fetched by the API in accordance with the
command (e.g. git tag) and its option (e.g. --merged) which were
passed to the API.

Next comes the matter of printing out the results according to the
format specified (the default format for the command if no format is
specified). This will be done in a manner similar to how git
for-each-ref prints out the results in the given format.

Approximate Timeline
--------------------

(To estimate the amount of work that can be done in the summer, though it
may change during the project [based on advice from mentors].)

May 03 - May 10
Read and understand the implementation of --contains option in git tag and the
--merged/--no-merged implementation in git branch.

May 11 - May 17
Go through Jeff's work on unification to get detailed pointers on how
to start with unifying selection. Finalise all the structures required
and also the API calls the library would have to make for the selection
options.

May 18 - May 24
Start working on the API.  Discuss ideas with mentor, brainstorm on
the details of
what function calls will be made to the API and what function calls
will be made by the API.

CODING PERIOD BEGINS

May 25 - May 31
Implement the --contains option in the library by taking the cue from
how git tag --contains
is implemented.

June 1 - June 7
Implement the --merged and --no-merged options similar to how they are
implemented in git branch.

June 8 - June 11
Make computations more efficient, improve comments and start documentation.
Discuss about additional features and requirements with mentors.

June 12 - June 25
Teach the three commands to use the API for formatting and sorting.
Add tests and refactor
the code of the API if required. Complete the documentation for the
new features added.

MID-TERM EVALUATION

June 26 - June 30
Discuss with mentors about the state and the pace with which the
project is coming along.
Start finalising the details of the further goals to be accomplished.

July 01 - July 07
Start working on the formatting

Re: Feature Proposal: Track all branches from a given remote

2014-10-26 Thread Scott Johnson
Hi Brian:

> [remote "origin"]
>   fetch = refs/heads/*:refs/heads/*

Yes, you're right, this works just fine as long as I first move onto a
branch that's not in the remote in question, for example by doing:

git checkout -b nothing
git fetch

- OR -

git pull

Do you think there would be any interest in a patch that added this as
a simple command line option, though? I guess the idea of this patch
then would simply change this line in the .git/config file for the
length of the operation (and specified remote), execute the git pull
command, and then reset the configuration after the command finished.
(There really wouldn't be a need to affect the configuration on the
filesystem - simply the effective configuration used while git is
running for this operation).

Thanks,

~Scott


Re: Feature Proposal: Track all branches from a given remote

2014-10-26 Thread Andreas Schwab
Scott Johnson <jayw...@gmail.com> writes:

> Do you think there would be any interest in a patch that added this as
> a simple command line option, though? I guess the idea of this patch
> then would simply change this line in the .git/config file for the
> length of the operation (and specified remote), execute the git pull
> command, and then reset the configuration after the command finished.

There is no need to modify the configuration, you can pass the fetch
spec on the command line.
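For instance (a hedged demo with made-up repository paths; it assumes git on PATH and `git checkout --detach`, available since git 1.7.5), a one-off refspec on the command line mirrors every remote branch into local `refs/heads/*` without touching `.git/config`:

```shell
# Demo: mirror all remote branches locally via an explicit fetch refspec.
set -e
tmp=$(mktemp -d)
git init -q --bare "$tmp/remote.git"
git clone -q "$tmp/remote.git" "$tmp/seed"
cd "$tmp/seed"
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "init"
git push -q origin HEAD              # publish the default branch
git branch -q topic
git push -q origin topic             # and a second branch
# A fresh clone only creates a local branch for the remote's HEAD...
git clone -q "$tmp/remote.git" "$tmp/consumer"
cd "$tmp/consumer"
git checkout -q --detach             # so fetch may update any refs/heads/*
# ...but an explicit refspec pulls in every branch as a local branch:
git fetch -q origin '+refs/heads/*:refs/heads/*'
git branch --list
```

This is the command-line equivalent of the `fetch = refs/heads/*:refs/heads/*` configuration discussed earlier in the thread, applied for a single fetch only.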

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
And now for something completely different.


Feature Proposal: Track all branches from a given remote

2014-10-25 Thread Scott Johnson
Hello git experts:

Recently, I've encountered the problem where I would like to set my
local repository copy to track all branches on a given remote. There
does not appear to be a switch for this in the git-branch command
currently; however, I will admit that my somewhat limited
understanding of the git-branch manpage might be causing me simply not
to see it.

It seems as though this is a use case that some users of git encounter
now and then, as illustrated by this post:

http://stackoverflow.com/a/6300386/281460

I was thinking that it might be useful to add a new option to git
branch, perhaps something like:

git-branch --track-remote remotename

Where remotename specifies a given remote, and the command will
track all branches remotes/remotename/* to refs/heads/*.

So, for example, if I were to run:

git-branch --track-remote origin

and I had two branches on origin, master and maint, respectively,
after the command finishes, my local repo would now have two branches,
master (set up to track origin/master), and maint (set up to track
origin/maint).

I'm not entirely sure how to handle naming conflicts, for example if
'maint' already existed on another remote, and was set up to track
from that remote previous to this invocation of the command.

If I were to start work on a patch, would there be any interest in
this feature, or are there reasons why it isn't currently implemented?

Thank you,

~Scott Johnson


Re: Feature Proposal: Track all branches from a given remote

2014-10-25 Thread brian m. carlson
On Sat, Oct 25, 2014 at 04:34:30PM -0700, Scott Johnson wrote:
> Hello git experts:
>
> Recently, I've encountered the problem where I would like to set my
> local repository copy to track all branches on a given remote. There
> does not appear to be a switch for this in the git-branch command
> currently; however, I will admit that my somewhat limited
> understanding of the git-branch manpage might be causing me simply not
> to see it.

I don't know about a command line option for this, but I think there's a
way to achieve what you're looking for.

> So, for example, if I were to run:
>
> git-branch --track-remote origin
>
> and I had two branches on origin, master and maint, respectively,
> after the command finishes, my local repo would now have two branches,
> master (set up to track origin/master), and maint (set up to track
> origin/maint).

You could do something like this in .git/config:

[remote "origin"]
  fetch = refs/heads/*:refs/heads/*

You won't be able to fetch if you would overwrite the current branch,
though.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | http://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: RSA v4 4096b: 88AC E9B2 9196 305B A994 7552 F1BA 225C 0223 B187




I'VE A FINANCIAL PROPOSAL FOR YOU. ARE YOU INTERESTED?

2014-07-14 Thread Peter. K


I'M SORRY I CANNOT GIVE YOU IMMEDIATE DETAILS ON THE ISSUE UNTIL I CONFIRM YOUR 
INTEREST. BE ATTENTIVE TO THE SUBJECT LINE AND SEND YOUR REPLY ON SAME MAIL 
TRAIL TO AID CONTINUITY. REGARDS, MR. PETER KREMER


Re: Proposal for pruning tags

2014-06-11 Thread Michael Haggerty
On 06/05/2014 04:51 PM, Robert Dailey wrote:
> I've never contributed to the Git project before. I'm a Windows user,
> so I use msysgit, but I'd be happy to install linux just so I can help
> implement this feature if everyone feels it would be useful.
>
> Right now AFAIK, there is no way to prune tags through Git. The way I
> currently do it is like so:
>
> $ git tag -l | xargs git tag -d
> $ git fetch --all

Junio explained some limitations of tags (namely that there is only one
tags namespace that is shared project-wide) that make your wish
impossible to implement the way it works for branches.

Local tags are awkward for the same reason.  It is too easy to push them
accidentally to a central repository and too hard to delete them after
that has happened.  They kind of spread virally, as you have noticed.  I
recommend against using local tags in general.

Recent Git does have a feature that might help you.  *If* you have a
central repository that is authoritative WRT tags, then you can sync
the tags in your local repository to the tags in the central repo using

git fetch --prune $REMOTE +refs/tags/*:refs/tags/*

You might also be able to use a pre-receive hook on the central repo to
prevent tags from being pushed by people who shouldn't be doing so, or
to require that tags have an approved format (like
refs/tags/release-\d+\.\d+\.\d+ or whatever) to try to prevent a
recurrence of the problem.
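Such a format check might be sketched like this (the `release-X.Y.Z` pattern and the `check_tag_ref` helper name are made up for illustration; this is not an existing hook):

```shell
# Sketch of the tag-policy check a pre-receive hook could apply.
# check_tag_ref is a made-up helper; only refs/tags/* are constrained.
check_tag_ref() {
    case "$1" in
    refs/tags/*)
        # Accept only the approved release format, e.g. refs/tags/release-1.2.3
        echo "$1" | grep -Eq '^refs/tags/release-[0-9]+\.[0-9]+\.[0-9]+$'
        ;;
    *)
        return 0    # branches and other refs pass through untouched
        ;;
    esac
}

# In the actual hook, each pushed ref arrives on stdin as "old new ref":
#   while read old new ref; do
#       check_tag_ref "$ref" || { echo "rejected: $ref" >&2; exit 1; }
#   done
```

Rejected pushes then fail fast on the server, before a stray local tag can spread to every other clone.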

Michael

-- 
Michael Haggerty
mhag...@alum.mit.edu
http://softwareswirl.blogspot.com/


Re: Proposal for pruning tags

2014-06-06 Thread Robert Dailey
On Thu, Jun 5, 2014 at 3:50 PM, Junio C Hamano <gits...@pobox.com> wrote:
> I think you need to explain what you mean by "prune" a lot better
> than what you are doing in your message to be understood by others.
>
> After seeing the above two commands, my *guess* of what you want to
> do is to remove any local tag of yours that is *not* present in the
> repository you usually fetch from (aka "origin"), but that directly
> contradicts with what you said you wish, i.e.
>
>     "This is not only wasteful, but dangerous. I might accidentally delete
>     a local tag I haven't pushed yet..."
>
> which only shows that your definition of "prune" is different from
> "remove what I do not have at 'origin'".
>
> But it does not say *how* that is different.  How should "prune"
> behave differently from the two commands above?  How does your
> "prune" decide a tag needs to be removed locally when it is not at
> your "origin" [*1*]?
>
> There is *nothing* in git that lets you look at a local tag that is
> missing from the other side and determine if that is something you
> did not want to push (hence it is missing there) or if that is
> something you forgot to push (hence it is missing there but you
> would rather have pushed if you did not forget).  So you must have
> some new mechanism to record and/or infer that distinction in mind,
> but it is not clear what it is from your message.
>
> So until that is clarified, there is not much more to say on whether your
> feature has any merit---as there is no way to tell what that
> feature exactly is, at least not yet ;-)
[snip]

You're right I didn't clarify, although I feel you're not providing
the most welcome response to someone who isn't as familiar with the
internals of Git as you are.

It was an oversight on my part. What I was expecting is that it would
behave exactly like branch pruning does, but that would require
"remote tracking tags", which we don't have. So, apparently my idea
doesn't hold much water.

The general problem I see in the day to day workflow with my team is
that if tags exist locally and they push, those tags continuously get
recreated on the remote repo even after I delete them remotely. So I
can never truly delete tags until I go to each person and make sure
the tool they're using isn't accidentally pushing tags. For example,
SourceTree pushes all tags by default. Everyone on my team is new to
Git, so they don't know to turn that off. Having git clean up tags
automatically would really help with this, even though you may not
feel it's the responsibility of Git. It's more of a usability issue,
it's just prone to error.

I can setup my config to prune tracking branches after I pull. Having
something like this for tags would be wonderful. However, this
requires a bigger overhaul than what I initially was proposing.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Proposal for pruning tags

2014-06-06 Thread Junio C Hamano
Robert Dailey rcdailey.li...@gmail.com writes:

 ... Having git clean up tags
 automatically would really help with this, even though you may not
 feel it's the responsibility of Git. It's more of a usability issue,

I agree with "Having ... help with this."  I did not say at all that
it is not something Git should and can try to help.  I also agree
with "it is a usability issue."

The thing is, the word "automatically" in your "clean up tags
automatically" is still too loose a definition of what we want, and
we cannot come up with a way to help users without tightening that
looseness.  As you said, you are looking for something that can tell
between two kinds of tags that locally exist without having a copy
at the 'origin':

 - ones that you do not want to keep
 - others that you haven't pushed to (or forgot to push to) 'origin'

without giving the users a way to help Git to tell these two kinds
apart and only remove the former.

So...



Proposal for pruning tags

2014-06-05 Thread Robert Dailey
I've never contributed to the Git project before. I'm a Windows user,
so I use msysgit, but I'd be happy to install linux just so I can help
implement this feature if everyone feels it would be useful.

Right now AFAIK, there is no way to prune tags through Git. The way I
currently do it is like so:

$ git tag -l | xargs git tag -d
$ git fetch --all

This is not only wasteful, but dangerous. I might accidentally delete
a local tag I haven't pushed yet. What would be great is if we had the
following:

git tag prune [<remote>|--all]

The <remote> is needed in decentralized workflows (upstream vs
origin). I'd also like to see an `--all` option in place of the
remote, which means it will prune local tags from all remotes. I'm not
sure if this command line structure will work, but it can be altered
as necessary.

Alternatively, this might also make sense on the remote command:

git remote prune <remote> --tags

Again I'm not an expert at the internals of Git, so I wanted to share
my idea with the community first to see if this holds water or if
there is already some built in way of doing this. Thanks for hearing
out my idea!


Re: Proposal for pruning tags

2014-06-05 Thread Junio C Hamano
Robert Dailey rcdailey.li...@gmail.com writes:

 I've never contributed to the Git project before. I'm a Windows user,
 so I use msysgit, but I'd be happy to install linux just so I can help
 implement this feature if everyone feels it would be useful.

 Right now AFAIK, there is no way to prune tags through Git. The way I
 currently do it is like so:

 $ git tag -l | xargs git tag -d
 $ git fetch --all

I think you need to explain what you mean by "prune" a lot better
than what you are doing in your message to be understood by others.

After seeing the above two commands, my *guess* of what you want to
do is to remove any of your local tag that is *not* present in the
repository you usually fetch from (aka origin), but that directly
contradicts with what you said you wish, i.e.

 This is not only wasteful, but dangerous. I might accidentally delete
 a local tag I haven't pushed yet...

which only shows that your definition of "prune" is different from
"remove what I do not have at 'origin'".

But it does not say *how* that is different.  How should "prune"
behave differently from the two commands above?  How does your
"prune" decide a tag needs to be removed locally when it is not at
your origin [*1*]?

There is *nothing* in git that lets you look at a local tag that is
missing from the other side and determine if that is something you
did not want to push (hence it is missing there) or if that is
something you forgot to push (hence it is missing there but you
would rather have pushed if you did not forget).  So you must have
some new mechanism to record and/or infer that distinction in mind,
but it is not clear what it is from your message.

So until that is clarified, there is not much more to say if your
feature has any merit---as there is no way to tell what that
feature exactly is, at least not yet ;-)


[Footnote]

*1* By the way, removing and then refetching would be a silly way to
do this kind of thing anyway.  After removing but before you
have a chance to fetch, your ISP may sever your network
connection and then what happens?

Whatever your definition of "prune" is, I would think it would
be built around "ls-remote --tags" output, to see what tags the
other repository (or other repositories, by looping over the
remotes you interact with) have, compare that set with the tags
you locally have in order to decide which subset of tags you
locally have to remove.
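The comparison step sketched above can be shown with stub data; a hedged sketch, assuming the local list would come from `git tag -l` and the remote list from parsing `git ls-remote --tags origin` (both are stubbed with printf here so only the set logic runs, and `git tag prune` remains hypothetical):

```shell
#!/bin/sh
# Stub lists standing in for `git tag -l` (local tags) and the names
# parsed out of `git ls-remote --tags origin` (tags at the remote).
printf '%s\n' v1.0 v1.1 wip-tag | sort > local_tags.txt
printf '%s\n' v1.0 v1.1 | sort > remote_tags.txt

# comm -23 prints lines unique to the first file: tags that exist
# locally but not at origin -- the candidates a hypothetical
# "git tag prune" would consider removing.
only_local=$(comm -23 local_tags.txt remote_tags.txt)
echo "candidates for pruning: $only_local"

rm -f local_tags.txt remote_tags.txt
```

Note that comm(1) requires both inputs sorted, hence the explicit sort; with real data you would still have to decide, per candidate, whether it is an unwanted tag or merely an unpushed one.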


Re: [RFC][GSOC] Proposal Draft for GSoC, Suggest Changes

2014-03-31 Thread karthik nayak
Hello,
Now that I have already submitted my proposal to GSoC, I was
wondering if there is any way I could contribute to git via bug
fixes or something similar to the microprojects which were
available prior to the GSoC application.
Also wondering if any clarification is needed on my proposal.
Would be great to hear from you all.
Thanks
- Karthik


Re: [RFC] [GSoC] Draft of Proposal for GSoC

2014-03-21 Thread Brian Bourn
Parts of v2; once again, I'd love some more comments on what I've
rewritten.


On Fri, Mar 21, 2014 at 1:42 AM, Jeff King p...@peff.net wrote:
 On Thu, Mar 20, 2014 at 02:15:29PM -0400, Brian Bourn wrote:

 Going through the annals of the listserve thus far I've found a few
 discussions which provide some insight towards this process as well as
 some experimental patches that never seem to have made it
 through[1][2][3][4]

 Reading the past work in this area is a good way to get familiar with
 it. It looks like most of the features discussed in the threads you link
 have been implemented. The one exception seems to be negative patterns.
 I think that would be a good feature to build on top of the unified
 implementation, once all three commands are using it.

 I would start by beginning a deprecation plan for "git branch -l" very
 similar to the one Junio presents in [5], moving --create-reflog to -g,

 That makes sense. I hadn't really considered -l as another point of
 inconsistency between the commands, but it definitely is.

 Following this I would begin the real work of the project which would
 involve moving the following flag operations into a standard library
 say 'list-options.h'

 --contains [6]
 --merged [7]
 --no-merged[8]
 --format
 This Library would build these options for later interpretation by 
 parse_options
 Can you sketch out what the API would look like for this unified
 library? What calls would the 3 programs need to make into it?


Something like this?

Sample api calls
Add_Opt_Group()
Parse_with_contains()
Parse_with_merged()
Parse_with_no_merged()
Parse_with_formatting()
(each of the 4 calls above may have internal calls within the library
in order to parse the option for each of the different function which
may call these functions)


 For the most part I haven't finalized my weekly schedule but a basic
 breakdown would be

 Can you go into more detail here? Remember that writing code is only one
 part of the project. You'll need to be submitting your work, getting
 review and feedback, and iterating on it.

 One problem that students have is queuing up a large amount of work to
 send to the list. Then they twiddle their thumbs waiting for review to
 come back (which takes a long time, because they just dumped a large
 body of work on the reviewers). If you want to make effective use of
 your time, it helps to try to break tasks down into smaller chunks, and
 think about the dependencies between the chunks. When one chunk is in
 review, you can be designing and coding on another.

This one I can absolutely understand. I tried to break this part down
into very manageable parts and give myself a little time at the end of
each coding period to clean up each previous section. This slop time
also allows me to hopefully add some of the extra features that have
been thought of. I'm thinking something like this makes it a little
better:

Weekly Schedule

Start-Midterm
Week 1- Begin deprecation of -l in git branch/establish exactly how
long each stage of the deprecation should take.  Spend some time
reading *.c files even deeper while getting to know any current
patches occurring in any area near my work files.  Lastly, this week
will be spent going through the mailing list finding previous work
done in this area and any other experimental patches
Week 2- Move Opt_Group callbacks for the functions into the library
Week 3-Make a Contains Function in the library which will work for all
three functions
Week 4-Add Merge function in library
Week 5-Add a No Merge function in library
Weeks 7-8- Spend time polishing the library and cleaning up the patches
for final submission of the library to the project

Deliverables for midterm- Library finished pending polish and
acceptance into the git repository

Midterm

Week 9- Refactor all files to use the contains flag from the library.
Week 10- use Merge from library in all relevant files
Week 11-use no-merge from library in all relevant files
Week 11-12- implement the format flags in all relevant files (this
will be slightly harder as I think this might involve calling
for-each-ref in the code for tag and branch. Ultimately there is a
chance that part of the code for doing for-each-ref will end up in
this library as well), additionally add in the code for formatting the
relevant opt_Groups into the necessary files.
Week 13-14- Polish patches via the mailing list and clean up all the
refactoring of the files that has occurred. (Optionally, add more
formatting changes such as negative patterns and numbering each output
into the library.)

Deliverables for Final- working library hopefully added into the code,
and all of the relevant patches for using the library mostly polished
and, minimally, pending peer review for submission into the code base.

I do wonder if this plan might be a little on the conservative side.
If anything, I think this could take a slightly shorter time than
planned, but in that case I can always work on other additions to
format.



 

Re: [RFC] [GSoC] Draft of Proposal for GSoC

2014-03-21 Thread Junio C Hamano
Brian Bourn ba.bo...@gmail.com writes:

 Something like this?

 Sample api calls
 Add_Opt_Group()
 Parse_with_contains()
 Parse_with_merged()
 Parse_with_no_merged()
 Parse_with_formatting()
 (each of the 4 calls above may have internal calls within the library
 in order to parse the option for each of the different function which
 may call these functions)

This list is a bit too sketchy to be called "sample api calls", at
least to me.  Can you elaborate a bit more?

What do they do, what does the caller expect to see (do they get
something as return values?  do they expect some side effects?)?


Re: [RFC] [GSoC] Draft of Proposal for GSoC

2014-03-21 Thread Brian Bourn
On Fri, Mar 21, 2014 at 1:45 PM, Junio C Hamano gits...@pobox.com wrote:
 Brian Bourn ba.bo...@gmail.com writes:

 Something like this?

 Sample api calls
 Add_Opt_Group()
 Parse_with_contains()
 Parse_with_merged()
 Parse_with_no_merged()
 Parse_with_formatting()
 (each of the 4 calls above may have internal calls within the library
 in order to parse the option for each of the different function which
 may call these functions)

 This list is a bit too sketchy to be called sample api calls, at
 least to me.  Can you elaborate a bit more?

 What do they do, what does the caller expect to see (do they get
 something as return values?  do they expect some side effects?)?

so something like this would be better I'm assuming?

Some basic sample API calls are found below; each of these would hold
code to complete parsing and/or formatting the flags.
Add_Opt_Group() - returns an OPT_CALLBACK with contains, merged,
no-merged, or formatting which can be used in a command's options list.

Execute_list()- the main call into the library, which would pass into
the library all of the necessary flags and arguments for parsing the
request and executing it. This would accept flags like --contains,
with arguments such as the commit or pattern that is being

The next four commands would be called by execute_list() to execute
the original command with respect to the flags that are passed into
this library.
Parse_with_contains()
Parse_with_merged()
Parse_with_no_merged()
Parse_with_formatting()


Re: [RFC] [GSoC] Draft of Proposal for GSoC

2014-03-21 Thread Jeff King
On Fri, Mar 21, 2014 at 02:03:41PM -0400, Brian Bourn wrote:

  What do they do, what does the caller expect to see (do they get
  something as return values?  do they expect some side effects?)?
 
 so something like this would be better I'm assuming?
 
 Some basic sample API calls are found below, each of these would hold
 code to complete parsing and/or formatting the flags.
 Add_Opt_Group() - returns an OPT_CALLBACK with contains, merged,
 no-merged, or formatting which can be used in a commands options list.
 
 Execute_list()-the main call into the library and would pass into the
 library all of the necessary flags and arguments for parsing the
 request and executing it. This would accept the flags like
 -contain, with arguments such as the commit or pattern that is being
 searched for.
 
 The next four commands would be called by execute_list() to execute
 the original command with respect to the flags that are passed into
 this library.
 Parse_with_contains()
 Parse_with_merged()
 Parse_with_no_merged()
 Parse_with_formatting()

Think about how the callers would use them. Will git-branch just call
Parse_with_contains? If so, where would that call go? What arguments
would it take, and what would it do?

I don't think those calls are enough. We probably need:

  1. Some structure to represent a list of refs and store its
 intermediate state.

  2. Some mechanism for telling that structure about the various
 filters, sorters, and formatters we want to use (and this needs to
 be hooked into the option-parsing somehow).

  3. Some mechanism for getting the listed refs out of that structure,
 formatting them, etc.

-Peff


Re: [RFC] [GSoC] Draft of Proposal for GSoC

2014-03-21 Thread Brian Bourn
On Fri, Mar 21, 2014 at 2:07 PM, Jeff King p...@peff.net wrote:
 On Fri, Mar 21, 2014 at 02:03:41PM -0400, Brian Bourn wrote:

  What do they do, what does the caller expect to see (do they get
  something as return values?  do they expect some side effects?)?

 so something like this would be better I'm assuming?

 Some basic sample API calls are found below, each of these would hold
 code to complete parsing and/or formatting the flags.
 Add_Opt_Group() - returns an OPT_CALLBACK with contains, merged,
 no-merged, or formatting which can be used in a commands options list.

 Execute_list()-the main call into the library and would pass into the
 library all of the necessary flags and arguments for parsing the
 request and executing it. This would accept the flags like
 -contain, with arguments such as the commit or pattern that is being
 searched for.

 The next four commands would be called by execute_list() to execute
 the original command with respect to the flags that are passed into
 this library.
 Parse_with_contains()
 Parse_with_merged()
 Parse_with_no_merged()
 Parse_with_formatting()

 Think about how the callers would use them. Will git-branch just call
 Parse_with_contains? If so, where would that call go? What arguments
 would it take, and what would it do?

 I don't think those calls are enough. We probably need:

   1. Some structure to represent a list of refs and store its
  intermediate state.

   2. Some mechanism for telling that structure about the various
  filters, sorters, and formatters we want to use (and this needs to
  be hooked into the option-parsing somehow).

   3. Some mechanism for getting the listed refs out of that structure,
  formatting them, etc.

Keeping some of my function calls to do the actual work, I think I
settled on this:

A possible API is given below; each of these calls would hold code to
complete parsing and/or formatting the flags.

There will be a struct in the library called refs_list which, when
initialized, will iterate through all the refs in a repository and add
them to this list.

There would be a function to retrieve ref structs from that list:

Get_ref_from_list()- which would return a single ref from the list.

Add_Opt_Group() - returns an OPT_CALLBACK with contains, merged,
no-merged, or formatting which can be used in a command's options list.

Execute_list()- the main call into the library, which would pass into
the library all of the necessary flags and arguments for parsing the
request and executing it. This would accept flags like --contains,
with arguments such as the commit or pattern that is being searched
for. This will then parse the refs_list using the four commands below
to make, sort, filter, and format an output list which will then be
printed or returned by this function.

Any call into the API from an outside source would use one of the
previous two functions; all other commands in the API would be for
internal use only, in order to simplify the process of calling into
this library.

The next four commands would be called by execute_list() to further
format the refs_list with respect to the flags that are passed into
this library. These would also take the additional arguments from
execute_list() such as patterns to parse or which commit to filter
out. These calls would modify the refs_list for eventual printing.

Parse_list _with_contains()

Parse_list_with_merged()

Parse_list_with_no_merged()

Format_list()

Of course, this would still depend on deciding whether we want
to return to the original command to print or whether printing can be
handled by the library itself.


 -Peff


Re: GSoC proposal: port pack bitmap support to libgit2.

2014-03-20 Thread Yuxuan Shui
Hi,

Sorry for this late reply, I was busy for the past few days.

On Fri, Mar 14, 2014 at 12:34 PM, Jeff King p...@peff.net wrote:
 On Wed, Mar 12, 2014 at 04:19:23PM +0800, Yuxuan Shui wrote:

 I'm Yuxuan Shui, an undergraduate student from China. I'm applying for
 GSoC 2014, and here is my proposal:

 I found this idea on the ideas page, and did some research about it.
 The pack bitmap patchset add a new .bitmap file for every pack file
 which contains the reachability information of selected commits. This
 information is used to speed up git fetching and cloning, and produces
 very convincing results.

 The goal of my project is to port the pack bitmap implementation in
 core git to libgit2, so users of libgit2 could benefit from this
 optimization as well.

 Please let me know if my proposal makes sense, thanks.

 You'd want to flesh it out a bit more to show how you're thinking about
 tackling the problem:

   - What are the areas of libgit2 that you will need to touch? Be
 specific. What's the current state of the packing code? What
 files and functions will you need to touch?

Firstly I will need to implement bitmap creation in libgit2's
git_packbuilder_* functions (probably also git_odb_write_pack), so
libgit2 could support bitmap creation. Then I will need to change the
git_revwalk_* functions to make them use bitmaps. Since the operations
that can benefit from bitmaps are, if my understanding is correct, all
using the git_revwalk_* functions, having bitmap support in the
revwalk functions should be enough.

Files I need to touch are probably revwalk.c and pack-objects.c.
If I need to change the API of packbuilder or revwalk functions I will
have to change the callers as well: push.c fetch.c and
transport/smart_protocol.c

I haven't read all the code to put together a list of functions I need
to change, but I think the list will be long.


   - What are the challenges you expect to encounter in porting the code?

The architecture differences between git and libgit2 will probably be
a challenge.


   - Can you give a detailed schedule of the summer's work? What will you
 work on in each week? What milestones do you expect to hit, and
 when?

I don't really have a plan yet, but I'll try to provide a rough schedule.

I'll read the code and try to understand it, to the point where
I can start to add new code. This will probably take a week. For the
next three or four weeks I should be implementing bitmap creation in
the packbuilder. Then for the rest of the time I will be optimizing
revwalk using bitmaps.


 -Peff

--

Regards
Yuxuan Shui


[RFC][GSOC] Proposal Draft for GSoC, Suggest Changes

2014-03-20 Thread karthik nayak
Hello,
I have completed my microproject under the guidance of Eric. After
going through the code and previous mailing lists, I have drafted my
proposal. I am still going through the code as of now and figuring
things out. It would be great to have your suggestions on my proposal,
so that I can improve it before submitting it. I have also written the
proposal in markdown for easier formatting; it doesn't look pretty in
plain text.
Thanks
Karthik


Git configuration API improvements

Abstract

Currently git_config() has a few issues which need to be addressed:

It reads and parses the configuration files each time.
Values cannot be unset; they can only be set to false, which has
different implications.
Repeated setting and unsetting of a value under a particular header
leaves trails behind.

This project is to fix these problems while also retaining backward
compatibility wherever git_config() is called, by implementing a cache
for the configs in a tree data structure, which provides for easier
modification.

About Me

Name : Karthik Nayak
Email : karthik@gmail.com
College : BMS Institute of Technology
Studying : Engineering In Computer Science
IRC : nayak94
Phone : 91--XXX-XXX
Country : India
Interests : Guitar, Photography, Craft.
Github : KarthikNayak

Technical Experience

Have been learning about the Linux kernel and its implementation on
the Android platform; released builds on XDA-Dev for the LG P500 and
Xperia SP. Working on a C library of various sorting techniques.
Contributed to the open-source lab manual for colleges under VTU.
Active member of the GNU/Linux Users Group in college and the Free
Software Movement of Karnataka.

Why i Picked Git

This is my first attempt at GSoC, and as I began going through the
list of organisations, what struck me is that I haven't really used
the software of most of the listed organisations. That's when I
realized: why not contribute to something I use on a daily basis? This
way I won't be contributing only because I want to take part in GSoC;
rather, I'd contribute because I would love to be a part of something
I use on a regular basis and would be able to contribute to the
project even after GSoC.

Proposal

Ideas Page : Git configuration API improvements

The Following improvements have to be made to how configs are handled in git :

Read all the config files once and store them in an appropriate data structure.

I suggest the use of a tree data structure to store the cache of the
config files.
Although a tree data structure has lower time efficiency than a
hash-keyed data structure when traversing for a config request, a tree
is more amenable to further improvements. For example, the problem
with setting and unsetting configs can be easily handled: when a node
under a particular header is deleted, the header can check whether it
has any remaining child nodes and, if not, delete itself from the
config file.

Change git_config() to iterate through the pre-read values in memory
rather than re-reading
the configuration files. This function should remain
backwards-compatible with the old implementation
so that callers don't have to all be rewritten at once.

Now whenever git_config() is called within a single invocation of git,
it can traverse the tree data structure already created and get the
particular config. This needs to maintain backward compatibility, so
while the basic functioning of functions like git_config() would
change, the API should remain the same for the user invoking these
calls.

Add new API functions that allow the cache to be inquired easily and
efficiently.
Rewrite callers to use the new API wherever possible.

Now that the base data structure and the underlying changes needed for
it to work are in place, we can add various new API functions to
assist usage of the data structure, and also rewrite callers to use
the new APIs made available.

Issues to be addressed

Headers and comments are left behind when all configs under a header
are deleted.

Whenever we set and unset configs under a particular header, it leaves
garbage values behind, for example:

git config pull.rebase true
git config --unset pull.rebase
git config pull.rebase true
git config --unset pull.rebase

would result in :

[pull]
[pull]

Any further changes made then appear under the last header.
The issue also gives rise to comments being stranded within a header.

Possible Solution :

Make sure that the header is deleted whenever the last config under it
is deleted.
Also delete comments within a header and comments made above a particular config
when a config is removed and comments made above a header when the whole header
is being removed.
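The header cleanup in this possible solution can be sketched as a filter over the config text: drop a section header when no key line follows it before the next header. This is only an illustration with a hypothetical awk pass (a real fix would live inside git's config writer, not in a text filter):

```shell
#!/bin/sh
# Sample config showing the stranded headers left by repeated
# set/unset cycles, as in the [pull]/[pull] example above.
cat > sample.cfg <<'EOF'
[pull]
[pull]
[core]
	bare = false
EOF

# Hold each header; only emit it once a key line actually follows it.
# Headers with no keys before the next header are silently dropped.
cleaned=$(awk '
    /^\[/      { hdr = $0; pend = 1; next }
    /^[ \t]*$/ { if (!pend) print; next }
               { if (pend) { print hdr; pend = 0 } print }
' sample.cfg)
printf '%s\n' "$cleaned"

rm -f sample.cfg
```

Comments attached to a dropped header would need similar bookkeeping, which is why doing this at write time, with knowledge of which key is being removed, is the more robust design.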

How to invalidate the cache correctly in the case that the
configuration is changed
while git is executing.

If the config is being changed while git is currently running, then
the changes need to be considered.

Possible Solution :

A simple

[RFC] [GSoC] Draft of Proposal for GSoC

2014-03-20 Thread Brian Bourn
Hi all,

This is a first draft of my proposal for GSoC. I'd love feedback about
what I might be missing and any other files I should read regarding
this; so far I have read most of tag.c, branch.c,
builtin/for-each-ref.c, and parse-options.c. Once again, I hope I can
get the same amount of helpful feedback as when I submitted my
microproject.

My name is Brian Bourn. I'm currently a computer engineering student
at Columbia University in the City of New York.  I've used git since
my freshman year; however, this past week has been my first time
attempting to contribute to the project, and I loved it. I'd
particularly like to tackle unifying "git branch -l", "git tag -l",
and "git for-each-ref".  This functionality seems like an important
update to me, as it will simplify usage of git throughout three
different commands, a noble pursuit which is not contained in any
other project.

Going through the annals of the listserv thus far, I've found a few
discussions which provide some insight into this process, as well as
some experimental patches that never seem to have made it
through [1][2][3][4].

I would start by beginning a deprecation plan for "git branch -l" very
similar to the one Junio presents in [5], moving --create-reflog to -g,

Following this I would begin the real work of the project which would
involve moving the following flag operations into a standard library
say 'list-options.h'

--contains [6]
--merged [7]
--no-merged[8]
--format
This Library would build these options for later interpretation by parse_options

Next I would implement these flags in the three files so that they are
uniform and the same formatting and list capabilities can be used on
all three. The formatting option will be especially useful for branch
and tag as it will allow users to better understand what is in each
ref that they grab.

For the most part I haven't finalized my weekly schedule but a basic
breakdown would be

Start-Midterm
Begin deprecation of -l
Spend some time reading *.c files even deeper
Build Library(dedicate Minimum one week per function moved)

Midterm-finish
Implement the list flags
Implement the format flags
(if time is left over, add some formatting)

Additionally I am thinking about adding some more formatting tools
such as numbering outputs. What do you all think of this?


[1]http://git.661346.n2.nabble.com/More-formatting-with-git-tag-l-tt6739049.html

[2]http://git.661346.n2.nabble.com/RFC-branch-list-branches-by-single-remote-tt6645679.html#a6725483

[3]http://git.661346.n2.nabble.com/RFC-PATCH-tag-make-list-exclude-lt-pattern-gt-tt7270451.html#a7338712

 
[4]http://git.661346.n2.nabble.com/RFC-branch-list-branches-by-single-remote-tt6645679.html#a6728878

[5]http://git.661346.n2.nabble.com/RFC-PATCH-0-2-RFC-POC-patterns-for-branch-list-tt6309233.html

[6]https://github.com/git/git/blob/master/builtin/branch.c#L817

[7] https://github.com/git/git/blob/master/builtin/branch.c#L849

[8] https://github.com/git/git/blob/master/builtin/branch.c#L843

Regards,
Brian Bourn


Re: [RFC] [GSoC] Draft of Proposal for GSoC

2014-03-20 Thread Brian Bourn
Hello again,

Please, it would be very helpful for me to get some comments on this
proposal. I would be very grateful to anyone who could take some
time to look at it, even if it's just the wording.
Regards,
Brian Bourn


On Thu, Mar 20, 2014 at 2:15 PM, Brian Bourn ba.bo...@gmail.com wrote:
 Hi all,

 This is a first draft of my Proposal for GSoC, I'd love feedback about
 what I might be missing and any other files I should read regarding
 this, so far I have read most of tag.c, branch.c,
 builtin/for-each-ref.c, parse-options.c. once again I hope I can get
 the same amount of helpful feedback as when I submitted my
 Microproject.

 My name is Brian Bourn. I'm currently a computer engineering student
 at Columbia University in the City of New York.  I've used git since
 my freshman year, but this past week has been my first time
 attempting to contribute to the project, and I loved it. I'd
 particularly like to tackle unifying "git branch -l", "git tag -l", and
 "git for-each-ref".  This functionality seems like an important update
 to me, as it will simplify usage of git across three different
 commands, a noble pursuit which is not contained in any other project.

 Going through the archives of the mailing list thus far, I've found a
 few discussions which provide some insight into this process, as well
 as some experimental patches that never seem to have made it
 through [1][2][3][4].

 I would start with a deprecation plan for "git branch -l" very
 similar to the one Junio presents in [5], moving --create-reflog to -g.

 Following this, I would begin the real work of the project, which would
 involve moving the following flag operations into a common library,
 say 'list-options.h':

 --contains [6]
 --merged [7]
 --no-merged [8]
 --format

 This library would build these options for later interpretation by
 parse_options.

 Next I would implement these flags in the three files so that they are
 uniform and the same formatting and list capabilities can be used on
 all three. The formatting option will be especially useful for branch
 and tag as it will allow users to better understand what is in each
 ref that they grab.

 For the most part I haven't finalized my weekly schedule but a basic
 breakdown would be

 Start-Midterm
 Begin deprecation of -l
 Spend some time reading *.c files even deeper
 Build library (dedicate a minimum of one week per function moved)

 Midterm-finish
 Implement the list flags
 Implement the format flags
 (if time is left over, add some formatting)

 Additionally I am thinking about adding some more formatting tools
 such as numbering outputs. What do you all think of this?


 [1]http://git.661346.n2.nabble.com/More-formatting-with-git-tag-l-tt6739049.html

 [2]http://git.661346.n2.nabble.com/RFC-branch-list-branches-by-single-remote-tt6645679.html#a6725483

 [3]http://git.661346.n2.nabble.com/RFC-PATCH-tag-make-list-exclude-lt-pattern-gt-tt7270451.html#a7338712

  
 [4]http://git.661346.n2.nabble.com/RFC-branch-list-branches-by-single-remote-tt6645679.html#a6728878

 [5]http://git.661346.n2.nabble.com/RFC-PATCH-0-2-RFC-POC-patterns-for-branch-list-tt6309233.html

 [6]https://github.com/git/git/blob/master/builtin/branch.c#L817

 [7] https://github.com/git/git/blob/master/builtin/branch.c#L849

 [8] https://github.com/git/git/blob/master/builtin/branch.c#L843

 Regards,
 Brian Bourn


Re: [RFC] [GSoC] Draft of Proposal for GSoC

2014-03-20 Thread Jeff King
On Thu, Mar 20, 2014 at 02:15:29PM -0400, Brian Bourn wrote:

 Going through the annals of the listserve thus far I've found a few
 discussions which provide some insight towards this process as well as
 some experimental patches that never seem to have made it
 through[1][2][3][4]

Reading the past work in this area is a good way to get familiar with
it. It looks like most of the features discussed in the threads you link
have been implemented. The one exception seems to be negative patterns.
I think that would be a good feature to build on top of the unified
implementation, once all three commands are using it.

 I would start with a deprecation plan for "git branch -l" very
 similar to the one Junio presents in [5], moving --create-reflog to -g.

That makes sense. I hadn't really considered -l as another point of
inconsistency between the commands, but it definitely is.

 Following this I would begin the real work of the project which would
 involve moving the following flag operations into a standard library
 say 'list-options.h'
 
 --contains [6]
 --merged [7]
 --no-merged[8]
 --format
 This Library would build these options for later interpretation by 
 parse_options

Can you sketch out what the API would look like for this unified
library? What calls would the 3 programs need to make into it?

 For the most part I haven't finalized my weekly schedule but a basic
 breakdown would be

Can you go into more detail here? Remember that writing code is only one
part of the project. You'll need to be submitting your work, getting
review and feedback, and iterating on it.

One problem that students have is queuing up a large amount of work to
send to the list. Then they twiddle their thumbs waiting for review to
come back (which takes a long time, because they just dumped a large
body of work on the reviewers). If you want to make effective use of
your time, it helps to try to break tasks down into smaller chunks, and
think about the dependencies between the chunks. When one chunk is in
review, you can be designing and coding on another.

 Additionally I am thinking about adding some more formatting tools
 such as numbering outputs. What do you all think of this?

Something like numbering might make sense as part of the formatting code
(e.g., a new placeholder that expands to "n" for the nth line of
output). I think that would be fairly straightforward, but again, it
makes sense to me to unify the implementations first, and then we can
build new features on top.
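
For comparison, numbering can already be approximated today by
post-processing the output outside the format string; a built-in
placeholder would move this into --format itself. (The ref names below
are just sample input standing in for real for-each-ref output.)

```shell
# Simulate the output of `git for-each-ref --format='%(refname:short)'`
# and number each line with nl; a new --format placeholder could do
# this numbering inside the command itself.
printf '%s\n' maint master next | nl -ba
```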

-Peff


[RFC][GSoC] Calling for comments regarding rough draft of proposal

2014-03-19 Thread tanay abhra
Hi,
  I have already done the microproject, which was merged into the
main tree last week. I have prepared a rough draft of my proposal for
review and have read all the previous mailing list threads about it. I am
reading the codebase little by little.

Please suggest improvements on the following topics,

1. I have read one-third of config.c and will complete reading it by
tomorrow. Is there any other piece of code relevant to this proposal?

2. Are there other things I should add to the proposal that I have
left off? I am getting confused about what extra details I should add.
I will add the informal parts (my background, schedule for the summer,
etc.) of the proposal later.

3. Did I understand anything wrong, or is my approach to solving the
problems incorrect? If yes, I will redraft my proposal according to
your suggestions.

--
#GSoC Proposal : Git configuration API improvements
---

#Proposed Improvements

* Fix git config --unset to clean up detritus from sections that are
left empty.

* Read the configuration from files once and cache the results in an
appropriate data structure in memory.

* Change `git_config()` to iterate through the pre-read values in
memory rather than re-reading the configuration
  files.

* Add new API calls that allow the cache to be inquired easily and
efficiently.  Rewrite other functions like
 `git_config_int()` to be cache-aware.

* Rewrite callers to use the new API wherever possible.

* How to invalidate the cache correctly in the case that the
configuration is changed while `git` is executing.

#Future Improvements

* Allow configuration values to be unset via a config file

--
##Changing the git_config api to retrieve values from memory

Approach:-

We parse the config file once, storing the raw values to records in
memory. After the whole config has been read, iterate through the records,
feeding the surviving values into the callback in the order they were
originally read
(minus deletions).

Path to follow for the api conversion,

1. Convert the parser to read into an in-memory representation, but
   leave git_config() as a wrapper which iterates over it.

2. Add query functions like config_string_get() which will inquire
cache for values efficiently.

3. Convert callbacks to query functions one by one.

I propose two approaches for the format of the internal cache,

1. Using a hashmap to map keys to their values. This would bring, as an
 advantage, constant-time lookups for the values. The implementation
 will be similar to the dict data structure in Python,

 for example, section.subsection --mapped-to-- multi_value_string

 This approach loses the relative order of different config keys.

2. Another approach would be to actually represent the syntax tree of the
  config file in memory. That would make lookups of individual keys more
  expensive, but would enable other manipulation. E.g., if the syntax
  tree included nodes for comments and other non-semantic constructs, then
  we can use it for a complete rewrite.

 And git config becomes:

  1. Read the tree.

  2. Perform operations on the tree (add nodes, delete nodes, etc).

  3. Write out the tree.

and things like remove the section header when the last item in the
section is removed become trivial during step 2.


I still prefer the hashmap way of implementing the cache, as empty
section headers are not so problematic (no processing pitfalls) and
are sometimes annotated with comments, which become redundant and
confusing if the section header is removed. As for the aesthetic
problem, I propose a different solution for it below.

--
##Tidy configuration files

When a configuration file is repeatedly modified, often garbage is
left behind.  For example, after

git config pull.rebase true
git config --unset pull.rebase
git config pull.rebase true
git config --unset pull.rebase

the bottom of the configuration file is left with the useless lines

[pull]
[pull]

Also, setting a config value appends the key-value pair at the end of
the file without checking for empty main keys,
even if the main key (like [my]) is already present and empty. It
works fine if the main key is already present
with a sub-key.

For example:
git config pull.rebase true
git config --unset pull.rebase
git config pull.rebase true
git config pull.option true
gives
[pull]
[pull]
rebase = true
option = true

Also, a possible complication is the presence of comments.
For example:
[my]
# This section is for my own private settings

Expected output:

  1. When we delete the last key in a section, we should be
 able to delete the section header.

  2. When we add a key into a section, we should be able to
 reuse the existing empty section header.

Re: [RFC][GSoC] Calling for comments regarding rough draft of proposal

2014-03-19 Thread Junio C Hamano
tanay abhra tanay...@gmail.com writes:

 2.Other things I should add to the proposal that I have left off?I am
 getting confused what extra details I should add to the proposal. I
 will add
 the informal parts(my background, schedule for summer etc) of the
 proposal later.

I would not label the schedule and success criteria "informal";
without them how would one judge if the proposal has merits?

Other things like your background and previous achievements become
relevant after it is decided that the proposed project has merit, to
see if you are a good fit to work on that project, so I agree with
your message that it is sensible to defer them until the other parts
of the proposal are ironed out.

 #Proposed Improvements

 * Fix git config --unset to clean up detritus from sections that are
 left empty.

 * Read the configuration from files once and cache the results in an
 appropriate data structure in memory.

 * Change `git_config()` to iterate through the pre-read values in
 memory rather than re-reading the configuration
   files.

 * Add new API calls that allow the cache to be inquired easily and
 efficiently.  Rewrite other functions like
  `git_config_int()` to be cache-aware.

I think we already had a discussion to point out git_config_int() is
not a good example for this bullet point (check the list archive).
The approach section seems to use a more sensible example (point 2).

 * Rewrite callers to use the new API wherever possible.

 * How to invalidate the cache correctly in the case that the
 configuration is changed while `git` is executing.

I wouldn't list this as an item in the list of improvements.

It is merely a point you have to be careful about because you are
doing other improvements based on a "read all into memory first and
do not re-read files" approach, no?  In the current code, when
somebody does git_config_set() and then later uses git_config() to
grab the value of the variable set with the first call, we will read
the value written to the file with the first call.  With the
proposed change, if you parse from the file upfront, callers to
git_config_set() will need to somehow invalidate that stale copy in
memory, either updating only the changed part (harder) or just
discarding the cache (easy).

 ##Changing the git_config api to retrieve values from memory

 Approach:-

 We parse the config file once, storing the raw values to records in
 memory. After the whole config has been read, iterate through the records,
 feeding the surviving values into the callback in the order they were
 originally read
 (minus deletions).

 Path to follow for the api conversion,

 1. Convert the parser to read into an in-memory representation, but
leave git_config() as a wrapper which iterates over it.

 2. Add query functions like config_string_get() which will inquire
 cache for values efficiently.

 3. Convert callbacks to query functions one by one.

 I propose two approaches for the format of the internal cache,

 1. Using a hashmap to map keys to their values. This would bring, as an
  advantage, constant-time lookups for the values. The implementation
  will be similar to the dict data structure in Python,

  for example, section.subsection --mapped-to-- multi_value_string

I have no idea what you wanted to illustrate with that example at
all.

  This approach loses the relative order of different config keys.

As long as it keeps the order of multi-value elements, it should
not be a problem.

 2.Another approach would be to actually represent the syntax tree of the
   config file in memory. That would make lookups of individual keys more
   expensive, but would enable other manipulation. E.g., if the syntax
   tree included nodes for comments and other non-semantic constructs, then
   we can use it for a complete rewrite.

for a complete rewrite of what?

  And git config becomes:

   1. Read the tree.

   2. Perform operations on the tree (add nodes, delete nodes, etc).

   3. Write out the tree.

 and things like remove the section header when the last item in the
 section is removed become trivial during step 2.

Are you saying you will try both approaches during the summer?

You should be able to look-up quickly *and* to preserve order at the
same time within one approach, by either annotating the tree with a
hash, or the other way around to annotate the hash with each node
remembering where in the original file it came from (which you will
need to keep in order to report errors anyway).

 --
 ##Tidy configuration files

 When a configuration file is repeatedly modified, often garbage is
 left behind.  For example, after

 git config pull.rebase true
 git config --unset pull.rebase
 git config pull.rebase true
 git config --unset pull.rebase

 the bottom of the configuration file is left with the useless lines

 [pull]
 [pull]

 Also,setting a config value, appends the key-value pair at the end

[GSoC] Choosing a Project Proposal

2014-03-19 Thread Brian Bourn
Hi all,

I'm currently trying to decide on a project to work on for Google
Summer of Code. I'm stuck choosing between three which I find really
interesting, and I was wondering if any of them is particularly more
pressing than the others.  I would also love some comments on each of
these three, if possible, expanding on them. The three projects I'm
considering are:

1.  Unifying git branch -l, git tag -l, and git for-each-ref

2.  Refactor tempfile handling

3.  Improve triangular workflow support


Once again, I would appreciate all feedback on which of these is most
important.

Thanks for the Help,
Brian Bourn


Re: GSoC proposal: port pack bitmap support to libgit2.

2014-03-13 Thread Yuxuan Shui
Hi,

On Wed, Mar 12, 2014 at 4:19 PM, Yuxuan Shui yshu...@gmail.com wrote:
 Hi,

 I'm Yuxuan Shui, an undergraduate student from China. I'm applying for
 GSoC 2014, and here is my proposal:

 I found this idea on the ideas page, and did some research about it.
 The pack bitmap patchset adds a new .bitmap file for every pack file
 which contains the reachability information of selected commits. This
 information is used to speed up git fetching and cloning, and produces
 very convincing results.

 The goal of my project is to port the pack bitmap implementation in
 core git to libgit2, so users of libgit2 could benefit from this
 optimization as well.

 Please let me know if my proposal makes sense, thanks.

 P.S. I've submitted my microproject patch[1], but haven't received any
 response yet.

 [1]: http://thread.gmane.org/gmane.comp.version-control.git/243854

 --
 Regards
 Yuxuan Shui

Could anyone please review my proposal a little bit? Is this project
helpful and worth doing? Did I get anything wrong in my proposal?

Thanks.

-- 

Regards
Yuxuan Shui


Proposal: Write git subtree info to .git/config

2014-03-13 Thread John Butterfield
Has there been any talk about adding a stub for git subtrees in .git/config?

The primary benefits would be:

1. Determine which subdirectories of the project were at one time
pulled from another repo (where from and which commit id), without
having to attempt to infer this by scanning the log.
2. Simplify command syntax by providing a predictable default (i.e.
last pulled from, last pushed to), making the repo argument
optional.
3. Provide a better default commit id for starting split operations
than using --rejoin, which creates blank log entries just so the log
scan can find them (afaict). It's a default either way, so it can
still always be explicitly specified.

If this information were available in the config, I think additional
features could be added as well:

- The command 'git subtree pull' for instance could be made to pull
*all* subtrees, similar to the way 'git submodule update' works.
- An option -i (interactive), or -p (prompt), etc. could be added that
confirms the defaults read from the config before actually executing
the command with implicit arguments, and the ability to modify the
arguments before the command actually executes.
- If the current working directory from which the command is run
happens to be a subtree specified in the config, the --prefix could
even be implied.


None of these ideas would break the way the command currently works
since it can still always take explicit arguments. There's a comment
in the documentation about the command that says:

 Unlike submodules, subtrees do not need any special constructions (like 
 .gitmodule files or gitlinks) to be present in your repository

It would still be true that subtrees do not *need* any special config
settings, but that doesn't mean they are bad, and by having them the
command could be improved and made easier to use.

I'm happy to contribute the changes myself if this proposal is acceptable.


Re: Proposal: Write git subtree info to .git/config

2014-03-13 Thread Junio C Hamano
John Butterfield johnb...@gmail.com writes:

 Has there been any talk about adding a stub for git subtrees in .git/config?

I do not think so, and that is probably for a good reason.

A subtree binding can change over time, but .git/config is about
recording information that does not change depending on what tree you
are looking at, so there is an impedance mismatch---storing that
information in .git/config is probably the wrong way to go about it.

It might help to keep track of "In this tree, the tip of that other
history is bound as a subtree at this path", which means that
information more naturally belongs to each tree, I would think.


Re: Proposal: Write git subtree info to .git/config

2014-03-13 Thread John Butterfield
 A subtree binding can change over time, but .git/config is about
recording information that does not change depending on what tree you
are looking at, so there is an impedance mismatch---storing that
information in .git/config is probably the wrong way to go about it.

I see. How about a .gitsubtrees config file in the root of a project?

 It might help to keep track of In this tree, the tip of that other
history is bound as a subtree at this path, which means that
information more naturally belongs to each tree, I would think.

Anything in the subdirectory must be part of the contents of the
subtree repo. It should not know how it is linked to its parent
project; parents should know how their children are fetched. Therefore
it cannot live in the subtree.

Subtrees could be nested. So, should the config be in the root of the
parent subtree? This makes sense to me.

Example:

/
  A/
  B/# a subtree of (blah)
X/
Y/  # a subtree of (yada-yada)
Z/

So, let's say B has many updates remotely, including pushing and
pulling changes to Y.

When pulling the changes from B, it would be convenient for it to come
with the metadata (subtree repo and commit info) for Y.

So how does that sound; Could we store subtree repo and commit id
references per folder in a .gitsubtrees file in the root of every
project?

(Project B is technically its own project, so it would pull its own
.gitsubtrees in /B/.gitsubtrees)

`John

On Thu, Mar 13, 2014 at 4:36 PM, Junio C Hamano gits...@pobox.com wrote:
 John Butterfield johnb...@gmail.com writes:

 Has there been any talk about adding a stub for git subtrees in .git/config?

 I do not think so, and that is probably for a good reason.

 A subtree binding can change over time, but .git/config is about
 recording information that does not change depending on what tree you
 are looking at, so there is an impedance mismatch---storing that
 information in .git/config is probably a wrong way to go about it.

 It might help to keep track of In this tree, the tip of that other
 history is bound as a subtree at this path, which means that
 information more naturally belongs to each tree, I would think.


Re: Proposal: Write git subtree info to .git/config

2014-03-13 Thread John Butterfield
By "per folder" I meant "for each subtree".

On Thu, Mar 13, 2014 at 5:43 PM, John Butterfield johnb...@gmail.com wrote:
 A subtree binding can change over time, but .git/config is about
 recording information that does not change depending on what tree you
 are looking at, so there is an impedance mismatch---storing that
 information in .git/config is probably a wrong way to go about it.

 I see. How about a .gitsubtrees config file in the root of a project?

 It might help to keep track of In this tree, the tip of that other
 history is bound as a subtree at this path, which means that
 information more naturally belongs to each tree, I would think.

 Anything in the subdirectory must be part of the contents of the
 subtree repo. It should not know how it is linked to its parent
 project; parents should know how their children are fetched. Therefore
 it cannot live in the subtree.

 Subtrees could be nested. So, should the config be in the root of the
 parent subtree? This makes sense to me.

 Example:

 /
   A/
   B/# a subtree of (blah)
 X/
 Y/  # a subtree of (yada-yada)
 Z/

 So, let's say B has many updates remotely, including pushing and
 pulling changes to Y.

 When pulling the changes from B, it would be convenient for it to come
 with the metadata (subtree repo and commit info) for Y.

 So how does that sound; Could we store subtree repo and commit id
 references per folder in a .gitsubtrees file in the root of every
 project?

 (Project B is technically its own project, so it would pull its own
 .gitsubtrees in /B/.gitsubtrees)

 `John

 On Thu, Mar 13, 2014 at 4:36 PM, Junio C Hamano gits...@pobox.com wrote:
 John Butterfield johnb...@gmail.com writes:

 Has there been any talk about adding a stub for git subtrees in .git/config?

 I do not think so, and that is probably for a good reason.

 A subtree binding can change over time, but .git/config is about
 recording information that does not change depending on what tree you
 are looking at, so there is an impedance mismatch---storing that
 information in .git/config is probably a wrong way to go about it.

 It might help to keep track of In this tree, the tip of that other
 history is bound as a subtree at this path, which means that
 information more naturally belongs to each tree, I would think.


Re: GSoC proposal: port pack bitmap support to libgit2.

2014-03-13 Thread Jeff King
On Wed, Mar 12, 2014 at 04:19:23PM +0800, Yuxuan Shui wrote:

 I'm Yuxuan Shui, an undergraduate student from China. I'm applying for
 GSoC 2014, and here is my proposal:
 
 I found this idea on the ideas page, and did some research about it.
 The pack bitmap patchset adds a new .bitmap file for every pack file
 which contains the reachability information of selected commits. This
 information is used to speed up git fetching and cloning, and produces
 very convincing results.
 
 The goal of my project is to port the pack bitmap implementation in
 core git to libgit2, so users of libgit2 could benefit from this
 optimization as well.
 
 Please let me know if my proposal makes sense, thanks.

You'd want to flesh it out a bit more to show how you're thinking about
tackling the problem:

  - What are the areas of libgit2 that you will need to touch? Be
specific. What's the current state of the packing code? What
files and functions will you need to touch?

  - What are the challenges you expect to encounter in porting the code?

  - Can you give a detailed schedule of the summer's work? What will you
work on in each week? What milestones do you expect to hit, and
when?

-Peff


GSoC proposal: port pack bitmap support to libgit2.

2014-03-12 Thread Yuxuan Shui
Hi,

I'm Yuxuan Shui, an undergraduate student from China. I'm applying for
GSoC 2014, and here is my proposal:

I found this idea on the ideas page, and did some research about it.
The pack bitmap patchset adds a new .bitmap file for every pack file
which contains the reachability information of selected commits. This
information is used to speed up git fetching and cloning, and produces
very convincing results.

The goal of my project is to port the pack bitmap implementation in
core git to libgit2, so users of libgit2 could benefit from this
optimization as well.

Please let me know if my proposal makes sense, thanks.

P.S. I've submitted my microproject patch[1], but haven't received any
response yet.

[1]: http://thread.gmane.org/gmane.comp.version-control.git/243854

-- 
Regards
Yuxuan Shui


Re: [GSoC14][RFC] Proposal Draft: Refactor tempfile handling

2014-03-12 Thread Brian Gesiak
 Currently the linked list of lockfiles only grows, never shrinks.  Once
 an object has been linked into the list, there is no way to remove it
 again even after the lock has been released.  So if a lock needs to be
 created dynamically at a random place in the code, its memory is
 unavoidably leaked.

Ah yes, I see. I think a good example is
config.git_config_set_multivar_in_file, which even contains a comment
detailing the problem: Since lockfile.c keeps a linked list of all
created lock_file structures, it isn't safe to free(lock).  It's
better to just leave it hanging around.

 But I have a feeling that if we want to use a similar mechanism to
 handle all temporary files (of which there can be more), then it would
 be a good idea to lift this limitation.  It will require some care,
 though, to make sure that record removal is done in a way that is
 threadsafe and safe in the event of all expected kinds of process death.

It sounds like a threadsafe linked-list with an interface to manually
remove elements from the list is the solution here; does that sound
reasonable? Ensuring thread safety without sacrificing readability is
probably more difficult than it sounds, but I don't think it's
impossible.

I'll add some more details on this to my proposal[1]. Thank you!

- Brian Gesiak

[1] 
https://www.google-melange.com/gsoc/proposal/review/student/google/gsoc2014/modocache/5629499534213120


Re: [GSoC14][RFC] Proposal Draft: Refactor tempfile handling

2014-03-11 Thread Michael Haggerty
On 03/01/2014 10:04 PM, Brian Gesiak wrote:
 Hello all,
 
 My name is Brian Gesiak. I'm a research student at the University of
 Tokyo, and I'm hoping to participate in this year's Google Summer of
 Code by contributing to Git. I'm a longtime user, first-time
 contributor--some of you may have noticed my microproject
 patches.[1][2]
 
 I'd like to gather some information on one of the GSoC ideas posted on
 the ideas page. Namely, I'm interested in refactoring the way
 tempfiles are cleaned up.
 
 The ideas page points out that while lock files are closed and
 unlinked[3] when the program exits[4], object pack files implement
 their own brand of temp file creation and deletion. This
 implementation doesn't share the same guarantees as lock files--it is
 possible that the program terminates before the temp file is
 unlinked.[5]
 
 Lock file references are stored in a linked list. When the program
 exits, this list is traversed and each file is closed and unlinked. It
 seems to me that this mechanism is appropriate for temp files in
 general, not just lock files. Thus, my proposal would be to extract
 this logic into a separate module--tempfile.h, perhaps. Lock and
 object files would share the tempfile implementation.
 
 That is, both object and lock temp files would be stored in a linked
 list, and all of these would be deleted at program exit.
 
 I'm very enthused about this project--I think it has it all:
 
 - Tangible benefits for the end-user
 - Reduced complexity in the codebase
 - Ambitious enough to be interesting
 - Small enough to realistically be completed in a summer
 
 Please let me know if this seems like it would make for an interesting
 proposal, or if perhaps there is something I am overlooking. Any
 feedback at all would be appreciated. Thank you!

Hi Brian,

Thanks for your proposal.  I have a technical point that I think your
proposal should address:

Currently the linked list of lockfiles only grows, never shrinks.  Once
an object has been linked into the list, there is no way to remove it
again even after the lock has been released.  So if a lock needs to be
created dynamically at a random place in the code, its memory is
unavoidably leaked.

This hasn't been much of a problem in the past because (1) the number of
locks acquired/released during a Git invocation is reasonable, and (2) a
lock object (even if it is already in the list) can be reused after the
lock has been released.  So there are many lock callsites that define
one static lock instance and use it over and over again.

But I have a feeling that if we want to use a similar mechanism to
handle all temporary files (of which there can be more), then it would
be a good idea to lift this limitation.  It will require some care,
though, to make sure that record removal is done in a way that is
threadsafe and safe in the event of all expected kinds of process death.
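The limitation can be illustrated with a small registry whose entries can also be removed; everything below is a hypothetical Python sketch of the idea, not git's actual API:

```python
class TempfileRegistry:
    """Registry of live tempfiles. Unlike the grow-only lockfile list
    described above, entries can be removed once a lock is released,
    so locks created dynamically do not leak their records."""

    def __init__(self):
        self._entries = {}  # path -> opaque per-file state

    def add(self, path):
        self._entries[path] = object()

    def remove(self, path):
        # Called when the lock/tempfile is committed or rolled back;
        # removing an already-removed entry is a harmless no-op.
        self._entries.pop(path, None)

    def active(self):
        return sorted(self._entries)
```

The real work, as noted above, is doing that removal in a way that stays safe against signals and threads; the sketch only shows the bookkeeping shape.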

Michael

-- 
Michael Haggerty
mhag...@alum.mit.edu
http://softwareswirl.blogspot.com/
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [GSoC14][RFC] Proposal Draft: Refactor tempfile handling

2014-03-11 Thread Jeff King
On Tue, Mar 11, 2014 at 05:27:05PM +0100, Michael Haggerty wrote:

 Thanks for your proposal.  I have a technical point that I think your
 proposal should address:
 
 Currently the linked list of lockfiles only grows, never shrinks.  Once
 an object has been linked into the list, there is no way to remove it
 again even after the lock has been released.  So if a lock needs to be
 created dynamically at a random place in the code, its memory is
 unavoidably leaked.

Thanks, I remember thinking about this when I originally conceived of
the idea, but I forgot to mention it in the idea writeup.

In most cases the potential leaks are finite and small, but object
creation and diff tempfiles could both be unbounded. So this is
definitely something to consider. In both cases we have a bounded number
of _simultaneous_ tempfiles, so one strategy could be to continue using
static objects. But it should not be hard to do it dynamically, and I
suspect the resulting API will be a lot easier to comprehend.

-Peff


Re: [GSoC14][RFC] Proposal Draft: Refactor tempfile handling

2014-03-10 Thread Jeff King
On Sun, Mar 09, 2014 at 02:04:16AM +0900, Brian Gesiak wrote:

  Once the logic is extracted into a nice API, there are
  several other places that can use it, too: ...
 
 I've found the following four areas so far:
 
 1. lockfile.lock_file
 2. git-compat-util.odb_mkstemp
 3. git-compat-util.odb_pack_keep
 4. diff.prepare_temp_file
 
 Tons of files use (1) and (2). (3) is less common, and (4) is only
 used for external diffs.

Yeah, I would expect (1) and (2) to be the most frequent. (3) gets
written on every push and fetch, but only for a short period. (4) is
also used for diff's textconv, though like external diffs, they are
relatively rare.

In my experience, most of the cruft that gets left is from (2), since a
push or fetch will spool to a tmpfile, then verify the results via git
index-pack. Any failure there leaves the file in place.

There are a few other potential candidates we can find by grepping for
mkstemp. Not all of those might want cleanup, but it's a starting point
for investigation.

  the shallow_XX tempfiles
 
 I'm not sure I was able to find this one. Are you referring to the
 lock files used when fetching, such as in fetch-pack.c?

I mean the xmkstemp from setup_temporary_shallow in shallow.c.

 I'd say the biggest difference between lockfiles and object files is
 that tempfile methods like odb_mkstemp need to know the location of
 the object directory. Aside from that, lockfiles and the external diff
 files appear to be cleaned up at exit, while temporary object files
 tend to have a more finely controlled lifecycle. I'm still
 investigating this aspect of the proposal, though.

The diff tempfiles are true tempfiles; they always go away in the end
(though of course we want to clean them up as we finish with them,
rather than doing it all at the end). Lockfiles may get committed into
place (i.e., via atomic rename) or rolled back (deleted).

Object files should generally be hard-linked into place, but there is
some extra magic in move_temp_to_file to fallback to renames.  Some of
that we may be able to get rid of (e.g., we try to avoid doing
cross-directory renames at all these days, so the comment there may be
out of date).
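The fallback Peff mentions can be sketched as follows; this is a simplified illustration of the link-then-rename idea, not git's actual move_temp_to_file code:

```python
import os
import tempfile

def move_temp_to_file(tmp, dst):
    """Finalize a temporary object file: prefer a hard link (so a
    failure partway through leaves the tempfile intact for retry),
    then remove the tempfile; fall back to a rename where linking
    is unsupported or fails."""
    try:
        os.link(tmp, dst)
    except OSError:
        os.replace(tmp, dst)  # rename fallback
        return
    os.unlink(tmp)
```

On filesystems without hard links the rename path is the only one taken, which is one reason the fallback has to stay around.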

 One question, though: the idea on the ideas page specifies that
 temporary pack and object files may optionally be cleaned up in case
 of error during program execution. How will users specify their
 preference? I think the API for creating temporary files should allow
 cleanup options to be specified on a per-file basis. That way each
 part of the program that creates tempfiles can specify a different
 config value to determine the cleanup policy.

That probably makes sense. I certainly had a config option in mind. I
mentioned above that the most common cruft is leftover packfiles from
pushes and fetches. We haven't deleted those historically because the
same person often controls both the client and the server, and they
would want to possibly do forensics on the packfile sent to the remote,
or even rescue objects out of it. But the remote end may simply have
rejected the pack by some policy, and has no interest in forensics.

Having a config option for each type of file may be cool, but I don't
know how useful it would be in practice. Still, it's certainly worth
thinking about and looking into.

-Peff


Re: [GSoC14][RFC] Proposal Draft: Refactor tempfile handling

2014-03-08 Thread Brian Gesiak
Excellent, thank you very much for the feedback, Jeff! It was very
helpful and encouraging. I've done some more research based on your
comments.

 Once the logic is extracted into a nice API, there are
 several other places that can use it, too: ...

I've found the following four areas so far:

1. lockfile.lock_file
2. git-compat-util.odb_mkstemp
3. git-compat-util.odb_pack_keep
4. diff.prepare_temp_file

Tons of files use (1) and (2). (3) is less common, and (4) is only
used for external diffs.

 the shallow_XX tempfiles

I'm not sure I was able to find this one. Are you referring to the
lock files used when fetching, such as in fetch-pack.c?

 What are the mismatches in how lockfiles and object files are
 handled? E.g., how do we finalize them into place? How should
 the API be designed to minimize race conditions (e.g., if we
 get a signal delivered while we are committing or cleaning up a file)?

I'd say the biggest difference between lockfiles and object files is
that tempfile methods like odb_mkstemp need to know the location of
the object directory. Aside from that, lockfiles and the external diff
files appear to be cleaned up at exit, while temporary object files
tend to have a more finely controlled lifecycle. I'm still
investigating this aspect of the proposal, though.

One question, though: the idea on the ideas page specifies that
temporary pack and object files may optionally be cleaned up in case
of error during program execution. How will users specify their
preference? I think the API for creating temporary files should allow
cleanup options to be specified on a per-file basis. That way each
part of the program that creates tempfiles can specify a different
config value to determine the cleanup policy.
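A per-file policy could be as small as a flags value consulted by the cleanup pass. The names below are a hypothetical sketch of this suggestion, not an existing git interface:

```python
from enum import Flag, auto

class CleanupPolicy(Flag):
    NONE = 0
    ON_EXIT = auto()        # remove at normal program exit
    ON_SIGNAL = auto()      # also remove when dying from a signal
    KEEP_ON_ERROR = auto()  # leave in place for forensics after a failure

def should_remove(policy, died_by_signal=False, had_error=False):
    """Decide whether the cleanup pass should unlink this tempfile."""
    if had_error and CleanupPolicy.KEEP_ON_ERROR in policy:
        return False
    if died_by_signal:
        return CleanupPolicy.ON_SIGNAL in policy
    return CleanupPolicy.ON_EXIT in policy
```

Each call site (diff tempfiles, incoming packs, ...) would then pick its policy, possibly from a config value.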

Thanks for all your help so far!

- Brian Gesiak

PS: I'm maintaining a working draft of my proposal here, in case
anyone wants to offer any feedback prior to its submission:
https://gist.github.com/modocache/9434914


On Tue, Mar 4, 2014 at 7:42 AM, Jeff King p...@peff.net wrote:
 On Sun, Mar 02, 2014 at 06:04:39AM +0900, Brian Gesiak wrote:

 My name is Brian Gesiak. I'm a research student at the University of
 Tokyo, and I'm hoping to participate in this year's Google Summer of
 Code by contributing to Git. I'm a longtime user, first-time
 contributor--some of you may have noticed my microproject
 patches.[1][2]

 Yes, we did notice them. Thanks, and welcome. :)

 The ideas page points out that while lock files are closed and
 unlinked[3] when the program exits[4], object pack files implement
 their own brand of temp file creation and deletion. This
 implementation doesn't share the same guarantees as lock files--it is
 possible that the program terminates before the temp file is
 unlinked.[5]

 Lock file references are stored in a linked list. When the program
 exits, this list is traversed and each file is closed and unlinked. It
 seems to me that this mechanism is appropriate for temp files in
 general, not just lock files. Thus, my proposal would be to extract
 this logic into a separate module--tempfile.h, perhaps. Lock and
 object files would share the tempfile implementation.

 That is, both object and lock temp files would be stored in a linked
 list, and all of these would be deleted at program exit.

 Yes, I think this is definitely the right way to go. We should be able
 to unify the tempfile handling for all of git. Once the logic is
 extracted into a nice API, there are several other places that can use
 it, too:

   - the external diff code creates tempfiles and uses its own cleanup
 routines

   - the shallow_XX tempfiles (these are not cleaned right now,
 though I sent a patch recently for them to do their own cleanup)

 Those are just off the top of my head. There may be other spots, too.

 It is worth thinking in your proposal about some of the things that the
 API will want to handle. What are the mismatches in how lockfiles and
 object files are handled? E.g., how do we finalize them into place?
 How should the API be designed to minimize race conditions (e.g., if we
 get a signal delivered while we are committing or cleaning up a file)?

 Please let me know if this seems like it would make for an interesting
 proposal, or if perhaps there is something I am overlooking. Any
 feedback at all would be appreciated. Thank you!

 You definitely have a grasp of what the project is aiming for, and which
 areas need to be touched.

 -Peff


Re: [GSoC14][RFC] Proposal Draft: Refactor tempfile handling

2014-03-03 Thread Jeff King
On Sun, Mar 02, 2014 at 06:04:39AM +0900, Brian Gesiak wrote:

 My name is Brian Gesiak. I'm a research student at the University of
 Tokyo, and I'm hoping to participate in this year's Google Summer of
 Code by contributing to Git. I'm a longtime user, first-time
 contributor--some of you may have noticed my microproject
 patches.[1][2]

Yes, we did notice them. Thanks, and welcome. :)

 The ideas page points out that while lock files are closed and
 unlinked[3] when the program exits[4], object pack files implement
 their own brand of temp file creation and deletion. This
 implementation doesn't share the same guarantees as lock files--it is
 possible that the program terminates before the temp file is
 unlinked.[5]
 
 Lock file references are stored in a linked list. When the program
 exits, this list is traversed and each file is closed and unlinked. It
 seems to me that this mechanism is appropriate for temp files in
 general, not just lock files. Thus, my proposal would be to extract
 this logic into a separate module--tempfile.h, perhaps. Lock and
 object files would share the tempfile implementation.
 
 That is, both object and lock temp files would be stored in a linked
 list, and all of these would be deleted at program exit.

Yes, I think this is definitely the right way to go. We should be able
to unify the tempfile handling for all of git. Once the logic is
extracted into a nice API, there are several other places that can use
it, too:

  - the external diff code creates tempfiles and uses its own cleanup
routines

  - the shallow_XX tempfiles (these are not cleaned right now,
though I sent a patch recently for them to do their own cleanup)

Those are just off the top of my head. There may be other spots, too.

It is worth thinking in your proposal about some of the things that the
API will want to handle. What are the mismatches in how lockfiles and
object files are handled? E.g., how do we finalize them into place?
How should the API be designed to minimize race conditions (e.g., if we
get a signal delivered while we are committing or cleaning up a file)?

 Please let me know if this seems like it would make for an interesting
 proposal, or if perhaps there is something I am overlooking. Any
 feedback at all would be appreciated. Thank you!

You definitely have a grasp of what the project is aiming for, and which
areas need to be touched.

-Peff


[GSoC14][RFC] Proposal Draft: Refactor tempfile handling

2014-03-01 Thread Brian Gesiak
Hello all,

My name is Brian Gesiak. I'm a research student at the University of
Tokyo, and I'm hoping to participate in this year's Google Summer of
Code by contributing to Git. I'm a longtime user, first-time
contributor--some of you may have noticed my microproject
patches.[1][2]

I'd like to gather some information on one of the GSoC ideas posted on
the ideas page. Namely, I'm interested in refactoring the way
tempfiles are cleaned up.

The ideas page points out that while lock files are closed and
unlinked[3] when the program exits[4], object pack files implement
their own brand of temp file creation and deletion. This
implementation doesn't share the same guarantees as lock files--it is
possible that the program terminates before the temp file is
unlinked.[5]

Lock file references are stored in a linked list. When the program
exits, this list is traversed and each file is closed and unlinked. It
seems to me that this mechanism is appropriate for temp files in
general, not just lock files. Thus, my proposal would be to extract
this logic into a separate module--tempfile.h, perhaps. Lock and
object files would share the tempfile implementation.

That is, both object and lock temp files would be stored in a linked
list, and all of these would be deleted at program exit.
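The mechanism described above, a list of live tempfiles swept by an exit handler, can be sketched like this (the names are illustrative, not git's actual C code):

```python
import atexit
import os
import tempfile

# Every tempfile created through this helper is recorded in a list, and
# an atexit handler walks the list to unlink whatever is still on disk,
# the same shape as the lockfile list described above.
_active = []

def create_tempfile(directory="."):
    fd, path = tempfile.mkstemp(dir=directory)
    os.close(fd)
    _active.append(path)
    return path

def cleanup_tempfiles():
    for path in _active:
        try:
            os.unlink(path)
        except OSError:
            pass  # already gone or not removable; nothing more to do

atexit.register(cleanup_tempfiles)
```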

I'm very enthused about this project--I think it has it all:

- Tangible benefits for the end-user
- Reduced complexity in the codebase
- Ambitious enough to be interesting
- Small enough to realistically be completed in a summer

Please let me know if this seems like it would make for an interesting
proposal, or if perhaps there is something I am overlooking. Any
feedback at all would be appreciated. Thank you!

- Brian Gesiak

[1] http://thread.gmane.org/gmane.comp.version-control.git/242891
[2] http://thread.gmane.org/gmane.comp.version-control.git/242893
[3] https://github.com/git/git/blob/v1.9.0/lockfile.c#L18
[4] https://github.com/git/git/blob/v1.9.0/lockfile.c#L143
[5] https://github.com/git/git/blob/v1.9.0/pack-write.c#L350


Re: [Proposal] Clonable scripts

2013-09-10 Thread Niels Basjes
Hi,

On Tue, Sep 10, 2013 at 12:18 AM, Ramkumar Ramachandra
artag...@gmail.com wrote:
 Niels Basjes wrote:
 As we all know the hooks ( in .git/hooks ) are not cloned along with
 the code of a project.
 Now this is a correct approach for the scripts that do stuff like
 emailing the people responsible for releases or submitting the commit
 to a CI system.

 More often than not, maintainers come with these hooks and they keep
 them private.

Yes.

 Initially I wanted to propose introducing fully clonable (pre-commit)
 hook scripts.
 However I can imagine that a malicious open-source coder can create a
 github repo and try to hack the computer of a contributor via those
 scripts. So having such scripts is a 'bad idea'.

 I think it's a good idea, since the contributer can look through the scripts.

What I meant to say is that having fully functional unrestricted
scripts that are cloned is a bad idea.
Having restricted cloned scripts is, to me, a good idea (or at least,
that is what I propose here).


 3) For the regular hooks this language is also supported and when
 located in the (not cloned!) .git/hooks directory they are just as
 powerful as a normal script (i.e. can control CI, send emails, etc.).

 I'm confused now; how can .git/hooks be as powerful as .githooks? The
 former users should consider uploading their code on GitHub.

The way I envisioned it is that for .git/hooks you can pick any
language you like, with the builtin language as a new addition.
In .githooks (which is under version control in the code base and
cloned) only the builtin language is used, constrained in a sandbox.

 Which reminds me that we need to have GitTogethers. Thanks for this!

You're welcome.

-- 
Best regards / Met vriendelijke groeten,

Niels Basjes


Re: [Proposal] Clonable scripts

2013-09-10 Thread Sitaram Chamarty
On 09/10/2013 02:18 AM, Niels Basjes wrote:

 As we all know the hooks ( in .git/hooks ) are not cloned along with
 the code of a project.
 Now this is a correct approach for the scripts that do stuff like
 emailing the people responsible for releases or submitting the commit
 to a CI system.
 
 For several other things it makes a lot of sense to give the developer
 immediate feedback. Things like the format of the commit message (i.e.
 it must start with an issue tracker id) or compliance with a coding
 standard.
 
 Initially I wanted to propose introducing fully clonable (pre-commit)
 hook scripts.
 However I can imagine that a malicious open-source coder can create a
 github repo and try to hack the computer of a contributor via those
 scripts. So having such scripts is a 'bad idea'.
 
 If those scripts were however written in a language that is built
 into the git program, and the scripts were run in such a way that they
 can only interact with the files in the local git (and _nothing_
 outside of that), this would be solved.
 
 Also, having a builtin scripting language means that this would run
 on all operating systems (yes, even Windows).
 
 So I propose the following new feature:
 
 1) A scripting language is put inside git. Perhaps a version of python
 or ruby or go or ... (no need for a 'new' language)
 
 2) If a project contains a folder called .githooks in the root of the
 code base then the rules/scripts that are present there are executed
 ONLY on the system doing the actual commit. These scripts are run in
 such a limited way that they can only read the files in the
 repository, they cannot do any networking/write to disk/etc and they
 can only do a limited set of actions against the current operation at
 hand (i.e. do checks, parse messages, etc).
 
 3) For the regular hooks this language is also supported and when
 located in the (not cloned!) .git/hooks directory they are just as
 powerful as a normal script (i.e. can control CI, send emails, etc.).
 
 Like I said, this is just a proposal and I would like to know what you
 guys think.

I am not in favour of any idea like this.  It will end in some sort of
compromise (in both senses of the word!)

It has to be voluntary, but we can make it easier.  I suggest something
like this:

  - some special directory can have normal hook files, but it's just a
placeholder.

  - each hook code file comes with some metadata at the top, say
githook name, hook name, version, remote-name.  I'll use these
examples:

pre-commit  crlf-check  1.1 origin

  - on a clone/pull, if there is a change to any of these code files
when compared to the previous HEAD, and if the program is running
interactively, then you can ask and set up these hooks.

The purpose of the remote name in the stored metadata is that we
don't want to bother updating when we pull from some other repo,
like when merging a feature branch.

The purpose of the version number is so you can do some intelligent
things, even silently upgrade under certain conditions.
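A metadata line like the example above parses trivially; this sketch assumes the four-field layout shown, which is a proposal here, not an implemented git format:

```python
def parse_hook_metadata(line):
    """Parse a '<hook> <githook name> <version> <remote-name>' header
    into a dict; the version becomes an integer tuple so versions
    compare correctly (useful for the silent-upgrade case)."""
    fields = line.split()
    if len(fields) != 4:
        raise ValueError("expected: <hook> <name> <version> <remote>")
    hook, name, version, remote = fields
    return {
        "hook": hook,
        "name": name,
        "version": tuple(int(part) for part in version.split(".")),
        "remote": remote,
    }
```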

All we're doing is making things easier compared to what you can already
do even now (which is completely manual and instructions based).

I don't think anything more intrusive or forced is wise.

And people who say it is OK, I'm going to seriously wonder if you work
for the NSA (directly or indirectly).  Sadly, that is not meant to be a
joke question; such is life now.


Re: [Proposal] Clonable scripts

2013-09-10 Thread Andreas Krey
On Mon, 09 Sep 2013 22:48:42 +, Niels Basjes wrote:
...
 However I can imagine that a malicious open-source coder can create a
 github repo and try to hack the computer of a contributor via those
 scripts. So having such scripts is a 'bad idea'.

Given that half the repos out there are cloned to 'make install' in
them...it's still a bad idea.

 If those scripts were however written in a language that is built
 into the git program, and the scripts were run in such a way that they
 can only interact with the files in the local git (and _nothing_
 outside of that), this would be solved.

I still think this is a nightmare of maintenance. You'd need a restricted
version of a language that doesn't allow access outside the repo (and
no TCP either), and someone will always miss some module...

Not that it wouldn't be cool, yet.

...
 Like I said, this is just a proposal and I would like to know what you
 guys think.

I think there are generally two use cases:

- Many people working on repos in an organization. Give them a wrapper
  script that does the clone (and also knows the clone URL already),
  that will set up hooks and configuration as needed.

- github-style cooperation. Add a "make hooks" target to your Makefile that sets
  up the hooks your project seems to want. After all, this is for the
  developers to pre-check what they will submit, so it is in their own
  interest to have (and cross-read) the hooks.

Andreas

-- 
Totally trivial. Famous last words.
From: Linus Torvalds torvalds@*.org
Date: Fri, 22 Jan 2010 07:29:21 -0800


[Proposal] Clonable scripts

2013-09-09 Thread Niels Basjes
Hi,

As we all know the hooks ( in .git/hooks ) are not cloned along with
the code of a project.
Now this is a correct approach for the scripts that do stuff like
emailing the people responsible for releases or submitting the commit
to a CI system.

For several other things it makes a lot of sense to give the developer
immediate feedback. Things like the format of the commit message (i.e.
it must start with an issue tracker id) or compliance with a coding
standard.

Initially I wanted to propose introducing fully clonable (pre-commit)
hook scripts.
However I can imagine that a malicious open-source coder can create a
github repo and try to hack the computer of a contributor via those
scripts. So having such scripts is a 'bad idea'.

If those scripts were however written in a language that is built
into the git program, and the scripts were run in such a way that they
can only interact with the files in the local git (and _nothing_
outside of that), this would be solved.

Also, having a builtin scripting language means that this would run
on all operating systems (yes, even Windows).

So I propose the following new feature:

1) A scripting language is put inside git. Perhaps a version of python
or ruby or go or ... (no need for a 'new' language)

2) If a project contains a folder called .githooks in the root of the
code base then the rules/scripts that are present there are executed
ONLY on the system doing the actual commit. These scripts are run in
such a limited way that they can only read the files in the
repository, they cannot do any networking/write to disk/etc and they
can only do a limited set of actions against the current operation at
hand (i.e. do checks, parse messages, etc).

3) For the regular hooks this language is also supported and when
located in the (not cloned!) .git/hooks directory they are just as
powerful as a normal script (i.e. can control CI, send emails, etc.).

Like I said, this is just a proposal and I would like to know what you
guys think.

-- 
Best regards / Met vriendelijke groeten,

Niels Basjes


Re: [Proposal] Clonable scripts

2013-09-09 Thread Hilco Wijbenga
On 9 September 2013 13:48, Niels Basjes ni...@basjes.nl wrote:
 If those scripts were however written in a language that is built
 into the git program, and the scripts were run in such a way that they
 can only interact with the files in the local git (and _nothing_
 outside of that), this would be solved.

That sounds interesting.

 Also, having a builtin scripting language means that this would run
 on all operating systems (yes, even Windows).

This would be *very* helpful. It's a total pain trying to get hooks
working across different OSes.

 So I propose the following new feature:

 1) A scripting language is put inside git. Perhaps a version of python
 or ruby or go or ... (no need for a 'new' language)

That sounds nice but ...

 2) If a project contains a folder called .githooks in the root of the
 code base then the rules/scripts that are present there are executed
 ONLY on the system doing the actual commit. These scripts are run in
 such a limited way that they can only read the files in the
 repository, they cannot do any networking/write to disk/etc and they
 can only do a limited set of actions against the current operation at
 hand (i.e. do checks, parse messages, etc).

... how would you prevent Ruby/Python/Go/$GeneralProgLang from
executing arbitrary code?

 Like I said, this is just a proposal and I would like to know what you
 guys think.

I love the idea but I'm not sure how feasible it is. I think you would
be forced to copy an existing language and somehow make it secure
(seems like a maintenance nightmare) or to create your own language
(potentially a lot of work). But perhaps something more declarative
might be usable?


Re: [Proposal] Clonable scripts

2013-09-09 Thread Niels Basjes
On Mon, Sep 9, 2013 at 11:13 PM, Hilco Wijbenga
hilco.wijbe...@gmail.com wrote:
 On 9 September 2013 13:48, Niels Basjes ni...@basjes.nl wrote:
 So I propose the following new feature:

 1) A scripting language is put inside git. Perhaps a version of python
 or ruby or go or ... (no need for a 'new' language)

 That sounds nice but ...

 2) If a project contains a folder called .githooks in the root of the
 code base then the rules/scripts that are present there are executed
 ONLY on the system doing the actual commit. These scripts are run in
 such a limited way that they can only read the files in the
 repository, they cannot do any networking/write to disk/etc and they
 can only do a limited set of actions against the current operation at
 hand (i.e. do checks, parse messages, etc).

 ... how would you prevent Ruby/Python/Go/$GeneralProgLang from
 executing arbitrary code?

Some kind of sandbox?

 Like I said, this is just a proposal and I would like to know what you
 guys think.

 I love the idea but I'm not sure how feasible it is. I think you would
 be forced to copy an existing language and somehow make it secure
 (seems like a maintenance nightmare) or to create your own language
 (potentially a lot of work). But perhaps something more declarative
 might be usable?

As far as I'm concerned it should be the 'best suitable' language for
the task at hand.

-- 
Best regards / Met vriendelijke groeten,

Niels Basjes


Re: [Proposal] Clonable scripts

2013-09-09 Thread Ramkumar Ramachandra
Niels Basjes wrote:
 As we all know the hooks ( in .git/hooks ) are not cloned along with
 the code of a project.
 Now this is a correct approach for the scripts that do stuff like
 emailing the people responsible for releases or submitting the commit
 to a CI system.

More often than not, maintainers come with these hooks and they keep
them private.

 For several other things it makes a lot of sense to give the developer
 immediate feedback. Things like the format of the commit message (i.e.
 it must start with an issue tracker id) or compliance with a coding
 standard.

i.e. tracker ID. Compliance is simply a request. The developer must be
able to pick it up from surrounding style.

 Initially I wanted to propose introducing fully clonable (pre-commit)
 hook scripts.
 However I can imagine that a malicious open-source coder can create a
 github repo and try to hack the computer of a contributor via those
 scripts. So having such scripts is a 'bad idea'.

I think it's a good idea, since the contributer can look through the scripts.

 If those scripts were however written in a language that is built
 into the git program, and the scripts were run in such a way that they
 can only interact with the files in the local git (and _nothing_
 outside of that), this would be solved.

GNU make.

 Also, having a builtin scripting language means that this would run
 on all operating systems (yes, even Windows).

kbuild tends to get complicated.

 So I propose the following new feature:

 1) A scripting language is put inside git. Perhaps a version of python
 or ruby or go or ... (no need for a 'new' language)

make + go sounds like a good alternative.

 2) If a project contains a folder called .githooks in the root of the
 code base then the rules/scripts that are present there are executed
 ONLY on the system doing the actual commit. These scripts are run in
 such a limited way that they can only read the files in the
 repository, they cannot do any networking/write to disk/etc and they
 can only do a limited set of actions against the current operation at
 hand (i.e. do checks, parse messages, etc).

Submodules and url.&lt;base&gt;.insteadOf come in handy here.

 3) For the regular hooks this language is also supported and when
 located in the (not cloned!) .git/hooks directory they are just as
 powerful as a normal script (i.e. can control CI, send emails, etc.).

I'm confused now; how can .git/hooks be as powerful as .githooks? The
former users should consider uploading their code on GitHub.

> Like I said, this is just a proposal and I would like to know what you
> guys think.

> Best regards / Met vriendelijke groeten,

Which reminds me that we need to have GitTogethers. Thanks for this!
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: A naive proposal for preventing loose object explosions

2013-09-06 Thread Junio C Hamano
mf...@codeaurora.org writes:

> Object lookups should likely not get any slower than if
> repack were not run, and the extra new pack might actually help
> find some objects quicker.

In general, having an extra pack, only to keep objects that you know
are available in other packs, will make _all_ object accesses, not
just the ones that are contained in that extra pack, slower.

Instead of mmapping all the .idx files for all the available
packfiles, we could build a table that records, for each packed
object, from which packfile at what offset the data is available to
optimize the access, but obviously building that in-core table will
take time, so it may not be a good trade-off to do so at runtime (a
precomputed super-.idx that we can mmap at runtime might be a good
way forward if that turns out to be the case).

 Does this sound like it would work?

Sorry, but it is unclear what problem you are trying to solve.

Is it that you do not like that repack -A ejects unreferenced
objects and makes it loose, which you may have many?

The loosen_unused_packed_objects() function used by repack -A
calls the force_object_loose() function (actually, it is the sole
caller of the function).  If you tweak the latter to stream to a
single new graveyard packfile and mark it as kept until expiry,
would it solve the issue the same way but with much smaller impact?

There already is an infrastructure available to open a single output
packfile and send multiple objects to it in bulk-checkin.c, and I am
wondering if you can take advantage of the framework.  The existing
interface to it assumes that the object data is coming from a file
descriptor (the interface was built to support bulk-checkin of many
objects in an empty repository), and it needs refactoring to allow
stream_to_pack() to take different kind of data sources in the form
of stateful callback function, though.



Re: A naive proposal for preventing loose object explosions

2013-09-06 Thread Martin Fick
On Friday, September 06, 2013 11:19:02 am Junio C Hamano wrote:
> mf...@codeaurora.org writes:
>> Object lookups should likely not get any slower than if
>> repack were not run, and the extra new pack might
>> actually help find some objects quicker.
>
> In general, having an extra pack, only to keep objects
> that you know are available in other packs, will make
> _all_ object accesses, not just the ones that are
> contained in that extra pack, slower.

My assumption was that if the new pack, with all the 
consolidated reachable objects in it, happens to be searched 
first, it would actually speed things up.  And if it is 
searched last, then the objects weren't in the other packs 
so how could it have made it slower?  It seems this would 
only slow down the missing object path?

But it sounds like all the index files are mmapped up front?  
Then yes, I can see how it would slow things down.  However, 
it is only one extra (hopefully now well optimized) pack.  
My base assumption was that even if it does slow things 
down, it would likely be unmeasurable and a price worth 
paying to avoid an extreme penalty.


> Instead of mmapping all the .idx files for all the
> available packfiles, we could build a table that
> records, for each packed object, from which packfile at
> what offset the data is available to optimize the
> access, but obviously building that in-core table will
> take time, so it may not be a good trade-off to do so at
> runtime (a precomputed super-.idx that we can mmap at
> runtime might be a good way forward if that turns out to
> be the case).
>
>> Does this sound like it would work?
>
> Sorry, but it is unclear what problem you are trying to
> solve.

I think you guessed it below, I am trying to prevent loose 
object explosions by keeping unreachable objects around in 
packs (instead of loose) until expiry.  With the current way 
that pack-objects works, this is the best I could come up 
with (I said naive). :(

Today git-repack calls git pack-objects like this:

git pack-objects --keep-true-parents --honor-pack-keep \
	--non-empty --all --reflog $args </dev/null "$PACKTMP"

This has no mechanism to place unreachable objects in a 
pack.  If git pack-objects supported an option which 
streamed them to a separate file (as you suggest below), 
that would likely be the main piece needed to avoid the 
heavy-handed approach I was suggesting.  

The problem is how to define the interface for this?  How do 
we get the filename of the new unreachable packfile?  Today 
the name of the new packfile is sent to stdout, would we 
just tack on another name?  That seems like it would break 
some assumptions?  Maybe it would be OK if it only did that 
when an --unreachable flag was added?  Then git-repack could 
be enhanced to understand that flag and the extra filenames 
it outputs?


> Is it that you do not like that repack -A ejects
> unreferenced objects and makes it loose, which you may
> have many?

Yes, several times a week we have people pushing the kernel 
to wrong projects, which leads to 4M loose objects. :(  
Without a solution for this regular problem, we are very 
scared to move our repos off of SSDs, since that leads to 
hour-plus-long fetches.


> The loosen_unused_packed_objects() function used by
> repack -A calls the force_object_loose() function
> (actually, it is the sole caller of the function).  If
> you tweak the latter to stream to a single new
> graveyard packfile and mark it as kept until expiry,
> would it solve the issue the same way but with much
> smaller impact?

Yes.
 
> There already is an infrastructure available to open a
> single output packfile and send multiple objects to it
> in bulk-checkin.c, and I am wondering if you can take
> advantage of the framework.  The existing interface to
> it assumes that the object data is coming from a file
> descriptor (the interface was built to support
> bulk-checkin of many objects in an empty repository),
> and it needs refactoring to allow stream_to_pack() to
> take different kind of data sources in the form of
> stateful callback function, though.

That feels beyond what I could currently dedicate the time 
to do.  Like I said, my solution is heavy handed but it felt 
simple enough for me to try.  I can spare the extra disk 
space and I am not convinced the performance hit would be 
bad.  I would, of course, be delighted if someone else were 
to do what you suggest, but I get that it's my itch...

-Martin


-- 
The Qualcomm Innovation Center, Inc. is a member of Code 
Aurora Forum, hosted by The Linux Foundation
 


A naive proposal for preventing loose object explosions

2013-09-05 Thread mfick
I am imagining what I consider to be a naive approach to preventing
loose unreachable object explosions.   It may seem a bit heavy handed
at first, but every conversation so far about this issue seems to have
died, so I am looking for a simple incremental improvement to what we
have today. I theorize that this approach will provide the same
protections (good and bad) against races as regularly using
"git repack -A -d" and "git prune --expire <time>" do today.

1a)  Add --prune-packed option to git-repack to force a call to git
prune-packed, without having to specify the -d option to git-repack.

1b) Add a "--keep <marker>" option to git-repack which will create a
.keep file with <marker> in it for the existing pack files which were
repacked (not for the new pack).

1c) Now instead of running:

 git-repack -A -d

run:

 git-repack --prune-packed --keep 'prune-when-expired'


This should effectively keep a duplicate copy of all old packfiles
around, but the new pack file will not have unreferenced objects in
it.  This is similar to having unreachable loose objects left around,
but it also keeps around extra copy(ies) of reachable objects wasting
some disk space.  While this will normally consume more disk space in
pack files, it will not explode loose objects, which will likely save
a lot of space when such explosions would have occurred.   Of course,
this should also prevent the severe performance downsides to these
explosions.  Object lookups should likely not get any slower than if
repack were not run, and the extra new pack might actually help
find some objects quicker.   Safety with respect to unreachable object
race conditions should be the same as using git repack -A -d since at
least one copy of every object should be kept around during this run?
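Until git-repack grows such a --keep option, step 1b can be approximated
by hand after a plain `git repack` (no -d, so the old packs survive); a
rough sketch, with the marker name taken from the example above and the
helper name being made up here:

```shell
# Sketch: write the given marker into a .keep file next to each pack
# path read from stdin, leaving packs that already have a .keep alone.
mark_kept_packs() {
	marker=$1
	while read -r pack; do
		keep="${pack%.pack}.keep"
		[ -e "$keep" ] || printf '%s\n' "$marker" > "$keep"
	done
}

# Example against two (fake) pack files in a scratch directory:
dir=$(mktemp -d)
touch "$dir/pack-aaaa.pack" "$dir/pack-bbbb.pack"
ls "$dir"/pack-*.pack | mark_kept_packs 'prune-when-expired'
cat "$dir/pack-aaaa.keep"   # prints "prune-when-expired"
rm -rf "$dir"
```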


Then:

2a) Add support for passing in a list of pack files to git-repack.
This list will then be used as the original existing list instead
of finding all packfiles without keeps.

2b) Add an "--expire-marked <marker>" option to git-prune which will
find any pack files having a .keep with <marker> in it, and evaluate
whether it meets the --expire time.  If so, it will also call:

   git-repack -a -d expired-pack-files...

This should repack any reachable objects from the expired-pack-files
into a single new pack file.  This may again cause some reachable
object duplication (likely with the same performance effects as the
first git-repack phase above), but unreachable objects from the
expired-pack-files will now have been pruned as they would have been if they
had originally been turned into loose objects.
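The selection half of such an --expire-marked option could look roughly
like this in shell (the age comparison against the --expire time is
omitted, and the function name is made up):

```shell
# Sketch: print the basename of every pack under $1 whose .keep file
# consists exactly of the marker $2 -- the candidates for expiry.
marked_packs() {
	dir=$1 marker=$2
	for keep in "$dir"/pack-*.keep; do
		[ -e "$keep" ] || continue
		grep -qx "$marker" "$keep" && basename "$keep" .keep
	done
}

# Example: only the marked pack is reported.
dir=$(mktemp -d)
printf 'prune-when-expired\n' > "$dir/pack-old.keep"
printf 'keep-forever\n' > "$dir/pack-new.keep"
marked_packs "$dir" 'prune-when-expired'   # prints "pack-old"
rm -rf "$dir"
```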

3) Finally on the next repack cycle the current duplicated reachable
objects should likely get fully reconsolidated into a single copy.

Does this sound like it would work?  I may attempt to construct this
for internal use (since it is a bit hacky).  It feels like it could be
done mostly with some simple shell modding/wrapping (feels less scary than
messing with the core C tools).  I wonder if I am missing some obvious flaw
in this approach?

Thanks for any insights,

-Martin






[PATCH] repack: rewrite the shell script in C (squashing proposal)

2013-08-22 Thread Stefan Beller
This patch is meant to be squashed into bb4335a21441a0
(repack: rewrite the shell script in C), I'll do so when rerolling
the series. For reviewing I'll just send this patch.

* Remove comments, which likely get out of date (authorship is kept in
  git anyway)
* rename get_pack_filenames to get_non_kept_pack_filenames
* catch return value of unlink and fail as the shell version did
* beauty fixes to remove_temporary_files as Junio proposed
* install signal handling after static variables packdir, packtmp are set
* remove adding the empty string to the buffer.
* fix the rollback mechanism (wrong variable name)

Signed-off-by: Stefan Beller <stefanbel...@googlemail.com>
---
 builtin/repack.c | 78 ++--
 1 file changed, 36 insertions(+), 42 deletions(-)

diff --git a/builtin/repack.c b/builtin/repack.c
index 1f13e0d..e0d1f17 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -1,8 +1,3 @@
-/*
- * The shell version was written by Linus Torvalds (2005) and many others.
- * This is a translation into C by Stefan Beller (2013)
- */
-
 #include "builtin.h"
 #include "cache.h"
 #include "dir.h"
@@ -13,9 +8,8 @@
 #include "string-list.h"
 #include "argv-array.h"
 
-/* enabled by default since 22c79eab (2008-06-25) */
 static int delta_base_offset = 1;
-char *packdir;
+static char *packdir, *packtmp;
 
 static const char *const git_repack_usage[] = {
 	N_("git repack [options]"),
@@ -41,18 +35,16 @@ static void remove_temporary_files(void)
DIR *dir;
struct dirent *e;
 
-	/* .git/objects/pack */
-	strbuf_addstr(&buf, get_object_directory());
-	strbuf_addstr(&buf, "/pack");
-	dir = opendir(buf.buf);
-	if (!dir) {
-		strbuf_release(&buf);
+	dir = opendir(packdir);
+	if (!dir)
 		return;
-	}
 
-	/* .git/objects/pack/.tmp-$$-pack-* */
+	strbuf_addstr(&buf, packdir);
+
+	/* dirlen holds the length of the path before the file name */
 	dirlen = buf.len + 1;
-	strbuf_addf(&buf, "/.tmp-%d-pack-", (int)getpid());
+	strbuf_addf(&buf, "%s", packtmp);
+	/* prefixlen holds the length of the prefix */
 	prefixlen = buf.len - dirlen;
 
while ((e = readdir(dir))) {
@@ -73,11 +65,16 @@ static void remove_pack_on_signal(int signo)
raise(signo);
 }
 
-static void get_pack_filenames(struct string_list *fname_list)
+/*
+ * Adds all packs hex strings to the fname list, which do not
+ * have a corresponding .keep file.
+ */
+static void get_non_kept_pack_filenames(struct string_list *fname_list)
 {
DIR *dir;
struct dirent *e;
char *fname;
+   size_t len;
 
if (!(dir = opendir(packdir)))
return;
@@ -86,7 +83,7 @@ static void get_pack_filenames(struct string_list *fname_list)
 		if (suffixcmp(e->d_name, ".pack"))
 			continue;
 
-		size_t len = strlen(e->d_name) - strlen(".pack");
+		len = strlen(e->d_name) - strlen(".pack");
 		fname = xmemdupz(e->d_name, len);
 
 		if (!file_exists(mkpath("%s/%s.keep", packdir, fname)))
@@ -95,14 +92,14 @@ static void get_pack_filenames(struct string_list *fname_list)
closedir(dir);
 }
 
-static void remove_redundant_pack(const char *path, const char *sha1)
+static void remove_redundant_pack(const char *path_prefix, const char *hex)
 {
 	const char *exts[] = {".pack", ".idx", ".keep"};
int i;
struct strbuf buf = STRBUF_INIT;
size_t plen;
 
-	strbuf_addf(&buf, "%s/%s", path, sha1);
+	strbuf_addf(&buf, "%s/%s", path_prefix, hex);
plen = buf.len;
 
 	for (i = 0; i < ARRAY_SIZE(exts); i++) {
@@ -115,15 +112,14 @@ static void remove_redundant_pack(const char *path, const char *sha1)
 int cmd_repack(int argc, const char **argv, const char *prefix)
 {
 	const char *exts[2] = {".idx", ".pack"};
-   char *packtmp;
struct child_process cmd;
struct string_list_item *item;
struct argv_array cmd_args = ARGV_ARRAY_INIT;
struct string_list names = STRING_LIST_INIT_DUP;
-   struct string_list rollback = STRING_LIST_INIT_DUP;
+   struct string_list rollback = STRING_LIST_INIT_NODUP;
struct string_list existing_packs = STRING_LIST_INIT_DUP;
struct strbuf line = STRBUF_INIT;
-   int count_packs, ext, ret;
+   int nr_packs, ext, ret, failed;
FILE *out;
 
/* variables to be filled by option parsing */
@@ -173,11 +169,11 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
argc = parse_options(argc, argv, prefix, builtin_repack_options,
git_repack_usage, 0);
 
-   sigchain_push_common(remove_pack_on_signal);
-
 	packdir = mkpathdup("%s/pack", get_object_directory());
 	packtmp = mkpathdup("%s/.tmp-%d-pack", packdir, (int)getpid());
 
+   sigchain_push_common(remove_pack_on_signal);
+

Re: [PATCH] repack: rewrite the shell script in C (squashing proposal)

2013-08-22 Thread Junio C Hamano
Stefan Beller stefanbel...@googlemail.com writes:

> @@ -41,18 +35,16 @@ static void remove_temporary_files(void)
>  	DIR *dir;
>  	struct dirent *e;
>  
> +	dir = opendir(packdir);
> +	if (!dir)
>  		return;
>  
> +	strbuf_addstr(&buf, packdir);
> +
> +	/* dirlen holds the length of the path before the file name */
>  	dirlen = buf.len + 1;
> +	strbuf_addf(&buf, "%s", packtmp);
> +	/* prefixlen holds the length of the prefix */

Thanks to the name of the variable that is self-describing, this
comment does not add much value.

But it misses the whole point of my suggestion in the earlier
message to phrase these like so:

	/* Point at the slash at the end of ".../objects/pack/" */
	dirlen = strlen(packdir) + 1;
	/* Point at the dash at the end of ".../.tmp-%d-pack-" */
	prefixlen = buf.len - dirlen;

to clarify what the writer considers the prefix to be, which may
be quite different from what the readers think the prefix is.  In
".tmp-2342-pack-0d8beaa5b76e824c9869f0d1f1b19ec7acf4982f.pack", is
the prefix ".tmp-2342-", ".tmp-2342-pack", or ".tmp-2342-pack-"?

>  int cmd_repack(int argc, const char **argv, const char *prefix)
>  {
> ...
>  	packdir = mkpathdup("%s/pack", get_object_directory());
>  	packtmp = mkpathdup("%s/.tmp-%d-pack", packdir, (int)getpid());
>  
> +	sigchain_push_common(remove_pack_on_signal);
> +
>  	argv_array_push(&cmd_args, "pack-objects");
>  	argv_array_push(&cmd_args, "--keep-true-parents");
>  	argv_array_push(&cmd_args, "--honor-pack-keep");
> ...
> +			rollback_failure.items[i].string,
> +			rollback_failure.items[i].string);
>  		}
>  		exit(1);
>  	}

The scripted version uses

trap 'rm -f "$PACKTMP"-*' 0 1 2 3 15

so remove_temporary_files() needs to be called before exiting from
the program without getting killed by a signal.

Thanks.


Re: Proposal: sharing .git/config

2013-03-18 Thread Ramkumar Ramachandra
Jeff King wrote:
> I don't think you can avoid the 3-step problem and retain the safety in
> the general case.  Forgetting implementation details for a minute, you
> have either a 1-step system:
>
>   1. Fetch and start using config from the remote.
>
> which is subject to fetching and executing malicious config, or:
>
>   1. Fetch config from remote.
>   2. Inspect it.
>   3. Integrate it into the current config.

I don't understand your emphasis on step 2.  Isn't the configuration
written by me?  Why would it be malicious?

I've just started thinking about how to design something that will
allow us to share configuration elegantly [1].  Essentially, the
metadata repository will consist of *.layout files, one for each
repository to clone, containing the .git/config to write after cloning
that repository.  So, a git.layout might look like:

[layout]
	directory = git
[remote "origin"]
	url = git://github.com/git/git
[remote "ram"]
	url = g...@github.com:artagnon/git
[remote "junio"]
	url = git://github.com/gitster/git

As you can see, [layout] is a special section which tells our
fetcher where to place the repository.  Everything else is meant to be
inserted into the repository's .git/config.  However, I can foresee a
problem in scaling: when I ask for a specific directory like a/b/c to be
populated (the equivalent of repo sync `a/b/c`), it'll have to parse the
layout.directory variable of all the .layout files, and this can be
slow.  So, maybe we should have a special _manifest.layout listing all
the paths?
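For what it's worth, pulling layout.directory out of a single .layout file
does not need a full config parser; a throwaway sed sketch, assuming
exactly the file shape shown above (one [layout] section holding a
directory key):

```shell
# Sketch: print the value of "directory" from the [layout] section of
# the .layout file named in $1.
layout_directory() {
	sed -n '/^\[layout\]/,/^\[/s/^[[:space:]]*directory[[:space:]]*=[[:space:]]*//p' "$1"
}

# Example against a file shaped like the git.layout above:
f=$(mktemp)
cat > "$f" <<'EOF'
[layout]
	directory = git
[remote "origin"]
	url = git://github.com/git/git
EOF
layout_directory "$f"   # prints "git"
rm -f "$f"
```

A real fetcher would of course use `git config --file` instead, but the
point is that layout.directory is cheap to extract per file; the cost the
paragraph above worries about is only in doing this across many files.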

Further, I see this as a way to work with projects that would
otherwise require nested submodules like the Android project.  What do
you think?

[1]: https://github.com/artagnon/src.layout


<    1   2   3   4   >