Re: GSoC 2016 | Proposal | Incremental Rewrite of git bisect
On Fri, Mar 25, 2016 at 11:15 AM, Pranit Bauva wrote:

>> - you will add an option to "git bisect--helper" to perform what the >> git-bisect.sh function did, and >> - you will create a test script for "git bisect--helper" in which you >> will test each option? > > I had very initially planned to do this. But Matthieu pointed out that > it would be much better to use the existing test suite rather than > creating one which can lead to less coverage.

Ok, then perhaps:

- you will add tests to existing test scripts, so that each "git bisect--helper" option is (indirectly) tested.

-- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
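The "one option per converted shell function" pattern discussed in this thread could be sketched roughly as below. This is a self-contained simplification, not git's actual code: `--next-all` is a real bisect--helper option, while `--bisect-clean-state` is a hypothetical name for a newly converted function, and the real helper would use git's parse-options API rather than strcmp.

```c
#include <stdio.h>
#include <string.h>

/* Each shell function converted from git-bisect.sh gets its own
 * command mode in "git bisect--helper". */
enum helper_mode { MODE_NONE, MODE_NEXT_ALL, MODE_CLEAN_STATE };

int helper_mode_from_arg(const char *arg)
{
	if (!strcmp(arg, "--next-all"))
		return MODE_NEXT_ALL;
	if (!strcmp(arg, "--bisect-clean-state"))
		return MODE_CLEAN_STATE;
	return MODE_NONE;
}

/* Dispatcher: each mode would call the C reimplementation of the
 * corresponding git-bisect.sh function. */
int run_helper(const char *arg)
{
	switch (helper_mode_from_arg(arg)) {
	case MODE_NEXT_ALL:
		puts("computing next revision to test");
		return 0;
	case MODE_CLEAN_STATE:
		puts("removing bisect state files");
		return 0;
	default:
		fprintf(stderr, "unknown option: %s\n", arg);
		return 1;
	}
}
```

In the real helper these modes would be declared with `OPT_CMDMODE` in a `struct option` array, and the existing test scripts would then exercise each mode indirectly through "git bisect", as suggested above.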
Re: GSoC 2016 | Proposal | Incremental Rewrite of git bisect
> - you will add an option to "git bisect--helper" to perform what the > git-bisect.sh function did, and > - you will create a test script for "git bisect--helper" in which you > will test each option?

I had initially planned to do this, but Matthieu pointed out that it would be much better to use the existing test suite rather than creating a new one, which could lead to less coverage.

Thanks, Pranit Bauva
Re: GSoC 2016 | Proposal | Incremental Rewrite of git bisect
On Fri, Mar 25, 2016 at 2:45 PM, Matthieu Moy <matthieu@grenoble-inp.fr> wrote: > Christian Couder <christian.cou...@gmail.com> writes: > >> On Thu, Mar 24, 2016 at 12:27 AM, Pranit Bauva <pranit.ba...@gmail.com> >> wrote: >> >>> Unification of bisect.c and bisect--helper.c >>> >>> This will unify the algorithmic and non-algorithmic parts of bisect >>> bringing them under one heading to make the code clean. >> >> I am not sure this is needed and a good idea. Maybe you will rename >> "builtin/bisect--helper.c" to "builtin/bisect.c" and remove >> git-bisect.sh at the same time to complete the shell to C move. But >> the actual bisect.{c,h} might be useful as they are for other >> purposes. > > Yes. My view on this is that builtin/*.c should be just user-interface, > and actual stuff should be outside builtin, ideally in a well-designed > and reusable library (typically re-usable by libgit2 or others to > provide another UI for the same feature). Not all commands work this > way, but I think this is a good direction to take.

Okay, I didn't know about this. Thanks for completing Christian's point.

>> When you have sent one patch series, even a small one, then your main >> goal should be to have this patch series merged. > > I'd add: to get a patch series merged, two things take time: > > 1) latency: let time to other people to read and comment on your code. > > 2) extra-work required by reviewers. > > You want to send series early because of 1) (then you can work on the > next series while waiting for reviews on the current one), and you need > to prioritize 2) over working on the next series to minimize in-flight > topics.

I had planned to work this way. I will include this in the proposal. Though it creates some confusion for me and I tend to mix some things up, I will maintain a hard copy to jot down the discussions and my thoughts.
Re: GSoC 2016 | Proposal | Incremental Rewrite of git bisect
On Fri, Mar 25, 2016 at 2:32 PM, Christian Couder <christian.cou...@gmail.com> wrote: > On Thu, Mar 24, 2016 at 12:27 AM, Pranit Bauva <pranit.ba...@gmail.com> wrote: >> Hey! >> >> I have prepared a proposal for Google Summer of Code 2016. I know this >> is a bit late, but please try to give your comments and suggestions. >> My proposal could greatly improve from this. Some questions: >> >> 1. Should I include more ways in which it can help windows? > > I don't think it is necessary. > >> 2. Should I include the function names I intend to convert? > > I don't think it is necessary, but if you want, you can take a look at > some big ones (or perhaps just one big) and explain how you plan to > convert it (using which C functions or apis).

I will try to do it for one big one if there is some time left.

>> 3. Is my timeline (a bit different) going to affect me in any way? > > What is important with the timeline is just that it looks realistic. > So each task should have a realistic amount of time and the order in > which tasks are listed should be logical. > I commented below about how I think you could improve your timeline.

Your suggestions seem nice to me. I have thought about changing some parts. I have described some changes below.

>> Here is a Google doc for my proposal. >> https://docs.google.com/document/d/1stnDPA5Hs3u0a8sqoWZicTFpCz1wHP9bkifcKY13Ocw/edit?usp=sharing >> >> For the people who prefer the text only version : >> >> --- >> >> Incremental rewrite of Git bisect >> >> About Me >> >> Basic Information >> >> Name: Pranit Bauva >> >> University: IIT Kharagpur >> >> Major: Mining Engineering >> >> Email: pranit.ba...@gmail.com >> >> IRC: pungi-man >> >> Blog: http://bauva.in >> >> Timezone: IST (UTC +5:30) >> >> Background >> >> I am a first year undergraduate in the department of Mining >> Engineering at Indian Institute of Technology, Kharagpur. I am an open >> source enthusiast.
I am a part of Kharagpur Linux Users Group which is >> basically a group of open-source enthusiasts. I am quite familiar with >> C and I have been using shell for some time now and still find new >> things about it everyday. I have used SVN when I was on Windows and >> then I switched to Git when I moved to linux. Git seems like magic. I >> always wanted to involve in the development process and Google Summer >> of Code is an a awesome way to achieve it. >> >> >> Abstract >> >> Git bisect is a frequently used command which helps the developers in >> finding the commit which introduced the bug. Some part of it is >> written in shell script. I intend to convert it to low level C code >> thus making them builtins. This will increase Git’s portability. >> Efficiency of git bisect will definitely increase but it would not >> really matter much as most of the time is consumed in compiling or >> testing when in bisection mode but it will definitely reduce the >> overhead IO which can make the heavy process of compiling relatively >> lighter. >> >> >> Problems Shell creates >> >> System Dependencies >> >> Using shell code introduces various dependencies even though they >> allowing prototyping of the code quickly. Shell script often use some >> POSIX utilities like cat, grep, ls, mkdir, etc which are not included >> in non-POSIX systems by default. These scripts do not have access to >> the git’s internal low level API. So even trivial tasks have to be >> performed by spawning new process every time. So when git is ported to >> windows, it has to include all the utilities (namely a shell >> interpreter, perl bindings and much more). >> >> Scripts introduce extra overheads >> >> Shell scripts do not have access to Git’s internal API which has >> excellent use of cache thus reducing the unnecessary IO of user >> configuration files, repository index and filesystem access. By using >> a builtin we could exploit the cache system thus reducing the >> overhead. 
As compiling / testing already involves quite a number of >> resources, it would be good if we could do our best to make more >> resources available for that. >> >> Potential Problems >> >> Rewriting may introduce bugs >> >> Rewriting the shell script to C might introduce some bugs. This >> problem will be properly taken care of in my method of
Re: GSoC 2016 | Proposal | Incremental Rewrite of git bisect
Christian Couder writes: > On Thu, Mar 24, 2016 at 12:27 AM, Pranit Bauva wrote: > >> Unification of bisect.c and bisect--helper.c >> >> This will unify the algorithmic and non-algorithmic parts of bisect >> bringing them under one heading to make the code clean. > > I am not sure this is needed and a good idea. Maybe you will rename > "builtin/bisect--helper.c" to "builtin/bisect.c" and remove > git-bisect.sh at the same time to complete the shell to C move. But > the actual bisect.{c,h} might be useful as they are for other > purposes.

Yes. My view on this is that builtin/*.c should be just user-interface, and the actual stuff should be outside builtin, ideally in a well-designed and reusable library (typically re-usable by libgit2 or others to provide another UI for the same feature). Not all commands work this way, but I think this is a good direction to take.

> When you have sent one patch series, even a small one, then your main > goal should be to have this patch series merged.

I'd add: to get a patch series merged, two things take time:

1) latency: giving other people time to read and comment on your code.

2) extra work required by reviewers.

You want to send series early because of 1) (then you can work on the next series while waiting for reviews on the current one), and you need to prioritize 2) over working on the next series to minimize in-flight topics.

-- Matthieu Moy http://www-verimag.imag.fr/~moy/
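The UI/library split Matthieu describes could be sketched as follows. This is a toy model under stated assumptions: revisions are simplified to integers on a linear history, and the function names are hypothetical; the real code would use git's object and revision APIs.

```c
#include <stdio.h>

/* Library half (this logic would live outside builtin/, e.g. in
 * bisect.c): pure computation, no printing, no exit(), so another
 * front-end could reuse it.
 * Returns the next revision to test, or -1 when the search is done. */
int bisect_next_rev(int good, int bad)
{
	if (bad - good <= 1)
		return -1;	/* culprit identified: it is 'bad' */
	return good + (bad - good) / 2;
}

/* UI half (this would live in builtin/bisect.c): user interaction
 * only, delegating all decisions to the library function above. */
void bisect_step_ui(int good, int bad)
{
	int next = bisect_next_rev(good, bad);
	if (next < 0)
		printf("%d is the first bad revision\n", bad);
	else
		printf("checking out revision %d for testing\n", next);
}
```

With this split, the eventual replacement for git-bisect.sh in builtin/ stays a thin interface layer, and the search logic remains reusable.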
Re: GSoC 2016 | Proposal | Incremental Rewrite of git bisect
On Thu, Mar 24, 2016 at 12:27 AM, Pranit Bauva <pranit.ba...@gmail.com> wrote: > Hey! > > I have prepared a proposal for Google Summer of Code 2016. I know this > is a bit late, but please try to give your comments and suggestions. > My proposal could greatly improve from this. Some questions: > > 1. Should I include more ways in which it can help windows? I don't think it is necessary. > 2. Should I include the function names I intend to convert? I don't think it is necessary, but if you want, you can take a look at some big ones (or perhaps just one big) and explain how you plan to convert it (using which C functions or apis). > 3. Is my timeline (a bit different) going to affect me in any way? What is important with the timeline is just that it looks realistic. So each task should have a realistic amount of time and the order in which tasks are listed should be logical. I commented below about how I think you could improve your timeline. > Here is a Google doc for my proposal. > https://docs.google.com/document/d/1stnDPA5Hs3u0a8sqoWZicTFpCz1wHP9bkifcKY13Ocw/edit?usp=sharing > > For the people who prefer the text only version : > > --- > > Incremental rewrite of Git bisect > > About Me > > Basic Information > > > Name Pranit Bauva > > University IIT Kharagpur > > MajorMining Engineering > > Emailpranit.ba...@gmail.com > > IRC pungi-man > > Blog http://bauva.in > > Timezone IST (UTC +5:30) > > Background > > I am a first year undergraduate in the department of Mining > Engineering at Indian Institute of Technology, Kharagpur. I am an open > source enthusiast. I am a part of Kharagpur Linux Users Group which is > basically a group of open-source enthusiasts. I am quite familiar with > C and I have been using shell for some time now and still find new > things about it everyday. I have used SVN when I was on Windows and > then I switched to Git when I moved to linux. Git seems like magic. 
I > always wanted to involve in the development process and Google Summer > of Code is an a awesome way to achieve it. > > > Abstract > > Git bisect is a frequently used command which helps the developers in > finding the commit which introduced the bug. Some part of it is > written in shell script. I intend to convert it to low level C code > thus making them builtins. This will increase Git’s portability. > Efficiency of git bisect will definitely increase but it would not > really matter much as most of the time is consumed in compiling or > testing when in bisection mode but it will definitely reduce the > overhead IO which can make the heavy process of compiling relatively > lighter. > > > Problems Shell creates > > System Dependencies > > Using shell code introduces various dependencies even though they > allowing prototyping of the code quickly. Shell script often use some > POSIX utilities like cat, grep, ls, mkdir, etc which are not included > in non-POSIX systems by default. These scripts do not have access to > the git’s internal low level API. So even trivial tasks have to be > performed by spawning new process every time. So when git is ported to > windows, it has to include all the utilities (namely a shell > interpreter, perl bindings and much more). > > Scripts introduce extra overheads > > Shell scripts do not have access to Git’s internal API which has > excellent use of cache thus reducing the unnecessary IO of user > configuration files, repository index and filesystem access. By using > a builtin we could exploit the cache system thus reducing the > overhead. As compiling / testing already involves quite a number of > resources, it would be good if we could do our best to make more > resources available for that. > > Potential Problems > > Rewriting may introduce bugs > > Rewriting the shell script to C might introduce some bugs. This > problem will be properly taken care of in my method of approach > (described below). 
Still this approach will definitely not guarantee > that the functionality of the new will be exactly similar to the old > one, though it will greatly reduce its possibility. The reviews > provided by the seniors in the git community would help a lot in > reducing bugs since they know the common bugs and how to work around > them. The test suite of git is quite nice which has an awesome > coverage. > > Rewritten can be hard to understand > > Git does not like having many external dependencies, libraries or > executables other than what is provided by git itself and the > rewritten code should follow this. C does not p
Re: [GSoC] Proposal
Well, I should have done some searching before asking.
Re: [GSoC] Proposal
Some developers are already working on that [1].

[1]: http://thread.gmane.org/gmane.comp.version-control.git/288306

On Fri, Mar 25, 2016 at 10:12 AM, 惠轶群 wrote: > There is an interesting idea as an idea for GSoC of 2008, is it still > proposable? > > https://git.wiki.kernel.org/index.php/SoC2008Ideas#Restartable_Clone > > 2016-03-25 11:45 GMT+08:00 惠轶群 : >> Hi, >> >> I'm proposing to take part in GSoC as a developer of git. >> >> Here is my >> [Draft](https://docs.google.com/document/d/1zqOVb_cnYcaix48ep1KNPeLpRHvNKA26kNXc78yjhMg/edit?usp=sharing). >> >> I'm planning to refactor some part of git. Following is what I'm interested >> in: >> >> - port parts of “git rebase” to a C helper >> - “git status” during non-interactive rebase >> - etc interesting during the development >> >> If time allow, I'd like to also improve git-bisect, for example: >> >> - convert “git-bisect.sh” to a builtin >> - etc >> >> Sorry for t late. I was so busy these days. sorry again.
Re: [GSoC] Proposal
There is an interesting idea on the GSoC 2008 ideas page; is it still proposable?

https://git.wiki.kernel.org/index.php/SoC2008Ideas#Restartable_Clone

2016-03-25 11:45 GMT+08:00 惠轶群: > Hi, > > I'm proposing to take part in GSoC as a developer of git. > > Here is my > [Draft](https://docs.google.com/document/d/1zqOVb_cnYcaix48ep1KNPeLpRHvNKA26kNXc78yjhMg/edit?usp=sharing). > > I'm planning to refactor some part of git. Following is what I'm interested > in: > > - port parts of “git rebase” to a C helper > - “git status” during non-interactive rebase > - etc interesting during the development > > If time allow, I'd like to also improve git-bisect, for example: > > - convert “git-bisect.sh” to a builtin > - etc > > Sorry for t late. I was so busy these days. sorry again.
[GSoC] Proposal
Hi,

I'm proposing to take part in GSoC as a developer of git.

Here is my [Draft](https://docs.google.com/document/d/1zqOVb_cnYcaix48ep1KNPeLpRHvNKA26kNXc78yjhMg/edit?usp=sharing).

I'm planning to refactor some parts of git. The following is what I'm interested in:

- port parts of “git rebase” to a C helper
- “git status” during non-interactive rebase
- other interesting things that come up during development

If time allows, I'd also like to improve git-bisect, for example:

- convert “git-bisect.sh” to a builtin
- etc.

Sorry for the late proposal. I was very busy these days. Sorry again.
[GSoC] Proposal
Greetings, I hope it is not yet too late to jump on the Summer of Code bandwagon. I would appreciate comments on my application [1] and my microproject contribution, which will follow this mail as a reply. My proposal mostly stems from what was noted under "convert scripts to builtins" and "git rebase improvements" in the ideas page. Both list no mentor, so please let me know if you know anyone who should be mentioned in CC. Regards, XZS. [1]: https://docs.google.com/document/d/1-BV-s5VUGTvBlcVDeo6tVqQO5D1hqeQDqaf37iYuIfU/edit?usp=sharing
GSoC proposal
As I was strongly encouraged to submit my GSoC proposal, I'll post it here and CC my possible mentor. Please provide feedback on my draft; you can also comment on it right in the Google doc. Thanks in advance. Proposal: https://docs.google.com/document/d/1Hpu9FfD3wb7qgWgTiKtIAie41OXK3ufgnhnNuRaEH4E
GSoC 2016 | Proposal | Incremental Rewrite of git bisect
Hey!

I have prepared a proposal for Google Summer of Code 2016. I know this is a bit late, but please try to give your comments and suggestions; my proposal could greatly improve from them. Some questions:

1. Should I include more ways in which it can help Windows?
2. Should I include the function names I intend to convert?
3. Is my timeline (a bit different) going to affect me in any way?

Here is a Google doc for my proposal: https://docs.google.com/document/d/1stnDPA5Hs3u0a8sqoWZicTFpCz1wHP9bkifcKY13Ocw/edit?usp=sharing

For the people who prefer the text-only version:

---

Incremental rewrite of Git bisect

About Me

Basic Information

Name: Pranit Bauva
University: IIT Kharagpur
Major: Mining Engineering
Email: pranit.ba...@gmail.com
IRC: pungi-man
Blog: http://bauva.in
Timezone: IST (UTC +5:30)

Background

I am a first-year undergraduate in the department of Mining Engineering at the Indian Institute of Technology, Kharagpur. I am an open source enthusiast and part of the Kharagpur Linux Users Group, which is basically a group of open-source enthusiasts. I am quite familiar with C, and I have been using the shell for some time now and still find new things about it every day. I used SVN when I was on Windows, then switched to Git when I moved to Linux. Git seems like magic. I always wanted to be involved in its development process, and Google Summer of Code is an awesome way to achieve that.

Abstract

Git bisect is a frequently used command which helps developers find the commit that introduced a bug. Part of it is written in shell script. I intend to convert it to C code, making it a builtin. This will increase Git's portability. The efficiency of git bisect will also increase, though that matters less, since most of the time in a bisection is spent compiling or testing; still, it will reduce the IO overhead, which makes the heavy process of compiling relatively lighter.
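The IO-overhead point from the abstract can be illustrated with a minimal sketch (all names are hypothetical; git's real config machinery is more elaborate): a builtin can read configuration once and answer later queries from memory, while a shell script pays a process spawn plus a file read for every `git config` call.

```c
#include <string.h>

static int config_reads;	/* counts simulated file reads */
static int cache_ready;
static char cached_editor[64];

/* Load the "config file" at most once, no matter how many lookups
 * follow.  The increment stands in for real file IO. */
static void load_config_once(void)
{
	if (cache_ready)
		return;
	config_reads++;
	strcpy(cached_editor, "vim");	/* pretend value from .git/config */
	cache_ready = 1;
}

/* Every lookup after the first is served from the in-memory cache. */
const char *config_get_editor(void)
{
	load_config_once();
	return cached_editor;
}

int config_read_count(void)
{
	return config_reads;
}
```

However many times `config_get_editor()` is called, the simulated file is read once; a script spawning `git config` would repeat the work on every call.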
Problems Shell creates

System Dependencies

Using shell code introduces various dependencies, even though it allows prototyping the code quickly. Shell scripts often use POSIX utilities like cat, grep, ls, mkdir, etc., which are not included in non-POSIX systems by default. These scripts also do not have access to Git's internal low-level API, so even trivial tasks have to be performed by spawning a new process every time. So when Git is ported to Windows, the port has to include all of these utilities (namely a shell interpreter, perl bindings and much more).

Scripts introduce extra overheads

Shell scripts do not have access to Git's internal API, which makes excellent use of caching to avoid unnecessary IO on user configuration files, the repository index and the filesystem. By using a builtin we could exploit that cache and reduce the overhead. As compiling / testing already consumes quite a lot of resources, it would be good if we could do our best to make more resources available for that.

Potential Problems

Rewriting may introduce bugs

Rewriting the shell script in C might introduce bugs. This problem will be properly taken care of in my method of approach (described below). Still, this approach cannot guarantee that the new code will behave exactly like the old, though it will greatly reduce the chance of a difference. Reviews from senior people in the Git community will help a lot in reducing bugs, since they know the common bugs and how to work around them. Git's test suite is also quite nice and has awesome coverage.

Rewritten code can be hard to understand

Git does not like having many external dependencies, libraries or executables other than what is provided by Git itself, and the rewritten code should follow this. C does not provide many of the facilities, like text processing, that shell does, so their C implementations often span multiple lines. C is also notorious for being a bit “cryptic”.
This problem can be compensated for by well-written documentation with well-defined inputs, outputs and behavior.

A peek into git bisect

How does it help?

Git bisect helps software developers find the commit that introduced a regression. Developers are interested in knowing this because a commit (most of the time) changes only a small set of code. It is much easier to understand and fix a problem when you only need to check a very small set of changes than when you don't know where to look. It is not that the problem will be exactly in that commit, but it will be related to the behavior introduced by that commit. Software bugs can be a nightmare when the code base is very large; there would be many sleepless nights spent figuring out the part which causes the error. This is where git bisect helps. It is one of the most sought-after tools
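The search bisect performs can be sketched as a binary search on a linear history (a simplification: real bisect walks an arbitrary commit graph). `is_bad` stands for the user's test (compile, run, check); only about log2(n) commits ever need to be tested.

```c
/* Find the first bad commit on a linear history 0..n-1.
 * Assumes commit n-1 is known bad and the commit before 0 is
 * known good. */
int first_bad_commit(int n, int (*is_bad)(int commit))
{
	int good = -1, bad = n - 1;
	while (bad - good > 1) {
		int mid = good + (bad - good) / 2;
		if (is_bad(mid))
			bad = mid;	/* regression is at mid or earlier */
		else
			good = mid;	/* regression is after mid */
	}
	return bad;			/* first commit where the test fails */
}

/* Example predicate: a regression introduced at commit 7. */
int bad_since_seven(int commit)
{
	return commit >= 7;
}
```

This is why bisecting even a huge history stays cheap in the number of compile/test cycles, and why the per-step overhead of the tool itself is the part worth shaving.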
Re: [GSOC/RFC] GSoC Proposal Draft | Git Beginner
Updated examples with better descriptions for force push and reset HEAD, as suggested by Lars [11].

Thanks and regards, Sidhant Sharma

[11]: http://thread.gmane.org/gmane.comp.version-control.git/289365/focus=289495

---

Implement a beginner mode for Git.

Abstract

Git is a very powerful version control system, with an array of features that lend the user great capabilities. But it often happens that beginners are overwhelmed by its complexity and are unable to fully understand, and thus utilize, Git. Moreover, beginners often do not fully understand the command they are using and end up making destructive (and occasionally irreversible) changes to the repository. The beginner mode will assist such users by warning them before they make possibly destructive changes. It will also display tips and short snippets of documentation for better understanding the Git model.

Google Summer of Code idea suggested here: http://git.github.io/SoC-2016-Ideas/#git-beginner

About Me

Name : Sidhant Sharma
Email [1] : Sidhant.Sharma1208 gmail.com
Email [2] : Tigerkid001 gmail.com
College : Delhi Technological University
Studying : Software Engineering
IRC : tk001 (or _tk_)
Phone : 91-9990-606-081
Country : India
Interests : Computers, Books, Photography
Github : Tigerkid001
LinkedIn : https://in.linkedin.com/in/sidhantsharma12

Technical Experience

Authored several Mozilla Firefox and Google Chrome extensions: Firefox: Owl [1], Blink [2], Spoiler Jedi [3]; Chrome: Blink [4].
Developed a robust plugin framework for Android [5] for a startup.
Learning Linux kernel programming via the Eudyptula Challenge [6] (currently level 6).
Developed a natural language processor for sarcasm detection [7] in tweets.
Developed a hand gesture detection module [8] as a college minor project.
Active Firefox Add-ons Editor at AMO [9].
Currently working on a restaurant image classification project as a second college minor project.
Why I chose Git

I have been using Git for about two years now, and it has become an indispensable daily-use tool for me. Getting a chance to participate in GSoC for the first time under Git is very exciting. It will give me an opportunity to know the system intimately and a chance to help in making it better and more powerful.

Proposal

Ideas Page: Git Beginner [10]

The following tasks summarize the project:

Implement a wrapper around Git

A wrapper (currently called 'ggit') is to be implemented around git, which will provide the following user interface: `ggit <command>`. For example, `ggit add --all`. The wrapper will assess the arguments passed to it, and if they are detected to be safe, it will simply pass them through to 'git'. This approach is favorable as existing users of git will not be affected by the wrapper.

Warning for potentially destructive commands

For every command that is entered, the wrapper will assess the subcommand and its options. It will first check whether the subcommand (e.g. add, commit, rebase) is present in a list of predefined 'potentially destructive' commands. This can be done by searching through a radix tree for the subcommand. If found, the arguments to the subcommand will be checked for specific flags. The graylisted flags for the destructive commands will be stored as an array of regular expressions, and the current command's arguments will be checked against them. If matches are found, a warning is displayed. A sample 'ggit' warning would be: "You are about to do X, which will permanently destroy Y. Are you sure you wish to continue? [Y/n]"

If the user enters Y[es], the command will be executed as is (by passing it unaltered to git). In the case of Y[es], 'ggit' will also give tips for undoing the changes made by this command (by referring the user to the correct commands and the reflog), if the command can be undone.
In case the command cannot be undone, 'ggit' will display an additional line in the warning like "The changes made by this command cannot be undone. Please proceed cautiously". In the case of n[o], 'ggit' will exit without executing the command. Currently, the list consists of commands like: $ git rebase $ git reset --hard $ git clean -f $ git gc --prune=now --aggressive $ git push -f $ git push remote [+/:] $ git branch -D The list will be updated after some more discussion on the list. Usage tips and documentation The wrapper will also be responsible for showing a short description of every command that is entered through 'ggit'. This shall be done for every command unconditionally. The description will be derived from the actual documentation, but will primarily aim to help the beginner understand the Git workflow and the Git model. A few examples to illustrate the working of the wrapper are: $ ggit add --all Staging all changes and untracked files. Use ` [g]git commit` to commit the changes. $ ggit commit -m “Second commit” Committing staged changes… [master 0be3142] Second commit 4 files changed, 6 insert
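The graylist check described above could be sketched like this. It is a simplified version using a fixed table and substring matching instead of the proposed radix tree plus regular expressions, and all names are hypothetical:

```c
#include <string.h>

/* A few hard-coded destructive patterns: subcommand plus the flag
 * that makes it dangerous.  A NULL flag means the subcommand is
 * considered destructive regardless of its arguments. */
struct rule {
	const char *subcmd;
	const char *flag;
};

static const struct rule graylist[] = {
	{ "reset",  "--hard" },
	{ "clean",  "-f" },
	{ "push",   "-f" },
	{ "branch", "-D" },
	{ "gc",     "--prune" },
	{ "rebase", NULL },
};

/* Return 1 if the command should trigger the confirmation prompt. */
int is_destructive(const char *subcmd, const char *flags)
{
	size_t i;
	for (i = 0; i < sizeof(graylist) / sizeof(graylist[0]); i++) {
		if (strcmp(subcmd, graylist[i].subcmd))
			continue;
		if (!graylist[i].flag)
			return 1;
		if (flags && strstr(flags, graylist[i].flag))
			return 1;
	}
	return 0;
}
```

Substring matching is deliberately crude here; the regular-expression approach in the proposal would be needed to handle cases like `push remote +refspec`, where the destructive part is embedded in an argument rather than a flag.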
Re: [GSOC/RFC] GSoC Proposal Draft | Git Beginner
Re: [GSOC/RFC] GSoC Proposal Draft | Git Beginner
Hi, I updated the draft with links, ggit usage examples and some changes to the timeline. I placed the links as references here, but in the Google Doc they're inline.

Thanks and regards,
Sidhant Sharma

---

Implement a beginner mode for Git.

Abstract

Git is a very powerful version control system, with an array of features that lend the user great capabilities. But it often happens that beginners are overwhelmed by its complexity and are unable to fully understand, and thus fully utilize, Git. Moreover, beginners often do not fully understand the command they are using and end up making destructive (and occasionally irreversible) changes to the repository.

The beginner mode will assist such users by warning them before they make possibly destructive changes. It will also display tips and short snippets of documentation for better understanding the Git model.

Google Summer of Code idea suggested here: http://git.github.io/SoC-2016-Ideas/#git-beginner

About Me

Name      : Sidhant Sharma
Email [1] : Sidhant.Sharma1208 gmail.com
Email [2] : Tigerkid001 gmail.com
College   : Delhi Technological University
Studying  : Software Engineering
IRC       : tk001 (or _tk_)
Phone     : 91-9990-606-081
Country   : India
Interests : Computers, Books, Photography
Github    : Tigerkid001
LinkedIn  : https://in.linkedin.com/in/sidhantsharma12

Technical Experience

Authored several Mozilla Firefox and Google Chrome extensions:
Firefox: Owl [1], Blink [2], Spoiler Jedi [3]
Chrome: Blink [4]

Developed a robust plugin framework for Android [5] for a startup.
Learning Linux kernel programming via the Eudyptula Challenge [6] (currently level 6).
Developed a natural language processor for sarcasm detection [7] in tweets.
Developed a hand gesture detection module [8] as a college minor project.
Active Firefox Add-ons Editor at AMO [9].
Currently working on a restaurant image classification project as a second college minor project.
Why I chose Git

I have been using Git for about two years now, and it has become an indispensable daily-use tool for me. Getting a chance to participate in GSoC for the first time under Git is very exciting. It will give me an opportunity to know the system intimately and a chance to help make it better and more powerful.

Proposal

Ideas Page: Git Beginner [10]

The following tasks summarize the project:

Implement a wrapper around Git

A wrapper (currently called 'ggit') is to be implemented around Git, which will provide the following user interface:
`ggit <git command>`
For example, `ggit add --all`
The wrapper will assess the arguments passed to it, and if they are detected to be safe, it will simply pass them through to 'git'. This approach is favorable as existing users of git will not be affected by the wrapper.

Warning for potentially destructive commands

For every command that is entered, the wrapper will assess the subcommand and its options. It will first check whether the subcommand (e.g. add, commit, rebase) is present in a list of predefined 'potentially destructive' commands. This can be done by searching a radix tree for the subcommand. If found, the arguments to the subcommand will be checked for specific flags. The graylisted flags for the destructive commands will be stored as an array of regular expressions, and the current command's arguments will be checked against them. If matches are found, a warning is displayed. The warning 'ggit' shows would be
"You are about to do X, which will permanently destroy Y. Are you sure you wish to continue? [Y/n] "
If the user enters Y[es], the command will be executed as is (by passing it unaltered to git). In the case of Y[es], 'ggit' will also give tips for undoing the changes made by this command (by referring the user to the correct commands and the reflog), if the command can be undone.
In case the command cannot be undone, 'ggit' will display an additional line in the warning, like
"The changes made by this command cannot be undone. Please proceed cautiously."
In the case of n[o], 'ggit' will exit without executing the command.

Currently, the list consists of commands like:

$ git rebase
$ git reset --hard
$ git clean -f
$ git gc --prune=now --aggressive
$ git push -f
$ git push remote [+/:]
$ git branch -D

The list will be updated after some more discussion on the mailing list.

Usage tips and documentation

The wrapper will also be responsible for showing a short description of every command that is entered through 'ggit'. This shall be done for every command unconditionally. The description will be derived from the actual documentation, but will primarily aim to help the beginner understand the Git workflow and the Git model.

A few examples to illustrate the working of the wrapper:

$ ggit add --all
Staging all changes and untracked files. Use `[g]git commit` to commit the changes.

$ ggit commit -m "Second commit"
Committing staged changes...
[master 0be3142] Second commit
4 files changed, 6 insertions(+), 2
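The graylist check sketched above can be prototyped in a few lines of shell. The helper name and the command table below are illustrative only; the proposal's real implementation would use a radix tree and regular-expression matching, plus a proper option parser:

```shell
#!/bin/sh
# Hypothetical sketch of ggit's destructive-command check, using the
# commands from the list above. A real version would first need to
# parse git's global options (see the discussion later in the thread).
is_destructive() {
    subcmd=$1
    shift
    case "$subcmd" in
        rebase) return 0 ;;  # graylisted regardless of flags
        reset)  case " $* " in *" --hard "*) return 0 ;; esac ;;
        clean)  case " $* " in *" -f "*)     return 0 ;; esac ;;
        push)   case " $* " in *" -f "*)     return 0 ;; esac ;;
        branch) case " $* " in *" -D "*)     return 0 ;; esac ;;
    esac
    return 1
}

if is_destructive reset --hard; then
    echo "You are about to discard uncommitted changes. Continue? [Y/n]"
fi
```

Note how `reset --hard` matches while `commit -m msg` does not; this flat token matching is exactly the shortcut that breaks down once git's global options enter the picture.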
Re: [GSOC/RFC] GSoC Proposal Draft | Git Beginner
On Monday 21 March 2016 01:59 PM, Matthieu Moy wrote: > Sidhant Sharma writes: > >> On Monday 21 March 2016 12:22 AM, Matthieu Moy wrote: >> >>> Note that it implies writing an almost full-blown option parser to >>> recognize commands like >>> >>> ggit --work-tree git --namespace reset --git-dir --hard git log >>> >>> (just looking for "git", "reset" and "--hard" in the command-line would >>> not work here). >> Could you please elaborate on the above command? I'm unable to >> understand its syntax. I thought all git commands follow the >> `git command ` syntax, so using simple string >> manipulations and regexes would work. Am I missing something? > The full syntax is > > git [global options] <command> [options and arguments for a command] > > For example: > > git -p log => -p is the option for "git" itself, which means "paginate" > git log -p => -p is the option for "git log", which means "patch" > > Options can have stuck or non-stuck form, for example > > git --work-tree=foo <=> git --work-tree foo > > git --work-tree git --namespace reset --git-dir --hard git log > <=> > git --work-tree=git --namespace=reset --git-dir=--hard git log > > (This is probably a stupid command to type, but it's legal) > > The latter is a source of issues for a parser, since you can't just iterate > through argv[] and search for problematic commands/options: you > have to distinguish options themselves (--work-tree above) from option > arguments (foo above).

Thanks for the explanation; I knew of the global options but didn't know that the last command would be syntactically legal. For commands like these, simply iterating over argv[] wouldn't work in all cases. Though a beginner may not enter commands of this sort, I agree we shouldn't rely on that. If only the stuck form existed, regexes would have worked. I can now see why a parser is needed here, one which can recognize global options and the above command syntax.
But for this example, > In my example above, I played with global options (before "git" in the > command-line), but I could also have done that with per-command options > taking arguments, like > > git push --repo --force > > Here, --force is the name of the repo (again, probably a stupid name, > but why not), not the --force option.

would the parser also be required to understand all options and arguments of all git commands? --force could not be a branch name (git refuses it), but the same may not hold for arguments to other commands.

>> I wasn't sure if we are allowed to code before the actual coding period begins, >> so I kept it that way. I'll update it now. > You're not "forced" to, but you can write code whenever you like. We've > already seen code written before the application!

Nice! I too would like to get started early :) -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GSOC/RFC] GSoC Proposal Draft | Git Beginner
Sidhant Sharma writes:

> On Monday 21 March 2016 12:22 AM, Matthieu Moy wrote: > >> Note that it implies writing an almost full-blown option parser to >> recognize commands like >> >> ggit --work-tree git --namespace reset --git-dir --hard git log >> >> (just looking for "git", "reset" and "--hard" in the command-line would >> not work here). > > Could you please elaborate on the above command, I'm unable to > understand its syntax. I thought all git commands follow the > `git command ` syntax, so using simple string > manipulations and regexes would work. Am I missing something?

The full syntax is

git [global options] <command> [options and arguments for a command]

For example:

git -p log => -p is the option for "git" itself, which means "paginate"
git log -p => -p is the option for "git log", which means "patch"

Options can have stuck or non-stuck form, for example

git --work-tree=foo <=> git --work-tree foo

git --work-tree git --namespace reset --git-dir --hard git log
<=>
git --work-tree=git --namespace=reset --git-dir=--hard git log

(This is probably a stupid command to type, but it's legal)

The latter is a source of issues for a parser, since you can't just iterate through argv[] and search for problematic commands/options: you have to distinguish options themselves (--work-tree above) from option arguments (foo above).

In my example above, I played with global options (before "git" in the command-line), but I could also have done that with per-command options taking arguments, like

git push --repo --force

Here, --force is the name of the repo (again, probably a stupid name, but why not), not the --force option.

> I wasn't sure if we are allowed to code before the actual coding period begins > so I kept it that way. I'll update it now.

You're not "forced" to, but you can write code whenever you like. We've already seen code written before the application!
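Matthieu's `git push --repo --force` point can be shown concretely with a naive scanner. The helper below is hypothetical, written only to illustrate the failure mode he describes:

```shell
#!/bin/sh
# Naive approach: flag any command line containing the token "--force".
naive_is_forced() {
    for arg in "$@"; do
        if [ "$arg" = "--force" ]; then
            return 0
        fi
    done
    return 1
}

# Here --force is the *argument* of --repo (a repository name),
# not the force flag, yet the naive scan still flags the command:
if naive_is_forced push --repo --force; then
    echo "false positive: --force was an option argument, not a flag"
fi
```

A correct wrapper has to know that `--repo` takes an argument before it can decide what `--force` means, which is why an almost full option parser is needed.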
-- Matthieu Moy http://www-verimag.imag.fr/~moy/
Re: [GSOC/RFC] GSoC Proposal Draft | Git Beginner
On Monday 21 March 2016 12:22 AM, Matthieu Moy wrote: > Sidhant Sharma <tigerkid...@gmail.com> writes: > >> A wrapper is to be implemented around (currently called 'ggit'), which will >> provide the following user interface: >> `ggit ` > There's actually already a tool doing this: > > https://people.gnome.org/~newren/eg/ > > I'm Cc-ing the author. > > I heard good feedback about the tool in the early days of Git, when git > itself was rather clearly not ready for mere mortals. The tool seems > abandoned since 2013 (last release); my guess is that git became usable > enough and eg is not needed as much as it was. For example, eg defaulted > to push.default=tracking before we did the change to push.default=simple > in git.

Nice! I'll take a look at its source and see how it works.

> I think the "wrapper" approach is sound. It avoids touching git itself > and breaking things that depend on git (for example, adding > core.denyHardReset to let "git reset --hard" error out would be > unacceptable because it would mean that any script using "git reset > --hard" would break when a user has the option set in ~/.gitconfig). > > Note that it implies writing an almost full-blown option parser to > recognize commands like > > ggit --work-tree git --namespace reset --git-dir --hard git log > > (just looking for "git", "reset" and "--hard" in the command-line would > not work here).

Could you please elaborate on the above command? I'm unable to understand its syntax. I thought all git commands follow the `git command ` syntax, so using simple string manipulations and regexes would work. Am I missing something?

>> The wrapper will assess the arguments passed to it, and if they are detected to >> be safe, it will simply pass them through to 'git'. >> >> Warning for potentially destructive commands >> >> For every command that is entered, the wrapper will assess the subcommand and >> its options. In that, it will first check if the subcommand (e.g.
add, >> commit, rebase) is present in a list of predefined 'potentially destructive' >> commands. This can be done by searching through a radix tree for the >> subcommand. >> If found, then the arguments to the subcommand will be checked for specific >> flags. The graylisted flags for the destructive commands will be stored as an >> array of regular expressions, and the current command's arguments will be >> checked against them. If matches are found, a warning is displayed. 'ggit' >> for the warning would be >> "You are about to do X, which will permanently destroy Y. Are you sure you >> wish >> to continue? [Y/n] " >> If the user enters Y[es], the command will be executed as is (by passing it >> unaltered to git). In the case of Y[es], 'ggit' will also give tips for >> undoing >> the changes made by this command (by referring the user to correct commands >> and >> reflog), if the command can be undone. In case the command cannot be undone, >> 'ggit' will display an additional line in the warning like >> "The changes made by this command cannot be undone. Please proceed >> cautiously". >> In the case of n[o], 'ggit' will exit without executing the command. >> Usage tips and documentation >> >> The wrapper will also be responsible for showing a short description of every >> command that is entered through 'ggit'. This shall be done for every command >> unconditionally. > I'm not 100% convinced that this is a good idea: it'd be tempting for > the user to run a command just to know what it does. Perhaps it's better > to let the user run "git -h" instead. But it could indeed help > for commands doing very different things depending on the options, like > > $ git checkout foo > Checks-out branch foo > $ git checkout -b bar > Creating a new branch bar and checking it out > $ git checkout HEAD -- . > Reverting directory . 
to its last committed state

Yes, I did consider that and came up with this: I thought we could have an option like --intro or --doc that would just print the intro snippet for the command without actually running it. Though "git <command> -h" is an option, I wasn't inclined towards it, as I think the output from -h may sometimes not make sense to a new user. Plus, -h only gives an elaborate list of syntax and options/arguments, but does not say what the command does.

> ... > > (I think a list of examples would be an important addition to your > proposal to clarify the plans)

Will do that.

>> The description will be derived from the actual documentation, but >> will primarily aim to help the beginner understand the Git workflow and the Git model.
Re: [GSOC/RFC] GSoC Proposal Draft | Git Beginner
Sidhant Sharma <tigerkid...@gmail.com> writes:

> Implement a beginner mode for Git. > > Abstract > > Git is a very powerful version control system, with an array of features > that lend the user great capabilities. But it often happens that > beginners are overwhelmed by its complexity and are unable to fully understand > and thus utilize Git. Moreover, beginners often do not fully understand > the command they are using and end up making destructive (and occasionally > irreversible) changes to the repository. > > The beginner mode will assist such users in using Git by warning them > before making possibly destructive changes. It will also display tips and > short snippets of documentation for better understanding the Git model.

[...]

(Google Summer of Code idea suggested here: http://git.github.io/SoC-2016-Ideas/#git-beginner )

> A wrapper is to be implemented around (currently called 'ggit'), which will > provide the following user interface: > `ggit `

There's actually already a tool doing this:

https://people.gnome.org/~newren/eg/

I'm Cc-ing the author.

I heard good feedback about the tool in the early days of Git, when git itself was rather clearly not ready for mere mortals. The tool seems abandoned since 2013 (last release); my guess is that git became usable enough and eg is not needed as much as it was. For example, eg defaulted to push.default=tracking before we did the change to push.default=simple in git.

I think the "wrapper" approach is sound. It avoids touching git itself and breaking things that depend on git (for example, adding core.denyHardReset to let "git reset --hard" error out would be unacceptable because it would mean that any script using "git reset --hard" would break when a user has the option set in ~/.gitconfig).
Note that it implies writing an almost full-blown option parser to recognize commands like

ggit --work-tree git --namespace reset --git-dir --hard git log

(just looking for "git", "reset" and "--hard" in the command-line would not work here).

Another option would be to have a C implementation of ggit that would reuse the whole git source code, but set a flag "beginner_mode" to true before starting, and then introduce "if (beginner_mode)" within Git's source code. I think the wrapper approach is better since it avoids "polluting" Git's source code itself.

> The wrapper will assess the arguments passed to it, and if they are detected to > be safe, it will simply pass them through to 'git'. > > Warning for potentially destructive commands > > For every command that is entered, the wrapper will assess the subcommand and > its options. In that, it will first check if the subcommand (e.g. add, > commit, rebase) is present in a list of predefined 'potentially destructive' > commands. This can be done by searching through a radix tree for the > subcommand. > If found, then the arguments to the subcommand will be checked for specific > flags. The graylisted flags for the destructive commands will be stored as an > array of regular expressions, and the current command's arguments will be > checked against them. If matches are found, a warning is displayed. The > warning 'ggit' shows would be > "You are about to do X, which will permanently destroy Y. Are you sure you > wish > to continue? [Y/n] " > If the user enters Y[es], the command will be executed as is (by passing it > unaltered to git). In the case of Y[es], 'ggit' will also give tips for > undoing > the changes made by this command (by referring the user to correct commands > and > reflog), if the command can be undone. In case the command cannot be undone, > 'ggit' will display an additional line in the warning like > "The changes made by this command cannot be undone. Please proceed > cautiously".
> In the case of n[o], 'ggit' will exit without executing the command. > Usage tips and documentation > > The wrapper will also be responsible for showing a short description of every > command that is entered through 'ggit'. This shall be done for every command > unconditionally.

I'm not 100% convinced that this is a good idea: it'd be tempting for the user to run a command just to know what it does. Perhaps it's better to let the user run "git <command> -h" instead. But it could indeed help for commands doing very different things depending on the options, like

$ git checkout foo
Checks out branch foo
$ git checkout -b bar
Creating a new branch bar and checking it out
$ git checkout HEAD -- .
Reverting directory . to its last committed state
...

(I think a list of examples would be an important addition to your proposal to clarify the plans)

> The description will be derived from the actual documentation
[GSOC/RFC] GSoC Proposal Draft | Git Beginner
Hi, I have drafted my proposal for the project 'Git Beginner' and would like to request your suggestions for improving it. I'm also reading the Git documentation and the Pro Git book (again) to make notes for the beginner documentation. It would be great to hear your comments on it.

Thanks and regards,
Sidhant Sharma

---

Implement a beginner mode for Git.

Abstract

Git is a very powerful version control system, with an array of features that lend the user great capabilities. But it often happens that beginners are overwhelmed by its complexity and are unable to fully understand, and thus fully utilize, Git. Moreover, beginners often do not fully understand the command they are using and end up making destructive (and occasionally irreversible) changes to the repository.

The beginner mode will assist such users by warning them before they make possibly destructive changes. It will also display tips and short snippets of documentation for better understanding the Git model.

About Me

Name      : Sidhant Sharma
Email [1] : Sidhant.Sharma1208 gmail.com
Email [2] : Tigerkid001 gmail.com
College   : Delhi Technological University
Studying  : Software Engineering
IRC       : tk001 (or _tk_)
Phone     : 91-9990-606-081
Country   : India
Interests : Computers, Books, Photography
Github    : Tigerkid001
LinkedIn  : https://in.linkedin.com/in/sidhantsharma12

Technical Experience

Authored several Mozilla Firefox and Google Chrome extensions.
Developed a robust plugin framework for Android for a startup.
Learning Linux kernel programming via the Eudyptula Challenge.
Developed a natural language processor for sarcasm detection in tweets.
Developed a gesture detection module as a college minor project.
Active Firefox Add-ons Editor at AMO (addons mozilla org).
Currently working on a restaurant image classification project as a second college minor project.

Why I chose Git

I have been using Git for about two years now, and it has become an indispensable daily-use tool for me.
Getting a chance to participate in GSoC for the first time under Git is very exciting. It will give me an opportunity to know the system intimately and a chance to help make it better and more powerful.

Proposal

Ideas Page: Git Beginner

The following tasks summarize the project:

Implement a wrapper around Git

A wrapper (currently called 'ggit') is to be implemented around Git, which will provide the following user interface:
`ggit <git command>`
For example, `ggit add --all`
The wrapper will assess the arguments passed to it, and if they are detected to be safe, it will simply pass them through to 'git'.

Warning for potentially destructive commands

For every command that is entered, the wrapper will assess the subcommand and its options. It will first check whether the subcommand (e.g. add, commit, rebase) is present in a list of predefined 'potentially destructive' commands. This can be done by searching a radix tree for the subcommand. If found, the arguments to the subcommand will be checked for specific flags. The graylisted flags for the destructive commands will be stored as an array of regular expressions, and the current command's arguments will be checked against them. If matches are found, a warning is displayed. The warning 'ggit' shows would be
"You are about to do X, which will permanently destroy Y. Are you sure you wish to continue? [Y/n] "
If the user enters Y[es], the command will be executed as is (by passing it unaltered to git). In the case of Y[es], 'ggit' will also give tips for undoing the changes made by this command (by referring the user to the correct commands and the reflog), if the command can be undone. In case the command cannot be undone, 'ggit' will display an additional line in the warning, like
"The changes made by this command cannot be undone. Please proceed cautiously."
In the case of n[o], 'ggit' will exit without executing the command.
Usage tips and documentation

The wrapper will also be responsible for showing a short description of every command that is entered through 'ggit'. This shall be done for every command unconditionally. The description will be derived from the actual documentation, but will primarily aim to help the beginner understand the Git workflow and the Git model.

Timeline

Community Bonding Period
Week 1 : Discuss the flow of the project with the mentor. Discuss adequate data structures and search techniques to be used.
Week 2-3 : Discuss an extensive list of commands that should be classified as destructive. Discuss appropriate short descriptions for commands.
Week 4 : Discuss code structure, tests, optimization for least overhead and other details.

Coding Starts
Week 1-2 : Submit code for a basic wrapper that will warn for a subset of the potentially destructive commands, and continue if the command is safe.
Week 3-6 : Extend the wrapper to warn for all commands in the list, along with
Re: "Medium" log format: change proposal for author != committer
On Tue, Sep 15, 2015 at 6:52 PM, Junio C Hamano wrote: > > * Enhance the "--pretty=format:" thing so that the current set of >hardcoded --pretty=medium,short,... formats and your modified >"medium" can be expressed as a custom format string. > > * Introduce a configuration mechanism to allow users to define new >short-hand, e.g. if you have this in your $HOME/.gitconfig: > > [pretty "robin"] > format = "commit %H%nAuthor: %an <%ae>%n..." >

AFAIK there is already support for this... from "git help config":

pretty.<name>
    Alias for a --pretty= format string, as specified in git-log(1). Any aliases defined here can be used just as the built-in pretty formats could. For example, running git config pretty.changelog "format:* %H %s" would cause the invocation git log --pretty=changelog to be equivalent to running git log "--pretty=format:* %H %s". Note that an alias with the same name as a built-in format will be silently ignored.

>and run "git log --pretty=robin", it would behave as if you said >"git log --pretty="format:commit %H%nAuthor: %an <%ae>%n...". >

So this should already be supported... but to support "robinsformat" we'd need to be able to "show committer only if different from author". Not sure how that would work.

> * (optional) Replace the hardcoded implementations of pretty >formats with short-hand names like "medium", "short", etc. with a >built-in set of pretty.$name.format using the configuration >mechanism. But we need to make sure this does not hurt >performance for common cases. >

This part obviously hasn't been done. I don't know whether any particular format is not expressible today by the pretty syntax, but at least the configuration does work. I use it as part of displaying the Fixes: ("name") tags used by the upstream kernel for marking bug fixes of known commits. Thus the only real thing missing would be implementing a % modifier which allows showing the committer if it's not the same as the author.
(or vice versa). Ideally we could take work from the ref-filter library and the proposed "%if" stuff, but I don't think this was actually implemented yet, and I don't know if that would even work in the pretty modifiers.

Regards,
Jake
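The `pretty.<name>` mechanism quoted above can be tried directly. The sketch below runs in a throwaway repository; the paths, identity, and commit message are made up for illustration:

```shell
#!/bin/sh
# Define a pretty alias in a scratch repository and use it like a
# built-in format name, as described in git-config(1).
dir=$(mktemp -d)
git init -q "$dir"
cd "$dir" || exit 1
git -c user.name=Test -c user.email=test@example.com \
    commit -q --allow-empty -m "fix the frobnicator"
git config pretty.changelog "format:* %h %s"
# Behaves as if --pretty=format:'* %h %s' had been given:
git log --pretty=changelog
```

The last command prints one line per commit, starting with an asterisk and the abbreviated hash.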
Re: "Medium" log format: change proposal for author != committer
"Robin H. Johnson" writes: > Specifically, if the author is NOT the same as the committer, then > display both in the header. Otherwise continue to display only the > author.

I too found myself wanting to see both of the names sometimes, and the "fuller" format was added explicitly for that purpose. Even though I agree "show only one, and both only when they are different" is a reasonable and possibly useful format, it is out of the question to change what "--pretty=medium" does. It has been with us forever and people and their scripts do rely on it.

It would be good if we can say

$ git log --pretty=robinsformat

but with a better name to show such an output. Having said that, I'm moderately negative about adding it as yet another hard-coded format. We simply have too many, and we do not need one more. What we need instead is a flexible framework to let users get what they want. I think what needs to happen is:

* Enhance the "--pretty=format:" thing so that the current set of hardcoded --pretty=medium,short,... formats and your modified "medium" can be expressed as a custom format string.

* Introduce a configuration mechanism to allow users to define new short-hand, e.g. if you have this in your $HOME/.gitconfig:

[pretty "robin"]
    format = "commit %H%nAuthor: %an <%ae>%n..."

and run "git log --pretty=robin", it would behave as if you said "git log --pretty="format:commit %H%nAuthor: %an <%ae>%n...".

* (optional) Replace the hardcoded implementations of pretty formats with short-hand names like "medium", "short", etc. with a built-in set of pretty.$name.format using the configuration mechanism. But we need to make sure this does not hurt performance for common cases.
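As a concrete illustration of the "fuller" format mentioned above, which unconditionally prints both identities, here is a quick check in a throwaway repository (the names and message are made up):

```shell
#!/bin/sh
# Create a commit whose author and committer differ, then show both
# identities with --pretty=fuller.
dir=$(mktemp -d)
git init -q "$dir"
cd "$dir" || exit 1
GIT_AUTHOR_NAME=Alice GIT_AUTHOR_EMAIL=alice@example.com \
git -c user.name=Bob -c user.email=bob@example.com \
    commit -q --allow-empty -m "patch from Alice, applied by Bob"
git log -1 --pretty=fuller | grep -E '^(Author|Commit):'
```

This prints an "Author:" line for Alice and a "Commit:" line for Bob; the proposed new behavior would show the second line only when the two differ.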
"Medium" log format: change proposal for author != committer
Hi, I want to propose a change to the 'medium' log output format, to improve readability. Specifically, if the author is NOT the same as the committer, then display both in the header. Otherwise continue to display only the author. This would aid quick review of changes in git-log & git-show output. -- Robin Hugh Johnson Gentoo Linux: Developer, Infrastructure Lead E-Mail : robb...@gentoo.org GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
Re: Proposal for git stash : add --staged option
Hi again, just wanted to say that I have created a solution with a few lines of scripting: git-cstash

```sh
#!/bin/sh
git commit -m 'temporary, will be stashed soon'
git stash --include-untracked
git reset HEAD^1
git stash
git stash pop stash@{1}
```

On 2015-04-22 11:25, Johannes Schindelin wrote: Hi Edgar, On 2015-04-22 10:30, edgar.h...@netapsys.fr wrote: When you have a lot of unstaged files, and would like to test what happens if you undo some of the changes that you think are unnecessary, you would rather keep a copy of those changes somewhere. For example

    Changed but not updated:
      M config_test.xml
      M config_real.xml

I have changed both config_test.xml and config_real.xml, but I think the changes made in config_test.xml are unnecessary. However, I would still like to keep them somewhere in case it breaks something. In this case for example, I would like to be able to stash only the file config_test.xml. Eg:

    git add config_test.xml
    git stash --staged

So that after this, my git looks like this:

    Changed but not updated:
      M config_real.xml

and my stash contains only the changes introduced in config_test.xml. `git stash --keep-index` doesn't give the necessary control, because it will still stash everything (and create unnecessary merge complications if I change the files and apply the stash). I often have the same problem. How about doing this:

```sh
git add config_real.xml
git stash -k
git reset
```

The difference between our approaches is that I keep thinking of the staging area as the place to put changes I want to *keep*, not that I want to forget for a moment. Having said that, I am sympathetic to your cause, although I would rather have `git stash [--patch] -- [file...]` that would be used like `git add -p` except that the selected changes are *not* staged, but stashed instead.
Ciao, Johannes
Re: Proposal for git stash : add --staged option
Hi, the

```sh
git add config_real.xml
git stash -k
git reset
```

is not very well suited because the -k option keeps the index, but the index will still be put inside the stash. So what you propose is equivalent to:

```sh
git stash
git stash apply stash@{0}
git checkout -- config_test.xml
```

`git stash --patch` can do the job (and I think that's what I'm going to use from now on), but it's still a bit cumbersome in some situations. Best, Edgar

On 2015-04-22 11:25, Johannes Schindelin wrote: Hi Edgar, On 2015-04-22 10:30, edgar.h...@netapsys.fr wrote: When you have a lot of unstaged files, and would like to test what happens if you undo some of the changes that you think are unnecessary, you would rather keep a copy of those changes somewhere. For example

    Changed but not updated:
      M config_test.xml
      M config_real.xml

I have changed both config_test.xml and config_real.xml, but I think the changes made in config_test.xml are unnecessary. However, I would still like to keep them somewhere in case it breaks something. In this case for example, I would like to be able to stash only the file config_test.xml. Eg:

    git add config_test.xml
    git stash --staged

So that after this, my git looks like this:

    Changed but not updated:
      M config_real.xml

and my stash contains only the changes introduced in config_test.xml. `git stash --keep-index` doesn't give the necessary control, because it will still stash everything (and create unnecessary merge complications if I change the files and apply the stash). I often have the same problem. How about doing this:

```sh
git add config_real.xml
git stash -k
git reset
```

The difference between our approaches is that I keep thinking of the staging area as the place to put changes I want to *keep*, not that I want to forget for a moment.
Having said that, I am sympathetic to your cause, although I would rather have `git stash [--patch] -- [file...]` that would be used like `git add -p` except that the selected changes are *not* staged, but stashed instead. Ciao, Johannes
Proposal for git stash : add --staged option
Hello, There's some feature of git that I have been missing. When you have a lot of unstaged files, and would like to test what happens if you undo some of the changes that you think are unnecessary, you would rather keep a copy of those changes somewhere. For example

    Changed but not updated:
      M config_test.xml
      M config_real.xml

I have changed both config_test.xml and config_real.xml, but I think the changes made in config_test.xml are unnecessary. However, I would still like to keep them somewhere in case it breaks something. In this case for example, I would like to be able to stash only the file config_test.xml. Eg:

    git add config_test.xml
    git stash --staged

So that after this, my git looks like this:

    Changed but not updated:
      M config_real.xml

and my stash contains only the changes introduced in config_test.xml. `git stash --keep-index` doesn't give the necessary control, because it will still stash everything (and create unnecessary merge complications if I change the files and apply the stash). Best, Edgar
Re: Proposal for git stash : add --staged option
Hi Edgar, On 2015-04-22 10:30, edgar.h...@netapsys.fr wrote:

> When you have a lot of unstaged files, and would like to test what
> happens if you undo some of the changes that you think are unnecessary,
> you would rather keep a copy of those changes somewhere. For example
>
>     Changed but not updated:
>       M config_test.xml
>       M config_real.xml
>
> I have changed both config_test.xml and config_real.xml, but I think
> the changes made in config_test.xml are unnecessary. However, I would
> still like to keep them somewhere in case it breaks something. In this
> case for example, I would like to be able to stash only the file
> config_test.xml. Eg:
>
>     git add config_test.xml
>     git stash --staged
>
> So that after this, my git looks like this:
>
>     Changed but not updated:
>       M config_real.xml
>
> and my stash contains only the changes introduced in config_test.xml.
> `git stash --keep-index` doesn't give the necessary control, because it
> will still stash everything (and create unnecessary merge complications
> if I change the files and apply the stash)

I often have the same problem. How about doing this:

```sh
git add config_real.xml
git stash -k
git reset
```

The difference between our approaches is that I keep thinking of the staging area as the place to put changes I want to *keep*, not that I want to forget for a moment. Having said that, I am sympathetic to your cause, although I would rather have `git stash [--patch] -- [file...]` that would be used like `git add -p` except that the selected changes are *not* staged, but stashed instead. Ciao, Johannes
Re: [RFC/GSoC] Proposal Draft: Unifying git branch -l, git tag -l, and git for-each-ref
On 03/26/2015 10:07 PM, Jeff King wrote: On Mon, Mar 23, 2015 at 06:39:20PM +0530, karthik nayak wrote: All three commands select a subset of the repository’s refs and print the result. There has been an attempt to unify these commands by Jeff King[3]. I plan on continuing his work[4] and using his approach to tackle this project. I would be cautious about the work in my for-each-ref-contains-wip branch. At one point it was reasonably solid, but it's now a year and a half old, and I've been rebasing it without paying _too_ much attention to correctness. I think some subtle bugs have been introduced as it has been carried forward. Also, the very first patch (factoring out the contains traversal) is probably better served by this series: http://thread.gmane.org/gmane.comp.version-control.git/252472 I don't remember all of the issues offhand that need to be addressed in it, but there were plenty of review comments. Thanks for the link, will go through that. For extended selection behaviour such as ‘--contains’ or ‘--merged’ we could implement these within the library by providing functions which closely mimic the current methods used individually by ‘branch -l’ and ‘tag -l’. For eg to implement ‘--merged’ we implement a ‘compute_merge()’ function, which with the help of the revision API’s will be able to perform the same function as ‘branch -l --merged’. One trick with making a library-like interface is that some of the selection routines can work on a streaming list of refs (i.e., as we see each ref we can say yes or no) and some must wait until the end (e.g., --merged does a single big merge traversal). It's probably not the end of the world to just always collect all the refs, then filter them, then sort them, then print them. It may delay the start of output in some minor cases, but I doubt that's a big deal (and anyway, the packed-refs code will load them all into an array anyway, so collecting them in a second array is probably not a big deal). 
I think I noted this down while going through your implementation also. You even mentioned this on the mailing list if I'm not wrong. Will have to work out a design around this and think about it more. For formatting functionality provided by ‘for-each-ref’ we replicate the ‘show_ref’ function in ‘for-each-ref.c’ where the format is given to the function and the function uses the format to obtain atom values and prints the corresponding atom values to the screen. This feature would allow us to provide format functionality which could act as a base for the ‘-v’ option also. Yeah, I'd really like to see --format for git branch, and have -v just feed that a hard-coded format string (or even a configurable one). Although Jeff has built a really good base to build upon, I shall use his work as more of a reference and work on unification of the three commands from scratch. Good. :) -Peff Thanks for the Review/Tips. Regards -Karthik
Re: [RFC/GSoC] Proposal Draft: Unifying git branch -l, git tag -l, and git for-each-ref
On Mon, Mar 23, 2015 at 06:39:20PM +0530, karthik nayak wrote: All three commands select a subset of the repository’s refs and print the result. There has been an attempt to unify these commands by Jeff King[3]. I plan on continuing his work[4] and using his approach to tackle this project. I would be cautious about the work in my for-each-ref-contains-wip branch. At one point it was reasonably solid, but it's now a year and a half old, and I've been rebasing it without paying _too_ much attention to correctness. I think some subtle bugs have been introduced as it has been carried forward. Also, the very first patch (factoring out the contains traversal) is probably better served by this series: http://thread.gmane.org/gmane.comp.version-control.git/252472 I don't remember all of the issues offhand that need to be addressed in it, but there were plenty of review comments. For extended selection behaviour such as ‘--contains’ or ‘--merged’ we could implement these within the library by providing functions which closely mimic the current methods used individually by ‘branch -l’ and ‘tag -l’. For eg to implement ‘--merged’ we implement a ‘compute_merge()’ function, which with the help of the revision API’s will be able to perform the same function as ‘branch -l --merged’. One trick with making a library-like interface is that some of the selection routines can work on a streaming list of refs (i.e., as we see each ref we can say yes or no) and some must wait until the end (e.g., --merged does a single big merge traversal). It's probably not the end of the world to just always collect all the refs, then filter them, then sort them, then print them. It may delay the start of output in some minor cases, but I doubt that's a big deal (and anyway, the packed-refs code will load them all into an array anyway, so collecting them in a second array is probably not a big deal). 
For formatting functionality provided by ‘for-each-ref’ we replicate the ‘show_ref’ function in ‘for-each-ref.c’ where the format is given to the function and the function uses the format to obtain atom values and prints the corresponding atom values to the screen. This feature would allow us to provide format functionality which could act as a base for the ‘-v’ option also. Yeah, I'd really like to see --format for git branch, and have -v just feed that a hard-coded format string (or even a configurable one). Although Jeff has built a really good base to build upon, I shall use his work as more of a reference and work on unification of the three commands from scratch. Good. :) -Peff
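The "collect all refs, then filter, then sort, then print" pipeline Peff describes can be sketched in a few lines. This is an illustrative stand-in, not git's actual ref-filter API; the function names and the tuple representation are invented for the example:

```python
# Illustrative sketch (not git's real API) of the pipeline discussed above:
# collect every ref up front, apply a batch filter, then sort and print.

def collect_refs():
    """Stand-in for the packed-refs loader: returns every ref up front."""
    return [
        ("refs/heads/topic", False),   # (refname, is_merged)
        ("refs/heads/master", True),
        ("refs/heads/fix", True),
    ]

def filter_refs(refs, predicate):
    """Batch filter: predicates like --merged need the full list anyway,
    since they are answered by one big merge traversal."""
    return [ref for ref in refs if predicate(ref)]

def format_refs(refs):
    """Sort the surviving refs, then format them for output."""
    return [name for name, _ in sorted(refs)]

merged_only = filter_refs(collect_refs(), lambda ref: ref[1])
for line in format_refs(merged_only):
    print(line)
```

Delaying output until everything is collected costs a little startup latency, but, as noted above, the refs are already being loaded into an array anyway, so a second array for the filtered subset is cheap.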
[RFC/GSoC v2] Proposal: Make git-pull and git-am builtins
Since the deadline is fast approaching, and I've read that google-melange usually becomes overwhelmed near the deadline, I'll try to iterate on the proposal as much as possible. Below is v2, mostly small changes in response to Matthieu's and Junio's reviews. The changes are as follows:

* Make it clear that zero spawning of processes is an ideal -- it doesn't have to be so in practice.
* Swap rewrite of git-pull and git-am in timeline. It is better to push the first patch to the mailing list as soon as possible.
* Make it clear that as part of refactoring, commonly recurring patterns can be codified and implemented in the internal git API.
* Add microproject v5.
* Make it clear that Windows is not the only one that has poor IO performance. Poor IO performance can stem from the choice of operating system, filesystem and underlying storage performance.
* Cite filesystem cache feature in git for windows.
* Remove section-numbering directive, github messes it up.
* State what the end-product of the final stage is.

The updated version is also in the gist[1].

[1] https://gist.github.com/pyokagan/1b7b0d1f4dab6ba3cef1

--8--

Make `git-pull` and `git-am` builtins

:Abstract: `git-pull` and `git-am` are frequently used git subcommands. However, they are porcelain commands and implemented as shell scripts, which has some limitations which can cause poor performance, especially in non-POSIX environments like Windows. I propose to rewrite these scripts into low level C code and make them builtins. This will increase git's portability, and may improve the efficiency and performance of these commands.

.. section-numbering::

Limitations of shell scripts
============================

`git-pull` is a commonly executed command to check for new changes in the upstream repository and, if there are, fetch and integrate them into the current branch. `git-am` is another commonly executed command for applying a series of patches from a mailbox to the current branch.
They are both git porcelain commands -- they have no access to git's low level internal API. Currently, they are implemented by the shell scripts ``git-pull.sh`` and ``git-am.sh`` respectively. These shell scripts require a fully-functioning POSIX shell and utilities. As a result, these commands are difficult to port to non-POSIX environments like Windows.

Since porcelain commands do not have access to git's internal API, performing any git-related function, no matter how trivial, requires git to be spawned in a separate process. This limitation leads to these git commands being relatively inefficient, and can cause long run times on certain platforms that do not have copy-on-write ``fork()`` semantics.

Spawning processes can be slow
------------------------------

Shell scripting, by itself, is severely limited in what it can do. Performing most operations in shell scripts requires external executables to be called. For example, ``git-pull.sh`` spawns the git executable not only to perform complex git operations like `git-fetch` and `git-merge`, but it also spawns the git executable for trivial tasks such as retrieving configuration values with `git-config` and even quoting of command-line arguments with ``git rev-parse --sq-quote``. As a result, these shell scripts usually end up spawning a lot of processes.

Process spawning is usually implemented as a ``fork()`` followed by an ``exec()`` by shells. This can be slow on systems that do not support copy-on-write semantics for ``fork()``, and thus need to duplicate the memory of the parent process for every ``fork()`` call -- an expensive process. Furthermore, starting up processes on Windows is generally expensive as it performs `several extra steps`_ such as using an inter-process call to notify the Windows Client/Server Runtime Subsystem (CSRSS) about the process creation and checking for App Compatibility requirements. ..
_`several extra steps`: http://www.microsoft.com/mspress/books/sampchap/4354a.aspx The official Windows port of git, Git for Windows, uses MSYS2 [#]_ to emulate ``fork()``. Since Windows does not support forking semantics natively, MSYS2 can only emulate ``fork()`` `without copy-on-write semantics`_. Coupled with Windows' heavy process creation, this causes huge slowdowns of git on Windows. .. _`without copy-on-write semantics`: https://www.cygwin.com/faq.html#faq.api.fork A no-updates `git-pull`, for example, takes an average of 5.1s [#]_, as compared to Linux which only takes an average of 0.08s. 5 seconds, while seemingly short, would seem like an eternity to a user who just wants to quickly fetch and merge changes from upstream. `git-am`'s implementation reads each patch from the mailbox in a while loop, spawning many processes for each patch. Considering the cost of spawning each process, as well
Re: [RFC/GSoC] Proposal: Make git-pull and git-am builtins
Paul Tan pyoka...@gmail.com writes:

> I think it's still good to have the ideal in mind though (and whoops I
> forgot to put in the word ideal in the text).

Using or not using fork is merely one of the trade-offs we can make. "If all other things are equal, no fork is better than a fork" is a meaningless statement, as all other things are never equal in real life---doing things internally will have a cost of having to clean up and a risk to get that part wrong, for example. Engineering is a fine balancing act and setting an absolute goal is not a healthy attitude.
Re: [RFC/GSoC] Proposal: Make git-pull and git-am builtins
On 24.03.2015 17:37, Paul Tan wrote: I'm applying for git in the Google Summer of Code this year. For my project, I propose to rewrite git-pull.sh and git-am.sh into fast optimized C builtins. I've already hacked up a prototype of a builtin git-pull in [1], and it showed a promising 8x improvement in execution time on Windows. I cannot thank you enough for starting this effort. As one of the project owners of Git for Windows I can confirm the (shell) script Git commands to be a major source of pain. I really hope your proposal gets accepted and you'll be able to successfully complete this task. All the best! -- Sebastian Schuberth
Re: [RFC/GSoC] Proposal: Make git-pull and git-am builtins
On Thu, Mar 26, 2015 at 1:54 AM, Junio C Hamano gits...@pobox.com wrote:

> Paul Tan pyoka...@gmail.com writes:
>
>> I think it's still good to have the ideal in mind though (and whoops I
>> forgot to put in the word ideal in the text).
>
> Using or not using fork is merely one of the trade-offs we can make.
> "If all other things are equal, no fork is better than a fork" is a
> meaningless statement, as all other things are never equal in real
> life---doing things internally will have a cost of having to clean up
> and a risk to get that part wrong, for example. Engineering is a fine
> balancing act and setting an absolute goal is not a healthy attitude.

No, I do not mean all other things are equal, I meant all other things are ideal, meaning that human factors are not involved. I thought we were in agreement that calling functions in the internal API is technically superior to forking, assuming that there are no bugs or errors. Isn't this one of the reasons why libgit2 exists? If for whatever reason spawning an external git process is chosen, it would be because rewriting all the code paths without committing any errors would take too much effort. I will switch the word requirements to the word guidelines to make it sound less strict. However my above point still stands.
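The trade-off being debated here (spawning a child process for every trivial lookup versus calling a function in-process) can be illustrated with a small stand-alone sketch. This is not git code; the dictionary-backed "config" and the function names are invented for the example:

```python
# Illustrative sketch of the trade-off discussed above: look a value up by
# spawning a child process per query (what git-pull.sh does with `git config`)
# versus a direct in-process call (what a C builtin can do via git's
# in-memory config cache).
import subprocess
import sys

CONFIG = {"pull.rebase": "true"}  # stand-in for a repository's configuration

def lookup_via_process(key):
    # One whole child process per lookup, paying fork()/exec() and
    # interpreter-startup cost every time.
    out = subprocess.run(
        [sys.executable, "-c", f"print({CONFIG!r}.get({key!r}, ''))"],
        capture_output=True, text=True, check=True)
    return out.stdout.strip()

def lookup_in_process(key):
    # Direct function call: no process creation at all.
    return CONFIG.get(key, "")

# Both approaches return the same answer; only the cost differs.
assert lookup_via_process("pull.rebase") == lookup_in_process("pull.rebase")
```

On systems where process creation is expensive (Windows, or any ``fork()`` without copy-on-write), the first variant dominates the runtime of a script that performs many such lookups, which is the motivation for moving the logic in-process.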
Re: [RFC/GSoC] Proposal: Make git-pull and git-am builtins
Hi, On Wed, Mar 25, 2015 at 2:37 AM, Junio C Hamano gits...@pobox.com wrote: Paul Tan pyoka...@gmail.com writes: ..., I propose the following requirements for the rewritten code: 1. No spawning of external git processes. This is to support systems with high ``fork()`` or process creation overhead, and to reduce redundant IO by taking advantage of the internal object, index and configuration cache. I suspect this may probably be too strict in practice. True, we should never say run_command_capture() just to read from git rev-parse---we should just call get_sha1() instead. But for a complex command whose execution itself far outweighs the cost of forking, I do not think it is fair to say your project failed if you chose to run_command() it. For example, it may be perfectly OK to invoke git merge via run_command(). Yes, which is why I proposed writing a baseline using only the run-command APIs first. Any other optimizations can then be done selectively after that. I think it's still good to have the ideal in mind though (and whoops I forgot to put in the word ideal in the text). 3. The resulting builtin should not have wildly different behavior or bugs compared to the shell script. This on the other hand is way too loose. The original and the port must behave identically, unless the difference is fixing bugs in the original. I was considering that there may be slight behavioral changes when the rewritten code is modified to take greater advantage of the internal API, especially since some builtins, due to historical issues, may have duplicated code from the internal API[1]. [1] I'm suspecting that the implementation of --merge-base in show-branch.c re-implements get_octopus_merge_bases(). Potential difficulties === Rewriting code may introduce bugs ... Yes, but that is a reasonable risk you need to manage to gain the benefit from this project.
Of course, the downside of following this too strictly is that if there were any logical bugs in the original code, or if the original code is unclear, the rewritten code would inherit these problems too. I'd repeat my comment on the 3. above. Identifying and fixing bugs is great, but otherwise don't worry about this too much. Being bug-to-bug compatible with the original is way better than introducing new bugs of an unknown nature. Well yes, but I was thinking that if there are any glaring errors in the original source then it would be better to fix these errors during the rewrite than wasting time writing code that replicates these errors. Rewritten code may become harder to understand ... And also it may become harder to modify. That is the largest problem with any rewrite, and we should spend the most effort to avoid it. Any new bugs introduced we can later fix as long as the result is understandable and maintainable. For the purpose of reducing git's dependencies, the rewritten C code should not depend on other libraries or executables other than what is already available to git builtins. Perhaps misphrased; see below. In this case I was thinking of making git depend on another project (e.g., using an external regular expression library). Of course a balance has to be made in this aspect (thus the use of should not), but git-pull and git-am are relatively simple so there should be no need for that. We can see that the C version requires many more lines compared to the shell pipeline,... That is something you would solve by introducing reusable code in the run_command API, isn't it? That is what various rewrites in the past did, and this project should do so too. You should aim to do this project by not just using what is already available, but adding what you discover is a useful reusable pattern into a set of new functions in the already available API set. Whoops, forgot to mention that here.
(A brief mention was made on this kind of refactoring in the Development Approach). Thank you for your review.
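Junio's advice to codify recurring patterns into reusable API functions (rather than having every caller repeat the same boilerplate) can be sketched as follows. The helper name and its Python shape are invented for illustration; it is only loosely analogous to capturing output through git's run-command API in C:

```python
# Illustrative sketch of "codify a recurring pattern as a reusable helper":
# instead of every call site repeating the spawn / capture / trim dance,
# wrap it once so callers express intent in a single line.
import subprocess
import sys

def capture_command(argv):
    """Hypothetical reusable helper: run a command and return its
    trimmed stdout, raising on a non-zero exit code."""
    out = subprocess.run(argv, capture_output=True, text=True, check=True)
    return out.stdout.strip()

# A call site now reads as one line of intent, not four lines of plumbing:
line = capture_command([sys.executable, "-c", "print('  hello  ')"])
assert line == "hello"
```

The point is the refactoring pattern, not this particular helper: as the rewrite repeatedly hits the same need (capture output, split lines, check exit status), each such need becomes a candidate for a new function in the shared API rather than ad-hoc code in one builtin.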
[RFC/GSoC] Proposal: Make git-pull and git-am builtins
Hi all, I'm applying for git in the Google Summer of Code this year. For my project, I propose to rewrite git-pull.sh and git-am.sh into fast optimized C builtins. I've already hacked up a prototype of a builtin git-pull in [1], and it showed a promising 8x improvement in execution time on Windows. Below is the full text of the proposal as submitted to google-melange for your review and feedback. It is marked up in reStructuredText. The latest (and rendered) version can be found at [2]. Regards, Paul.

[1] http://thread.gmane.org/gmane.comp.version-control.git/265628
[2] https://gist.github.com/pyokagan/1b7b0d1f4dab6ba3cef1

(Thanks Matthieu for suggesting to post this on the mailing list. Will reply to your comments in a separate email).

--8--

Make `git-pull` and `git-am` builtins

:Abstract: `git-pull` and `git-am` are frequently used git subcommands. However, they are porcelain commands and implemented as shell scripts, which has some limitations which can cause poor performance, especially in non-POSIX environments like Windows. I propose to rewrite these scripts into low level C code and make them builtins. This will increase git's portability, and may improve the efficiency and performance of these commands.

.. section-numbering::

Limitations of shell scripts
============================

`git-pull` is a commonly executed command to check for new changes in the upstream repository and, if there are, fetch and integrate them into the current branch. `git-am` is another commonly executed command for applying a series of patches from a mailbox to the current branch. They are both git porcelain commands -- with no access to git's low level internal API. Currently, they are implemented by the shell scripts ``git-pull.sh`` and ``git-am.sh`` respectively. These shell scripts require a fully-functioning POSIX shell and utilities. As a result, these commands are difficult to port to non-POSIX environments like Windows.
Since porcelain commands do not have access to git's internal API, performing any git-related function, no matter how trivial, requires git to be spawned in a separate process. This limitation leads to these git commands being relatively inefficient, and can cause long run times on certain platforms that do not have copy-on-write ``fork()`` semantics.

Spawning processes can be slow
------------------------------

Shell scripting, by itself, is severely limited in what it can do. Performing most operations in shell scripts requires external executables to be called. For example, ``git-pull.sh`` spawns the git executable not only to perform git operations like `git-fetch` and `git-merge`, but it also spawns the git executable for trivial tasks such as retrieving configuration values with `git-config` and even quoting of command-line arguments with ``git rev-parse --sq-quote``. As a result, these shell scripts usually end up spawning a lot of processes.

Process spawning is usually implemented as a ``fork()`` followed by an ``exec()`` by shells. This can be slow on systems that do not support copy-on-write semantics for ``fork()``, and thus need to duplicate the memory of the parent process for every ``fork()`` call -- an expensive process. Furthermore, starting up processes on Windows is generally expensive as it performs `several extra steps`_ such as using an inter-process call to notify the Windows Client/Server Runtime Subsystem (CSRSS) about the process creation and checking for App Compatibility requirements. .. _`several extra steps`: http://www.microsoft.com/mspress/books/sampchap/4354a.aspx The official Windows port of git, Git for Windows, uses MSYS2 [#]_ to emulate ``fork()``. Since Windows does not support forking semantics natively, MSYS2 can only emulate ``fork()`` `without copy-on-write semantics`_. Coupled with Windows' heavy process creation, this causes huge slowdowns of git on Windows. ..
_`without copy-on-write semantics`: https://www.cygwin.com/faq.html#faq.api.fork A no-updates `git-pull`, for example, takes an average of 5.1s [#]_, as compared to Linux which only takes an average of 0.08s. 5 seconds, while seemingly short, would seem like an eternity to a user who just wants to quickly fetch and merge changes from upstream. `git-am`'s implementation reads each patch from the mailbox in a while loop, spawning many processes for each patch. Considering the cost of spawning each process, as well as the fact that runtime grows linearly with the number of patches, git-am takes a long time to process a seemingly small number of patches on Windows as compared to Linux. A quick benchmark shows that `git-am` takes 7m 20.39s to apply 100 patches on Windows, compared to Linux, which took only 0.08s. Commands which call `git-am` are affected as well. ``git-rebase--am.sh``, which implements the default
Re: [RFC/GSoC] Proposal: Make git-pull and git-am builtins
On Tue, Mar 24, 2015 at 6:19 PM, Matthieu Moy matthieu@grenoble-inp.fr wrote:

A few minor details: "on operating systems with poor file system performance (i.e. Windows)" -- that's not only Windows; I also commonly use a slow filesystem on Linux, just because it's NFS. Mentioning other cases of poor filesystem performance would show that the benefit is not limited to Windows users, and would give less of a taste of Windows-bashing.

Ah right, I didn't think of network file systems. Thanks for the suggestion.

About the timeline: I'd avoid too much parallelism. Usually, it's best to try to send a first patch to the mailing list as soon as possible, hence focus on one point first (I'd do that with pull, since that's the one which is already started). Then, you can parallelize coding on git am and the discussion on the pull patches. Whatever you plan, review and polishing take more than that ;-). The risk is to end up with code that is almost good, but not good enough to be mergeable. That said, your timeline does plan patches and review early, so I'm not too worried.

Well, I was thinking that after the full rewrite (2nd stage, halfway through the project), any optimizations made to the code will be done iteratively (and in separate small patches) so as to keep the patch series in an always almost mergeable state. This will hopefully make it much easier and shorter to do any final polishing and review for merging.

A general piece of advice: if time allows, try to contribute to discussions and reviews other than your own patches. It's nice to feel integrated in the community and not be the GSoC student working alone at home ;-).

Yeah, I apologize for not participating in the list so actively because writing the git-pull prototype and the proposal took a fair chunk of my time. Also, my expertise with the code base is not that great yet, so it takes quite a bit more effort for me to contribute constructively, but I expect that will improve in the future.
Now that the proposal is more or less complete I can spend more time on discussions. Thanks, Paul -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC/GSoC] Proposal: Make git-pull and git-am builtins
Paul Tan pyoka...@gmail.com writes:

..., I propose the following requirements for the rewritten code: 1. No spawning of external git processes. This is to support systems with high ``fork()`` or process creation overhead, and to reduce redundant IO by taking advantage of the internal object, index and configuration cache.

I suspect this may be too strict in practice. True, we should never say run_command_capture() just to read from git rev-parse --- we should just call get_sha1() instead. But for a complex command whose execution itself far outweighs the cost of forking, I do not think it is fair to say your project failed if you chose to run_command() it. For example, it may be perfectly OK to invoke git merge via run_command().

3. The resulting builtin should not have wildly different behavior or bugs compared to the shell script.

This on the other hand is way too loose. The original and the port must behave identically, unless the difference is fixing bugs in the original.

Potential difficulties === Rewriting code may introduce bugs ...

Yes, but that is a reasonable risk you need to manage to gain the benefit from this project.

Of course, the downside of following this too strictly is that if there were any logical bugs in the original code, or if the original code is unclear, the rewritten code would inherit these problems too.

I'd repeat my comment on the 3. above. Identifying and fixing bugs is great, but otherwise don't worry about this too much. Being bug-to-bug compatible with the original is way better than introducing new bugs of an unknown nature.

Rewritten code may become harder to understand ...

And also it may become harder to modify. That is the largest problem with any rewrite, and we should spend the most effort to avoid it. New bugs that are introduced we can later fix, as long as the result is understandable and maintainable.
For the purpose of reducing git's dependencies, the rewritten C code should not depend on other libraries or executables other than what is already available to git builtins.

Perhaps misphrased; see below.

We can see that the C version requires many more lines compared to the shell pipeline,...

That is something you would solve by introducing reusable code in the run_command API, isn't it? That is what various rewrites in the past did, and this project should do so too. You should aim to do this project by not just using what is already available, but adding what you discover is a useful reusable pattern into a set of new functions in the already available API set.
Re: [RFC/GSoC] Proposal: Make git-pull and git-am builtins
Paul Tan pyoka...@gmail.com writes:

On Tue, Mar 24, 2015 at 6:19 PM, Matthieu Moy matthieu@grenoble-inp.fr wrote:

About the timeline: I'd avoid too much parallelism. Usually, it's best to try to send a first patch to the mailing list as soon as possible, hence focus on one point first (I'd do that with pull, since that's the one which is already started). Then, you can parallelize coding on git am and the discussion on the pull patches. Whatever you plan, review and polishing take more than that ;-). The risk is to end up with code that is almost good, but not good enough to be mergeable. That said, your timeline does plan patches and review early, so I'm not too worried.

Well, I was thinking that after the full rewrite (2nd stage, halfway through the project), any optimizations made to the code will be done iteratively (and in separate small patches)

Yes, that's why I'm not too worried. But being able to say "this part is done, it won't disturb me anymore" ASAP is still good IMHO, even if this part is not so big. But again, I'm thinking out loud, feel free to ignore.

A general advice: if time allows, try to contribute to discussions and review other than your own patches. It's nice to feel integrated in the community and not the GSoC student working alone at home ;-). Yeah I apologize for not participating in the list so actively because writing the git-pull prototype and the proposal took a fair chunk of my time.

Don't apologize, you're doing great. I'm only pointing out things that could be even better, but certainly not blaming you! -- Matthieu Moy http://www-verimag.imag.fr/~moy/
[RFC/GSoC] Proposal Draft: Unifying git branch -l, git tag -l, and git for-each-ref
Hello, I have completed the micro project [1] and have also been working on adding a --literally option for cat-file [2]. I have left out the personal information part of the proposal here, and will fill that in while submitting my final proposal. Currently, I have been reading about how "branch -l", "tag -l" and "for-each-ref" work and how they implement the selection and formatting options. Since this is a draft for my final proposal, I would love to hear from you all about:

* Suggestions on my take on this idea and how I could improve or modify it.
* Anything more I might have missed out on in the proposal.

GSoC Proposal: Unifying git branch -l, git tag -l, and git for-each-ref

# Main objectives of the project:

* Build a common library which can handle both selection and formatting of refs.
* Use this library throughout 'branch -l', 'tag -l' and 'for-each-ref'.
* Implement options available in some of these commands onto the others.

# Amongst 'branch -l', 'tag -l' and 'for-each-ref':

* 'git branch -l' and 'git tag -l' share the '--contains' option.
* 'git tag' and 'git branch' could use a formatting option (this could also be used to implement the verbose options). For example, git branch -v could be implemented using: git for-each-ref refs/heads --format='%(refname:short) %(objectname:short) %(upstream:track) %(contents:subject)' This shows that having a formatting option for these two would mean that the verbose options could be implemented using the formatting option itself.
* 'git for-each-ref' could use all the selection options. This would enhance the uses of for-each-ref itself. Users can then view only refs based on what they may be looking for.
* Formatting options for 'git branch -l' and 'git tag -l'. This would enable the user to view information as per the user's requirements and format.

# Approach

All three commands select a subset of the repository's refs and print the result. There has been an attempt to unify these commands by Jeff King [3].
I plan on continuing his work [4] and using his approach to tackle this project. As for the common library for 'branch -l', 'tag -l' and 'for-each-ref', I plan on creating a file (mostly as ref-filter.c, in line with what Jeff has already done) which will provide APIs to add refs and obtain a list of all refs. This will be used along with 'for_each_*_ref' for obtaining the refs required. This gives us the basic functionality of obtaining the refs required by the command. Here we could have a basic data structure (struct ref_filter_item) which would denote a particular ref, and another data structure to hold a list of these refs (struct ref_filter). Then, after getting the required refs, we could print the information.

For extended selection behaviour such as '--contains' or '--merged', we could implement these within the library by providing functions which closely mimic the current methods used individually by 'branch -l' and 'tag -l'. For example, to implement '--merged' we implement a 'compute_merge()' function, which, with the help of the revision APIs, will be able to perform the same function as 'branch -l --merged'.

For the formatting functionality provided by 'for-each-ref', we replicate the 'show_ref' function in 'for-each-ref.c', where the format is given to the function and the function uses the format to obtain atom values and prints the corresponding atom values to the screen. This feature would allow us to provide format functionality which could act as a base for the '-v' option also. As Jeff has already done, we could also add parse options.

Although Jeff has built a really good base to build upon, I shall use his work as more of a reference and work on unification of the three commands from scratch. I plan on coding for this project using test-driven development, where I will write tests (initially failing) based on the objectives of the project, and then write code to pass those tests.
# Timeline

This is a rough plan of how I will spend the summer working on this project.

Community bonding period: Work on understanding how all three commands work in total detail, and build up on the design of unification of the three commands. Read through Jeff's attempt at unification and get a grasp of what to do.

Week 1: Write tests and documentation which will define the goal of this project. This will set
Re: About the proposal format of GSoc 2015
Shanti Swarup Tunga b112...@iiit-bh.ac.in writes:

Hey, I am Shanti Swarup Tunga. I want to know if there is any proposal format for Git. If not, what should we focus on in the proposal?

You probably already found http://git.github.io/SoC-2015-Ideas.html There's no particular requirement on the format other than the ones there. -- Matthieu Moy http://www-verimag.imag.fr/~moy/
About the proposal format of GSoc 2015
Hey, I am Shanti Swarup Tunga. I want to know if there is any proposal format for Git. If not, what should we focus on in the proposal?
Fwd: [RFC] [GSoC Proposal Draft] Unifying git branch -l,git tag -l and git for-each-ref
Hi all, I have attempted a microproject [1][2] and this is my first draft of the proposal. I have included only the matter regarding my approach to solving the problem and shall add my personal details later. Please be kind enough to go through my proposal and suggest modifications or detailing wherever required. Also give me feedback on whether my approach to solving the problem is correct. In the meantime, I am reading up the code of Jeff's attempt at unification, here [3], for preparing my final proposal.

Title - Unifying git branch -l, git tag -l and git for-each-ref

Abstract

git for-each-ref and the list modes of git branch and git tag involve selecting a subset of the refs and printing out the result. Currently the implementations are not shared, in the sense that:

SELECTION
1. git branch knows --contains, --merged and --no-merged
2. git tag only knows --contains

FORMATTING
1. git for-each-ref knows formatting, which none of the other two commands know.

SORTING
1. git tag knows sorting only on the basis of refnames
2. git for-each-ref knows sorting on the basis of all the fieldnames which can be used in its --format option

The idea is to unify the computations for these processes in a common library and teach these commands all these options uniformly.

Why do we need unification? These commands try to accomplish more or less the same thing. So, new features would most likely be applicable to all three of them, and unification will allow us to build new features for all these commands in one go instead of doing it separately for each of the three commands. Jeff has already worked quite a bit on unifying the selection part. I shall use that work as a starting point when I start off building the library and its API calls.

Deliverables
1. The unified library will borrow the --contains implementation from git tag (due to the speed-up it had received), the --merged/--no-merged implementation from git branch and the --format implementation from git for-each-ref.
2. The commands will then be taught these options by making calls to this library's functions and structures.
3. Add documentation and tests for these new features.

Optionals
1. Implement the --sort option for these commands in the unified library.
2. Add documentation and tests for this feature.

Approach

The common library will contain a structure which will store the present state of the list of refs, in the sense that after we perform a computation (e.g. --contains commit) on the list of refs, the new list will store the result of that computation. The structure will also have other attributes which the options structure will take in as its (void *)value attribute's value before parsing the different options. This is to communicate to the structure the various options (e.g. --merged, --format, --sort) we want to use. The list of refs shall be fetched by the API in accordance with the command (e.g. git tag) and its option (e.g. --merged) which were passed to the API. Next comes the matter of printing out the results according to the format specified (the default format for the command if no format is specified). This will be done in a method similar to how git for-each-ref prints out the results in the given format.

Approximate Timeline (to estimate the amount of work that can be done in summer, though it may change during the project [based on advice from mentors])

May 03 - May 10: Read and understand the implementation of the --contains option in git tag and the --merged/--no-merged implementation in git branch.
May 11 - May 17: Go through Jeff's work on unification to get detailed pointers on how to start with unifying selection. Finalise all the structures required and also the API calls the library would have to make for the selection options.
May 18 - May 24: Start working on the API. Discuss ideas with mentor, brainstorm on the details of what function calls will be made to the API and what function calls will be made by the API.
CODING PERIOD BEGINS

May 25 - May 31: Implement the --contains option in the library by taking the cue from how git tag --contains is implemented.
June 1 - June 7: Implement the --merged and --no-merged options similar to how they are implemented in git branch.
June 8 - June 11: Make computations more efficient, improve comments and start documentation. Discuss additional features and requirements with mentors.
June 12 - June 25: Teach the three commands to use the API for formatting and sorting. Add tests and refactor the code of the API if required. Complete the documentation for the new features added.

MID-TERM EVALUATION

June 26 - June 30: Discuss with mentors the state and the pace with which the project is coming along. Start finalising the details of the further goals to be accomplished.
July 01 - July 07: Start working on the formatting
Re: Feature Proposal: Track all branches from a given remote
Hi Brian:

[remote "origin"] fetch = refs/heads/*:refs/heads/*

Yes, you're right, this works just fine as long as I move out from a branch that's not in the remote in question, for example by doing:

git checkout -b nothing
git fetch (or git pull)

Do you think there would be any interest in a patch that added this as a simple command line option, though? I guess the idea of this patch would simply be to change this line in the .git/config file for the length of the operation (and specified remote), execute the git pull command, and then reset the configuration after the command finished. (There really wouldn't be a need to affect the configuration on the filesystem - simply the effective configuration used while git is running for this operation). Thanks, ~Scott
Re: Feature Proposal: Track all branches from a given remote
Scott Johnson jayw...@gmail.com writes: Do you think there would be any interest in a patch that added this as a simple command line option, though? I guess the idea of this patch then would simply change this line in the .git/config file for the length of the operation (and specified remote), execute the git pull command, and then reset the configuration after the command finished. There is no need to modify the configuration, you can pass the fetch spec on the command line. Andreas. -- Andreas Schwab, sch...@linux-m68k.org GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 And now for something completely different.
Feature Proposal: Track all branches from a given remote
Hello git experts: Recently, I've encountered the problem where I would like to set my local repository copy to track all branches on a given remote. There does not appear to be a switch for this in the git-branch command currently; however, I will admit that my somewhat limited understanding of the git-branch manpage might be causing me simply not to see it. It seems as though this is a use case that some users of git encounter now and then, as illustrated by this post: http://stackoverflow.com/a/6300386/281460

I was thinking that it might be useful to add a new option to git branch, perhaps something like: git-branch --track-remote remotename Where remotename specifies a given remote, and the command will track all branches remotes/remotename/* to refs/heads/*. So, for example, if I were to run: git-branch --track-remote origin and I had two branches on origin, master and maint, respectively, after the command finishes, my local repo would now have two branches, master (set up to track origin/master), and maint (set up to track origin/maint).

I'm not entirely sure how to handle naming conflicts, for example if 'maint' already existed on another remote, and was set up to track from that remote previous to this invocation of the command. If I were to start work on a patch, would there be any interest in this feature, or are there reasons why it isn't currently implemented? Thank you, ~Scott Johnson
Re: Feature Proposal: Track all branches from a given remote
On Sat, Oct 25, 2014 at 04:34:30PM -0700, Scott Johnson wrote: Hello git experts: Recently, I've encountered the problem where I would like to set my local repository copy to track all branches on a given remote. There does not appear to be a switch for this in the git-branch command currently, however, I will admit that my somewhat limited understanding of the git-branch manpage might be causing me simply not to see it.

I don't know about a command line option for this, but I think there's a way to achieve what you're looking for.

So, for example, if I were to run: git-branch --track-remote origin and I had two branches on origin, master and maint, respectively, after the command finishes, my local repo would now have two branches, master (set up to track origin/master), and maint (set up to track origin/maint).

You could do something like this in .git/config:

[remote "origin"] fetch = refs/heads/*:refs/heads/*

You won't be able to fetch if you would overwrite the current branch, though. -- brian m. carlson / brian with sandals: Houston, Texas, US +1 832 623 2791 | http://www.crustytoothpaste.net/~bmc | My opinion only OpenPGP: RSA v4 4096b: 88AC E9B2 9196 305B A994 7552 F1BA 225C 0223 B187
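As a side note for readers, the configuration brian suggests would look like this in ``.git/config`` (the quoted section name follows git's config syntax; the ``url`` value here is a made-up placeholder):

```ini
[remote "origin"]
	url = https://example.com/repo.git
	# Map every branch on the remote directly onto a local branch of the
	# same name. As noted above, a fetch will refuse to update the branch
	# that is currently checked out.
	fetch = refs/heads/*:refs/heads/*
```

As Andreas notes elsewhere in the thread, the same refspec can instead be given on the command line (git fetch origin "refs/heads/*:refs/heads/*") without touching the configuration at all.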
Re: Proposal for pruning tags
On 06/05/2014 04:51 PM, Robert Dailey wrote:

I've never contributed to the Git project before. I'm a Windows user, so I use msysgit, but I'd be happy to install linux just so I can help implement this feature if everyone feels it would be useful. Right now AFAIK, there is no way to prune tags through Git. The way I currently do it is like so: $ git tag -l | xargs git tag -d $ git fetch --all

Junio explained some limitations of tags (namely that there is only one tags namespace that is shared project-wide) that make your wish impossible to implement the way it works for branches. Local tags are awkward for the same reason. It is too easy to push them accidentally to a central repository and too hard to delete them after that has happened. They kind of spread virally, as you have noticed. I recommend against using local tags in general.

Recent Git does have a feature that might help you. *If* you have a central repository that is authoritative WRT tags, then you can sync the tags in your local repository to the tags in the central repo using git fetch --prune $REMOTE +refs/tags/*:refs/tags/*

You might also be able to use a pre-receive hook on the central repo to prevent tags from being pushed by people who shouldn't be doing so, or to require that tags have an approved format (like refs/tags/release-\d+\.\d+\.\d+ or whatever) to try to prevent a recurrence of the problem.

Michael -- Michael Haggerty mhag...@alum.mit.edu http://softwareswirl.blogspot.com/
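Michael's pre-receive suggestion can be sketched in plain shell. This is an illustrative sketch only (the ``release-X.Y.Z`` pattern and the messages are assumptions, not an official git example); a pre-receive hook simply reads "old new refname" lines on stdin and exits non-zero to reject the push:

```shell
# Reject pushes of tags whose names do not match an approved release format.
check_refs() {
    status=0
    while read -r oldrev newrev refname; do
        case "$refname" in
        refs/tags/*)
            tag=${refname#refs/tags/}
            case "$tag" in
            release-[0-9]*.[0-9]*.[0-9]*)
                ;;  # approved tag name, allow it
            *)
                echo "rejected: tag '$tag' does not match release-X.Y.Z"
                status=1
                ;;
            esac
            ;;
        esac
    done
    return $status
}

# Example input in the format a pre-receive hook receives on stdin:
printf '%s\n' \
    '0000 1111 refs/tags/release-1.2.3' \
    '0000 2222 refs/tags/wip-stuff' | check_refs
```

In a real hook the script body would just be `check_refs` reading the server-supplied stdin, and the old/new values would be full object names.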
Re: Proposal for pruning tags
On Thu, Jun 5, 2014 at 3:50 PM, Junio C Hamano gits...@pobox.com wrote:

I think you need to explain what you mean by prune a lot better than what you are doing in your message to be understood by others. After seeing the above two commands, my *guess* of what you want to do is to remove any of your local tags that are *not* present in the repository you usually fetch from (aka origin), but that directly contradicts what you said you wish, i.e. This is not only wasteful, but dangerous. I might accidentally delete a local tag I haven't pushed yet... which only shows that your definition of prune is different from remove what I do not have at 'origin'. But it does not say *how* that is different. How should prune behave differently from the two commands above? How does your prune decide a tag needs to be removed locally when it is not at your origin [*1*]?

There is *nothing* in git that lets you look at a local tag that is missing from the other side and determine if that is something you did not want to push (hence it is missing there) or if that is something you forgot to push (hence it is missing there but you would rather have pushed if you did not forget). So you must have some new mechanism to record and/or infer that distinction in mind, but it is not clear what it is from your message. So until that is clarified, there is not much more to say if your feature has any merit---as there is no way to tell what that feature exactly is, at least not yet ;-)

[snip]

You're right I didn't clarify, although I feel you're not providing the most welcome response to someone who isn't as familiar with the internals of Git as you are. It was an oversight on my part. What I was expecting is that it would behave exactly like branch pruning does, but that would require remote-tracking tags, which we don't have. So, apparently my idea doesn't hold much water.
The general problem I see in the day to day workflow with my team is that if tags exist locally and they push, those tags continuously get recreated on the remote repo even after I delete them remotely. So I can never truly delete tags until I go to each person and make sure the tool they're using isn't accidentally pushing tags. For example, SourceTree pushes all tags by default. Everyone on my team is new to Git, so they don't know to turn that off. Having git clean up tags automatically would really help with this, even though you may not feel it's the responsibility of Git. It's more of a usability issue, it's just prone to error. I can setup my config to prune tracking branches after I pull. Having something like this for tags would be wonderful. However, this requires a bigger overhaul than what I initially was proposing.
Re: Proposal for pruning tags
Robert Dailey rcdailey.li...@gmail.com writes:

... Having git clean up tags automatically would really help with this, even though you may not feel it's the responsibility of Git. It's more of a usability issue,

I agree with "Having ... help with this". I did not say at all that it is not something Git should and can try to help with. I also agree that it is a usability issue. The thing is, the word "automatically" in your "clean up tags automatically" is still too loose a definition of what we want, and we cannot come up with a way to help users without tightening that looseness. As you said, you are looking for something that can tell between two kinds of tags that locally exist without having a copy at the 'origin':

- ones that you do not want to keep
- others that you haven't pushed to (or forgot to push to) 'origin'

without giving the users a way to help Git to tell these two kinds apart and only remove the former. So...
Proposal for pruning tags
I've never contributed to the Git project before. I'm a Windows user, so I use msysgit, but I'd be happy to install Linux just so I can help implement this feature if everyone feels it would be useful. Right now, AFAIK, there is no way to prune tags through Git. The way I currently do it is like so:

$ git tag -l | xargs git tag -d
$ git fetch --all

This is not only wasteful, but dangerous. I might accidentally delete a local tag I haven't pushed yet. What would be great is if we had the following: git tag prune [remote|--all] The remote is needed in decentralized workflows (upstream vs origin). I'd also like to see an `--all` option in place of the remote, which means it will prune local tags from all remotes. I'm not sure if this command line structure will work, but it can be altered as necessary. Alternatively, this might also make sense on the remote command: git remote prune remote --tags

Again, I'm not an expert at the internals of Git, so I wanted to share my idea with the community first to see if this holds water or if there is already some built-in way of doing this. Thanks for hearing out my idea!
Re: Proposal for pruning tags
Robert Dailey rcdailey.li...@gmail.com writes:

I've never contributed to the Git project before. I'm a Windows user, so I use msysgit, but I'd be happy to install linux just so I can help implement this feature if everyone feels it would be useful. Right now AFAIK, there is no way to prune tags through Git. The way I currently do it is like so: $ git tag -l | xargs git tag -d $ git fetch --all

I think you need to explain what you mean by prune a lot better than what you are doing in your message to be understood by others. After seeing the above two commands, my *guess* of what you want to do is to remove any of your local tags that are *not* present in the repository you usually fetch from (aka origin), but that directly contradicts what you said you wish, i.e. This is not only wasteful, but dangerous. I might accidentally delete a local tag I haven't pushed yet... which only shows that your definition of prune is different from remove what I do not have at 'origin'. But it does not say *how* that is different. How should prune behave differently from the two commands above? How does your prune decide a tag needs to be removed locally when it is not at your origin [*1*]?

There is *nothing* in git that lets you look at a local tag that is missing from the other side and determine if that is something you did not want to push (hence it is missing there) or if that is something you forgot to push (hence it is missing there but you would rather have pushed if you did not forget). So you must have some new mechanism to record and/or infer that distinction in mind, but it is not clear what it is from your message. So until that is clarified, there is not much more to say if your feature has any merit---as there is no way to tell what that feature exactly is, at least not yet ;-)

[Footnote]

*1* By the way, removing and then refetching would be a silly way to do this kind of thing anyway.
After removing but before you have a chance to fetch, your ISP may sever your network connection, and then what happens? Whatever your definition of prune is, I would think it would be built around ls-remote --tags output, to see what tags the other repository (or other repositories, by looping over the remotes you interact with) have, and compare that set with the tags you locally have in order to decide which subset of tags you locally have to remove.
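Junio's footnote suggests building pruning around a comparison with ``ls-remote --tags`` output. A minimal sketch of that set difference in shell (the sample tag lists are made up; a real script would populate them from ``git tag -l`` and ``git ls-remote --tags``):

```shell
# Print tags that exist locally but not on the remote: these are the
# candidates a hypothetical "git tag prune" would consider for deletion.
local_only() {
    # $1: newline-separated local tags, $2: newline-separated remote tags
    tmp_l="${TMPDIR:-/tmp}/local_tags.$$"
    tmp_r="${TMPDIR:-/tmp}/remote_tags.$$"
    printf '%s\n' "$1" | sort >"$tmp_l"
    printf '%s\n' "$2" | sort >"$tmp_r"
    comm -23 "$tmp_l" "$tmp_r"   # lines only in the first (local) file
    rm -f "$tmp_l" "$tmp_r"
}

local_tags='v1.0
v1.1
v1.2-local'
remote_tags='v1.0
v1.1'

local_only "$local_tags" "$remote_tags"   # prints v1.2-local
```

Note that, exactly as Junio points out, this only computes the candidate set: nothing here can tell whether ``v1.2-local`` is a tag the user wants gone or one they forgot to push.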
Re: [RFC][GSOC] Proposal Draft for GSoC, Suggest Changes
Hello, Now that I have already submitted my proposal to GSoC, I was wondering if there is any way I could contribute to git via bug fixes or something similar to the microprojects which were available prior to the GSoC application. Also wondering if any clarification is needed on my proposal. Would be great to hear from you all. Thanks - Karthik
Re: [RFC] [GSoC] Draft of Proposal for GSoC
Parts of v2; once again, I'd love some more comments on what I've rewritten.

On Fri, Mar 21, 2014 at 1:42 AM, Jeff King <p...@peff.net> wrote:
> On Thu, Mar 20, 2014 at 02:15:29PM -0400, Brian Bourn wrote:
>> Going through the annals of the listserv thus far I've found a few
>> discussions which provide some insight towards this process as well as
>> some experimental patches that never seem to have made it
>> through[1][2][3][4]
>
> Reading the past work in this area is a good way to get familiar with
> it. It looks like most of the features discussed in the threads you
> link have been implemented. The one exception seems to be negative
> patterns. I think that would be a good feature to build on top of the
> unified implementation, once all three commands are using it.
>
>> I would start by beginning a deprecation plan for git branch -l very
>> similar to the one Junio presents in [5], moving --create-reflog to -g,
>
> That makes sense. I hadn't really considered -l as another point of
> inconsistency between the commands, but it definitely is.
>
>> Following this I would begin the real work of the project which would
>> involve moving the following flag operations into a standard library,
>> say 'list-options.h':
>> --contains [6]
>> --merged [7]
>> --no-merged [8]
>> --format
>> This library would build these options for later interpretation by
>> parse_options
>
> Can you sketch out what the API would look like for this unified
> library? What calls would the 3 programs need to make into it?

Something like this? Sample API calls:

Add_Opt_Group()
Parse_with_contains()
Parse_with_merged()
Parse_with_no_merged()
Parse_with_formatting()

(each of the 4 calls above may have internal calls within the library in order to parse the option for each of the different functions which may call these functions)

>> For the most part I haven't finalized my weekly schedule but a basic
>> breakdown would be
>
> Can you go into more detail here? Remember that writing code is only
> one part of the project.
> You'll need to be submitting your work, getting review and feedback,
> and iterating on it. One problem that students have is queuing up a
> large amount of work to send to the list. Then they twiddle their
> thumbs waiting for review to come back (which takes a long time,
> because they just dumped a large body of work on the reviewers). If
> you want to make effective use of your time, it helps to try to break
> tasks down into smaller chunks, and think about the dependencies
> between the chunks. When one chunk is in review, you can be designing
> and coding on another.

This one I can absolutely understand. I tried to break this part down into very manageable parts and give myself a little time at the end of each coding period to clean up each previous section. This slop time also allows me to hopefully add some of the extra features that have been thought of. I'm thinking something like this makes it a little better:

Weekly Schedule

Start-Midterm
Week 1 - Begin deprecation of -l in git branch/establish exactly how long each stage of the deprecation should take. Spend some time reading *.c files even deeper while getting to know any current patches occurring in any area near my work files. Lastly, this week will be spent going through the mailing list finding previous work done in this area and any other experimental patches.
Week 2 - Move Opt_Group callbacks for the functions into the library.
Week 3 - Make a contains function in the library which will work for all three functions.
Week 4 - Add a merged function to the library.
Week 5 - Add a no-merged function to the library.
Weeks 7-8 - Spend time polishing the library and cleaning up the patches for final submission of the library to the project.
Deliverables for midterm - Library finished pending polish and acceptance into the git repository.

Midterm
Week 9 - Refactor all files to use the contains flag from the library.
Week 10 - Use merged from the library in all relevant files.
Week 11 - Use no-merged from the library in all relevant files.
Weeks 11-12 - Implement the format flags in all relevant files (this will be slightly harder as I think this might involve calling for-each-ref in the code for tag and branch. Ultimately there is a chance that part of the code for doing for-each-ref will end up in this library as well); additionally, add in the code for formatting the relevant Opt_Groups into the necessary files.
Weeks 13-14 - Polish patches via the mailing list and clean up all the refactoring of the files that has occurred. (Optionally, add more formatting changes, such as negative patterns and numbering each output, to the library.)
Deliverables for Final - Working library hopefully added into the code, and all of the relevant patches for using the library mostly polished and, minimally, pending peer review for submission into the code base.

I do wonder if this plan might be a little on the conservative side. If anything, I think this could take a slightly shorter time than planned, but in that case I can always work on other additions to format.
Re: [RFC] [GSoC] Draft of Proposal for GSoC
Brian Bourn <ba.bo...@gmail.com> writes:

> Something like this? Sample API calls:
>
> Add_Opt_Group()
> Parse_with_contains()
> Parse_with_merged()
> Parse_with_no_merged()
> Parse_with_formatting()
>
> (each of the 4 calls above may have internal calls within the library
> in order to parse the option for each of the different functions which
> may call these functions)

This list is a bit too sketchy to be called "sample API calls", at least to me. Can you elaborate a bit more? What do they do, what does the caller expect to see (do they get something as return values? do they expect some side effects?)?
Re: [RFC] [GSoC] Draft of Proposal for GSoC
On Fri, Mar 21, 2014 at 1:45 PM, Junio C Hamano <gits...@pobox.com> wrote:
> This list is a bit too sketchy to be called "sample API calls", at
> least to me. Can you elaborate a bit more? What do they do, what does
> the caller expect to see (do they get something as return values? do
> they expect some side effects?)?

So something like this would be better, I'm assuming? Some basic sample API calls are found below; each of these would hold code to complete parsing and/or formatting the flags.

Add_Opt_Group() - returns an OPT_CALLBACK with contains, merged, no-merged, or formatting which can be used in a command's options list.

Execute_list() - the main call into the library; it would pass into the library all of the necessary flags and arguments for parsing the request and executing it. This would accept flags like --contains, with arguments such as the commit or pattern that is being searched for.

The next four commands would be called by execute_list() to execute the original command with respect to the flags that are passed into this library:

Parse_with_contains()
Parse_with_merged()
Parse_with_no_merged()
Parse_with_formatting()
Re: [RFC] [GSoC] Draft of Proposal for GSoC
On Fri, Mar 21, 2014 at 02:03:41PM -0400, Brian Bourn wrote:
> So something like this would be better, I'm assuming? Some basic
> sample API calls are found below; each of these would hold code to
> complete parsing and/or formatting the flags.
>
> Add_Opt_Group() - returns an OPT_CALLBACK with contains, merged,
> no-merged, or formatting which can be used in a command's options
> list.
>
> Execute_list() - the main call into the library [...]
>
> Parse_with_contains()
> Parse_with_merged()
> Parse_with_no_merged()
> Parse_with_formatting()

Think about how the callers would use them. Will git-branch just call Parse_with_contains? If so, where would that call go? What arguments would it take, and what would it do?

I don't think those calls are enough. We probably need:

1. Some structure to represent a list of refs and store its intermediate state.

2. Some mechanism for telling that structure about the various filters, sorters, and formatters we want to use (and this needs to be hooked into the option-parsing somehow).

3. Some mechanism for getting the listed refs out of that structure, formatting them, etc.

-Peff
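The three pieces Peff lists could be sketched in C roughly as below. All names here (`ref_list`, `ref_item`, `ref_filter_fn`) are invented for illustration and are not git's actual API; a real design would allow a chain of filters and sorters rather than a single callback.

```c
/* Sketch of the three pieces: (1) a structure holding refs and
 * intermediate state, (2) a way to register a filter, and (3) a walk
 * that hands surviving refs to a formatter.  Illustrative only. */
#include <stdlib.h>
#include <string.h>

struct ref_item {
    char name[64];            /* e.g. "refs/heads/topic" */
};

/* (2): a filter says whether a ref survives; cbdata carries its
 * argument, e.g. the commit given to --contains or --merged. */
typedef int (*ref_filter_fn)(const struct ref_item *item, void *cbdata);

struct ref_list {             /* (1): the list plus intermediate state */
    struct ref_item *items;
    size_t nr, alloc;
    ref_filter_fn filter;
    void *cbdata;
};

static void ref_list_add(struct ref_list *list, const char *name)
{
    if (list->nr == list->alloc) {
        list->alloc = list->alloc ? 2 * list->alloc : 8;
        list->items = realloc(list->items,
                              list->alloc * sizeof(*list->items));
    }
    strncpy(list->items[list->nr].name, name,
            sizeof(list->items[list->nr].name) - 1);
    list->items[list->nr].name[sizeof(list->items[list->nr].name) - 1] = '\0';
    list->nr++;
}

/* (3): walk the refs that survive the filter; returns how many survived.
 * `show` would be the formatting step (NULL here just counts). */
static size_t ref_list_for_each(struct ref_list *list,
                                void (*show)(const struct ref_item *))
{
    size_t i, shown = 0;
    for (i = 0; i < list->nr; i++) {
        if (list->filter && !list->filter(&list->items[i], list->cbdata))
            continue;
        if (show)
            show(&list->items[i]);
        shown++;
    }
    return shown;
}
```

The option-parsing hook Peff mentions would then just set `filter` and `cbdata` on the list before the walk runs.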
Re: [RFC] [GSoC] Draft of Proposal for GSoC
On Fri, Mar 21, 2014 at 2:07 PM, Jeff King <p...@peff.net> wrote:
> Think about how the callers would use them. Will git-branch just call
> Parse_with_contains? If so, where would that call go? What arguments
> would it take, and what would it do?
>
> I don't think those calls are enough. We probably need:
>
> 1. Some structure to represent a list of refs and store its
> intermediate state.
>
> 2. Some mechanism for telling that structure about the various
> filters, sorters, and formatters we want to use (and this needs to be
> hooked into the option-parsing somehow).
>
> 3. Some mechanism for getting the listed refs out of that structure,
> formatting them, etc.

Keeping some of my function calls to do the actual work, I think I settled on this. A possible API is given below; each of these would hold code to complete parsing and/or formatting the flags.
There will be a struct in the library called refs_list which, when initialized, will iterate through all the refs in a repository and add them to this list. There would be a function which would retrieve ref structs from that list:

Get_ref_from_list() - returns a single ref from the list.

Add_Opt_Group() - returns an OPT_CALLBACK with contains, merged, no-merged, or formatting which can be used in a command's options list.

Execute_list() - the main call into the library; it would pass into the library all of the necessary flags and arguments for parsing the request and executing it. This would accept flags like --contains, with arguments such as the commit or pattern that is being searched for. This will then parse the refs_list using the four commands below to make, sort, filter, and format an output list which will then be printed or returned by this function.

Any call into the API from an outside source would call one of the previous two functions; all other commands in the API would be for internal use only, in order to simplify the process of calling into this library.

The next four commands would be called by execute_list() to further format the refs_list with respect to the flags that are passed into this library. These would also take the additional arguments from execute_list(), such as patterns to parse or which commit to filter out. These calls would modify the refs_list for eventual printing:

Parse_list_with_contains()
Parse_list_with_merged()
Parse_list_with_no_merged()
Format_list()

Of course this would still depend on deciding whether or not we want to return to the original command to print or if printing can be handled by the library itself.

-- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: GSoC proposal: port pack bitmap support to libgit2.
Hi, Sorry for this late reply, I was busy for the past few days.

On Fri, Mar 14, 2014 at 12:34 PM, Jeff King <p...@peff.net> wrote:
> On Wed, Mar 12, 2014 at 04:19:23PM +0800, Yuxuan Shui wrote:
>> I'm Yuxuan Shui, an undergraduate student from China. I'm applying
>> for GSoC 2014, and here is my proposal: I found this idea on the
>> ideas page, and did some research about it. The pack bitmap patchset
>> adds a new .bitmap file for every pack file which contains the
>> reachability information of selected commits. This information is
>> used to speed up git fetching and cloning, and produces very
>> convincing results. The goal of my project is to port the pack
>> bitmap implementation in core git to libgit2, so users of libgit2
>> could benefit from this optimization as well. Please let me know if
>> my proposal makes sense, thanks.
>
> You'd want to flesh it out a bit more to show how you're thinking
> about tackling the problem:
>
> - What are the areas of libgit2 that you will need to touch? Be
>   specific. What's the current state of the packing code? What files
>   and functions will you need to touch?

Firstly I will need to implement bitmap creation in libgit2's git_packbuilder_* functions (probably also git_odb_write_pack), so libgit2 could support bitmap creation. Then I will need to change the git_revwalk_* functions to make them use bitmaps. Since the operations that can benefit from bitmaps are, if my understanding is correct, all using the git_revwalk_* functions, having bitmap support in the revwalk functions should be enough.

The files I need to touch are probably revwalk.c and pack-objects.c. If I need to change the API of the packbuilder or revwalk functions, I will have to change the callers as well: push.c, fetch.c and transport/smart_protocol.c. I haven't read all the code to put together a list of functions I need to change, but I think the list will be long.

> - What are the challenges you expect to encounter in porting the
>   code?

The architecture differences between git and libgit2 will probably be a challenge.
> - Can you give a detailed schedule of the summer's work? What will
>   you work on in each week? What milestones do you expect to hit, and
>   when?

I don't really have a plan, but I'll try to provide a rough schedule. I'll read the code and try to understand it, to the point where I can start to add new code. This will probably take a week. For the next three or four weeks I should be implementing bitmap creation in the packbuilder. Then for the rest of the time I will be optimizing revwalk using bitmaps.

-- Regards, Yuxuan Shui
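As a toy model of what the port buys: a pack bitmap gives each selected commit one bit per object in the pack, so reachability queries become bitwise operations instead of graph walks. The sketch below is illustrative only (fixed size, no EWAH compression, invented names); it is not libgit2 or git code.

```c
/* Toy model of reachability bitmaps: one bit per packed object.
 * Real bitmaps are EWAH-compressed and tied to pack object order. */
#include <stdint.h>

#define WORDS 4   /* enough for 256 objects in this toy "pack" */

struct obj_bitmap {
    uint64_t bits[WORDS];
};

static void bitmap_set(struct obj_bitmap *b, unsigned pos)
{
    b->bits[pos / 64] |= (uint64_t)1 << (pos % 64);
}

static int bitmap_get(const struct obj_bitmap *b, unsigned pos)
{
    return (int)((b->bits[pos / 64] >> (pos % 64)) & 1);
}

/* "Objects reachable from either tip": a single OR replaces a revwalk. */
static void bitmap_or(struct obj_bitmap *dst, const struct obj_bitmap *src)
{
    int i;
    for (i = 0; i < WORDS; i++)
        dst->bits[i] |= src->bits[i];
}

/* Counting reachable objects becomes a popcount, not a traversal. */
static unsigned bitmap_popcount(const struct obj_bitmap *b)
{
    unsigned n = 0;
    int i;
    for (i = 0; i < WORDS; i++)
        n += (unsigned)__builtin_popcountll(b->bits[i]);
    return n;
}
```

This is the core reason fetch and clone negotiation get faster: set union, intersection, and counting over these bitmaps are cheap compared with walking the commit graph.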
[RFC][GSOC] Proposal Draft for GSoC, Suggest Changes
Hello, I have completed my microproject under the guidance of Eric. After going through the code and previous mailing lists, I have drafted my proposal. I am still going through the code as of now and figuring things out. It would be great to have your suggestions on my proposal, so that I can improve it before submitting it. I have also written the proposal in markdown for easier formatting, so it doesn't look pretty in plain text. Thanks, Karthik

Git configuration API improvements

Abstract

Currently git_config() has a few issues which need to be addressed:
- It reads and parses the configuration files each time it is called.
- Values cannot be unset; they can only be set to false, which has different implications.
- Repeated setting and unsetting of a value under a particular new header leaves trails behind.

This project is to fix these problems while also retaining backward compatibility wherever git_config() is called, by implementing a cache for the configs in a tree data structure, which provides for easier modification.

About Me
Name: Karthik Nayak
Email: karthik@gmail.com
College: BMS Institute of Technology
Studying: Engineering in Computer Science
IRC: nayak94
Phone: 91--XXX-XXX
Country: India
Interests: Guitar, Photography, Craft.
Github: KarthikNayak

Technical Experience
- Have been learning about the Linux kernel and its implementation on the Android platform. Also released on XDA-Dev for the phones LG P500 and Xperia SP.
- Working on a library in C on various sorting techniques.
- Contributed to the open-source lab manual for colleges under VTU.
- Active member of the GNU/Linux Users Group in college and the Free Software Movement of Karnataka.

Why I Picked Git

This is my first attempt at GSoC, and as I began going through the list of organisations, what struck me is that I haven't really used the software of most of the listed organisations.
That's when I realized: why not contribute to something I use on a daily basis? This way I won't be contributing only because I want to take part in GSoC; rather, I'd contribute because I would love to be a part of something I use on a regular basis, and would be able to contribute to the project even after GSoC.

Proposal

Ideas Page: Git configuration API improvements

The following improvements have to be made to how configs are handled in git:

Read all the config files once and store them in an appropriate data structure. I suggest using a tree data structure to store the cache of the config files. I think a tree is a better choice than a hash-keyed data structure: although a tree has lower time efficiency when traversing for a config request, it is more amenable to further improvements. For example, the problem of setting and unsetting configs can be handled easily: when a node under a particular header is deleted, the header can check whether it has any child nodes left and, if not, delete itself from the config file.

Change git_config() to iterate through the pre-read values in memory rather than re-reading the configuration files. This function should remain backwards-compatible with the old implementation so that callers don't all have to be rewritten at once. Whenever git_config() is called within a single invocation of git, it can traverse the already-created tree data structure and get the particular config. This needs to maintain backward compatibility: while the basic functioning of functions like git_config() would change, the API should remain the same for the callers invoking them.

Add new API functions that allow the cache to be inquired easily and efficiently. Rewrite callers to use the new API wherever possible.
Now that the base data structure and the underlying changes needed for it to work have been made, we can add various new API functions to assist the usage of the data structure, and also rewrite callers to use the newly available APIs.

Issues to be addressed

Headers and comments are left behind when all configs under a header are deleted. Whenever we set and unset configs under a particular header, it leaves garbage behind. For example:

git config pull.rebase true
git config --unset pull.rebase
git config pull.rebase true
git config --unset pull.rebase

would result in:

[pull]
[pull]

And further changes made appear under the last header. The issue also gives rise to comments being stranded within a header.

Possible solution: Make sure that the header is deleted whenever the last config under it is deleted. Also delete comments within a header, comments made above a particular config when that config is removed, and comments made above a header when the whole header is being removed.

How to invalidate the cache correctly in the case that the configuration is changed while git is executing: if the config is changed while git is currently running, then the changes need to be taken into account.

Possible solution: A simple
[RFC] [GSoC] Draft of Proposal for GSoC
Hi all, This is a first draft of my proposal for GSoC. I'd love feedback about what I might be missing and any other files I should read regarding this; so far I have read most of tag.c, branch.c, builtin/for-each-ref.c, and parse-options.c. Once again I hope I can get the same amount of helpful feedback as when I submitted my microproject.

My name is Brian Bourn; I'm currently a computer engineering student at Columbia University in the City of New York. I've used git since my freshman year; however, this past week has been my first time attempting to contribute to the project, and I loved it.

I'd particularly like to tackle unifying git branch -l, git tag -l, and git for-each-ref. This functionality seems like an important update to me as it will simplify usage of git throughout three different commands, a noble pursuit which is not contained in any other project. Going through the annals of the listserv thus far I've found a few discussions which provide some insight towards this process, as well as some experimental patches that never seem to have made it through [1][2][3][4].

I would start by beginning a deprecation plan for git branch -l very similar to the one Junio presents in [5], moving --create-reflog to -g. Following this I would begin the real work of the project, which would involve moving the following flag operations into a standard library, say 'list-options.h':

--contains [6]
--merged [7]
--no-merged [8]
--format

This library would build these options for later interpretation by parse_options. Next I would implement these flags in the three files so that they are uniform and the same formatting and list capabilities can be used on all three. The formatting option will be especially useful for branch and tag as it will allow users to better understand what is in each ref that they grab.
For the most part I haven't finalized my weekly schedule, but a basic breakdown would be:

Start-Midterm
- Begin deprecation of -l
- Spend some time reading *.c files even deeper
- Build library (dedicate a minimum of one week per function moved)

Midterm-finish
- Implement the list flags
- Implement the format flags (if time is left over, add some formatting)

Additionally I am thinking about adding some more formatting tools such as numbering outputs. What do you all think of this?

[1] http://git.661346.n2.nabble.com/More-formatting-with-git-tag-l-tt6739049.html
[2] http://git.661346.n2.nabble.com/RFC-branch-list-branches-by-single-remote-tt6645679.html#a6725483
[3] http://git.661346.n2.nabble.com/RFC-PATCH-tag-make-list-exclude-lt-pattern-gt-tt7270451.html#a7338712
[4] http://git.661346.n2.nabble.com/RFC-branch-list-branches-by-single-remote-tt6645679.html#a6728878
[5] http://git.661346.n2.nabble.com/RFC-PATCH-0-2-RFC-POC-patterns-for-branch-list-tt6309233.html
[6] https://github.com/git/git/blob/master/builtin/branch.c#L817
[7] https://github.com/git/git/blob/master/builtin/branch.c#L849
[8] https://github.com/git/git/blob/master/builtin/branch.c#L843

Regards, Brian Bourn
Re: [RFC] [GSoC] Draft of Proposal for GSoC
Hello again, it would be very helpful for me to get some comments on this proposal; I would be very grateful to anyone who could take some time to look at it, even if it's just the wording.

Regards, Brian Bourn

On Thu, Mar 20, 2014 at 2:15 PM, Brian Bourn <ba.bo...@gmail.com> wrote:
> Hi all, This is a first draft of my Proposal for GSoC, I'd love
> feedback about what I might be missing and any other files I should
> read regarding this [...]
Re: [RFC] [GSoC] Draft of Proposal for GSoC
On Thu, Mar 20, 2014 at 02:15:29PM -0400, Brian Bourn wrote:
> Going through the annals of the listserv thus far I've found a few
> discussions which provide some insight towards this process as well
> as some experimental patches that never seem to have made it
> through[1][2][3][4]

Reading the past work in this area is a good way to get familiar with it. It looks like most of the features discussed in the threads you link have been implemented. The one exception seems to be negative patterns. I think that would be a good feature to build on top of the unified implementation, once all three commands are using it.

> I would start by beginning a deprecation plan for git branch -l very
> similar to the one Junio presents in [5], moving --create-reflog to -g,

That makes sense. I hadn't really considered -l as another point of inconsistency between the commands, but it definitely is.

> Following this I would begin the real work of the project which would
> involve moving the following flag operations into a standard library,
> say 'list-options.h':
> --contains [6]
> --merged [7]
> --no-merged [8]
> --format
> This library would build these options for later interpretation by
> parse_options

Can you sketch out what the API would look like for this unified library? What calls would the 3 programs need to make into it?

> For the most part I haven't finalized my weekly schedule but a basic
> breakdown would be

Can you go into more detail here? Remember that writing code is only one part of the project. You'll need to be submitting your work, getting review and feedback, and iterating on it. One problem that students have is queuing up a large amount of work to send to the list. Then they twiddle their thumbs waiting for review to come back (which takes a long time, because they just dumped a large body of work on the reviewers). If you want to make effective use of your time, it helps to try to break tasks down into smaller chunks, and think about the dependencies between the chunks.
When one chunk is in review, you can be designing and coding on another.

> Additionally I am thinking about adding some more formatting tools
> such as numbering outputs. What do you all think of this?

Something like numbering might make sense as part of the formatting code (e.g., a new placeholder that expands to n for the nth line of output). I think that would be fairly straightforward, but again, it makes sense to me to unify the implementations first, and then we can build new features on top.

-Peff
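The placeholder idea Peff sketches could look something like the function below. The `%(index)` placeholder name is invented here for illustration; it is not part of git's for-each-ref format language, and a real implementation would live inside the unified formatting code rather than be a standalone helper.

```c
/* Toy illustration of the numbering idea: expand a made-up "%(index)"
 * placeholder to the ref's position in the output. */
#include <stdio.h>
#include <string.h>

static void expand_index(const char *fmt, int n, char *out, size_t outlen)
{
    const char *placeholder = "%(index)";  /* invented, not a real git atom */
    const char *p = strstr(fmt, placeholder);

    if (!p) {                              /* nothing to expand */
        snprintf(out, outlen, "%s", fmt);
        return;
    }
    /* text before the placeholder, then the line number, then the rest */
    snprintf(out, outlen, "%.*s%d%s",
             (int)(p - fmt), fmt, n, p + strlen(placeholder));
}
```

The formatter would call this once per ref, passing 1 for the first line of output, 2 for the second, and so on.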
[RFC][GSoC] Calling for comments regarding rough draft of proposal
Hi, I have already done the microproject, which was merged into main last week. I have prepared a rough draft of my proposal for review, and have read all the previous mailing list threads about it. I am reading the codebase little by little. Please suggest improvements on the following topics:

1. I have read one-third of config.c and will complete reading it by tomorrow. Is there any other piece of code relevant to this proposal?
2. Are there other things I should add to the proposal that I have left off? I am getting confused about what extra details I should add to the proposal. I will add the informal parts (my background, schedule for the summer, etc.) of the proposal later.
3. Did I misunderstand anything, or is my approach to solving the problems incorrect? If yes, I will redraft my proposal according to your suggestions.

--

#GSoC Proposal : Git configuration API improvements

---

#Proposed Improvements

* Fix git config --unset to clean up detritus from sections that are left empty.
* Read the configuration from files once and cache the results in an appropriate data structure in memory.
* Change `git_config()` to iterate through the pre-read values in memory rather than re-reading the configuration files.
* Add new API calls that allow the cache to be inquired easily and efficiently. Rewrite other functions like `git_config_int()` to be cache-aware.
* Rewrite callers to use the new API wherever possible.
* Work out how to invalidate the cache correctly in the case that the configuration is changed while `git` is executing.

#Future Improvements

* Allow configuration values to be unset via a config file.

--

##Changing the git_config API to retrieve values from memory

Approach: We parse the config file once, storing the raw values as records in memory. After the whole config has been read, we iterate through the records, feeding the surviving values into the callback in the order they were originally read (minus deletions).

Path to follow for the API conversion:

1.
Convert the parser to read into an in-memory representation, but leave git_config() as a wrapper which iterates over it. 2. Add query functions like config_string_get() which will inquire cache for values efficiently. 3. Convert callbacks to query functions one by one. I propose two approaches for the format of the internal cache, 1.Using a hashmap to map keys to their values.This would bring as an advantage, constant time lookups for the values.The implementation will be similar to dict data structure in python, for example, section.subsection --mapped-to-- multi_value_string This approach loses the relative order of different config keys. 2.Another approach would be to actually represent the syntax tree of the config file in memory. That would make lookups of individual keys more expensive, but would enable other manipulation. E.g., if the syntax tree included nodes for comments and other non-semantic constructs, then we can use it for a complete rewrite. And git config becomes: 1. Read the tree. 2. Perform operations on the tree (add nodes, delete nodes, etc). 3. Write out the tree. and things like remove the section header when the last item in the section is removed become trivial during step 2. I still prefer the hashmap way of implementing the cache,as empty section headers are not so problematic(no processing pitfalls) and are sometimes annotated with comments which become redundant and confusing if the section header is removed.As for the aesthetic problem I propose a different solution for it below. -- ##Tidy configuration files When a configuration file is repeatedly modified, often garbage is left behind. 
For example, after git config pull.rebase true git config --unset pull.rebase git config pull.rebase true git config --unset pull.rebase the bottom of the configuration file is left with the useless lines [pull] [pull] Also,setting a config value, appends the key-value pair at the end of file without checking for empty main keys even if the main key(like [my]) is already present and empty.It works fine if the main key with an already present sub-key. for example:- git config pull.rebase true git config --unset pull.rebase git config pull.rebase true git config pull.option true gives [pull] [pull] rebase = true option = true Also, a possible detriment is presence of comments, For Example:- [my] # This section is for my own private settings Expected output: 1. When we delete the last key in a section, we should be able to delete the section header. 2. When we add a key into a section, we should be able to reuse
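The cache-then-iterate design sketched in the proposal above could look roughly like this. This is a minimal illustration, not git's code: the names (cfg_add, cfg_get, cfg_foreach) are invented, and a plain linked list stands in for the proposed hashmap.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One record per key=value the parser sees, kept in file order. */
struct cfg_entry {
	char *key;
	char *value;
	struct cfg_entry *next;
};

static struct cfg_entry *cfg_head;
static struct cfg_entry **cfg_tail = &cfg_head;

static char *dup_str(const char *s)
{
	char *p = malloc(strlen(s) + 1);
	strcpy(p, s);
	return p;
}

/* Called by the (imaginary) parser once per value it reads. */
void cfg_add(const char *key, const char *value)
{
	struct cfg_entry *e = malloc(sizeof(*e));
	e->key = dup_str(key);
	e->value = dup_str(value);
	e->next = NULL;
	*cfg_tail = e;
	cfg_tail = &e->next;
}

/* git_config()-style wrapper: replay values in original file order. */
void cfg_foreach(void (*fn)(const char *key, const char *value))
{
	for (struct cfg_entry *e = cfg_head; e; e = e->next)
		fn(e->key, e->value);
}

/* Direct query, like the proposed config_string_get(): the last value
 * read wins, matching git's rule for single-valued keys. */
const char *cfg_get(const char *key)
{
	const char *found = NULL;
	for (struct cfg_entry *e = cfg_head; e; e = e->next)
		if (!strcmp(e->key, key))
			found = e->value;
	return found;
}
```

A real hashmap would make cfg_get O(1), but the point of the sketch is the split: one parse fills the records, and both the old callback interface and the new query interface read from them.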
Re: [RFC][GSoC] Calling for comments regarding rough draft of proposal
tanay abhra <tanay...@gmail.com> writes:

> 2. Are there other things I should add to the proposal that I have left
> out? I am unsure what extra details to include. I will add the informal
> parts (my background, schedule for the summer, etc.) of the proposal
> later.

I would not label the schedule and success criteria "informal"; without them, how would one judge if the proposal has merit? Other things like your background and previous achievements would become relevant after it is decided that the proposed project has merit, to see if you are a good fit to work on that project, so I agree with your message that it is sensible to defer them until the other parts of the proposal are ironed out.

> # Proposed Improvements
>
> * Fix git config --unset to clean up detritus from sections that are
>   left empty.
> * Read the configuration from files once and cache the results in an
>   appropriate data structure in memory.
> * Change `git_config()` to iterate through the pre-read values in
>   memory rather than re-reading the configuration files.
> * Add new API calls that allow the cache to be inquired easily and
>   efficiently. Rewrite other functions like `git_config_int()` to be
>   cache-aware.

I think we already had a discussion pointing out that git_config_int() is not a good example for this bullet point (check the list archive). The approach section seems to use a more sensible example (point 2).

> * Rewrite callers to use the new API wherever possible.
> * How to invalidate the cache correctly in the case that the
>   configuration is changed while `git` is executing.

I wouldn't list this as an item in the list of improvements. It is merely a point you have to be careful about because you are doing other improvements based on the "read all into memory first and do not re-read files" approach, no?

In the current code, when somebody does git_config_set() and then later uses git_config() to grab the value of the variable set with the first call, we will read the value written to the file by the first call. With the proposed change, if you parse the file upfront, callers of git_config_set() will need to somehow invalidate that stale copy in memory, either updating only the changed part (harder) or just discarding the cache (easy).

> ## Changing the git_config api to retrieve values from memory
>
> Approach: We parse the config file once, storing the raw values to
> records in memory. After the whole config has been read, iterate
> through the records, feeding the surviving values into the callback in
> the order they were originally read (minus deletions).
>
> Path to follow for the api conversion:
>
> 1. Convert the parser to read into an in-memory representation, but
>    leave git_config() as a wrapper which iterates over it.
> 2. Add query functions like config_string_get() which will inquire
>    cache for values efficiently.
> 3. Convert callbacks to query functions one by one.
>
> I propose two approaches for the format of the internal cache,
>
> 1. Using a hashmap to map keys to their values. This would bring, as an
>    advantage, constant-time lookups for the values. The implementation
>    will be similar to the dict data structure in Python, for example,
>
>        section.subsection --mapped-to-- multi_value_string

I have no idea what you wanted to illustrate with that example at all.

> This approach loses the relative order of different config keys.

As long as it keeps the order of multi-value elements, it should not be a problem.

> 2. Another approach would be to actually represent the syntax tree of
>    the config file in memory. That would make lookups of individual
>    keys more expensive, but would enable other manipulation. E.g., if
>    the syntax tree included nodes for comments and other non-semantic
>    constructs, then we can use it for a complete rewrite.

For a complete rewrite of what?

> And git config becomes:
>
> 1. Read the tree.
> 2. Perform operations on the tree (add nodes, delete nodes, etc).
> 3. Write out the tree.
>
> and things like remove the section header when the last item in the
> section is removed become trivial during step 2.

Are you saying you will try both approaches during the summer? You should be able to look up quickly *and* preserve order at the same time within one approach, by either annotating the tree with a hash, or the other way around, annotating the hash with each node remembering where in the original file it came from (which you will need to keep anyway in order to report errors).

> --
>
> ## Tidy configuration files
>
> When a configuration file is repeatedly modified, often garbage is
> left behind. For example, after
>
>     git config pull.rebase true
>     git config --unset pull.rebase
>     git config pull.rebase true
>     git config --unset pull.rebase
>
> the bottom of the configuration file is left with the useless lines
>
>     [pull]
>     [pull]
>
> Also, setting a config value appends the key-value pair at the end
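The "just discard the cache" option mentioned in the review above can be sketched in a few lines. This is an illustration only, with invented names (config_set, config_read); the load counter exists purely so the behavior is observable.

```c
#include <assert.h>

static int cache_valid;
static int cache_loads;	/* how many times the file has been (re)parsed */

static void load_cache(void)
{
	/* a real implementation would parse the config file here */
	cache_loads++;
	cache_valid = 1;
}

/* Writers invalidate the cache; the next reader lazily re-parses. */
void config_set(const char *key, const char *value)
{
	(void)key;
	(void)value;	/* a real implementation would rewrite the file */
	cache_valid = 0;
}

int config_read(void)
{
	if (!cache_valid)
		load_cache();
	return cache_loads;
}
```

The harder alternative, updating only the changed part of the cache, avoids the extra re-parse but has to mirror every edge case of the file-rewriting code.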
[GSoC] Choosing a Project Proposal
Hi all, I'm currently trying to decide on a project to work on for Google Summer of Code. I'm stuck choosing between three which I find really interesting, and I was wondering if any of them is particularly more pressing than the others. I would also love some comments expanding on each of these three, if possible. The three projects I'm considering are:

1. Unifying git branch -l, git tag -l, and git for-each-ref
2. Refactor tempfile handling
3. Improve triangular workflow support

Once again, I would appreciate all feedback on which of these is most important.

Thanks for the help,
Brian Bourn

--
To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: GSoC proposal: port pack bitmap support to libgit2.
Hi,

On Wed, Mar 12, 2014 at 4:19 PM, Yuxuan Shui <yshu...@gmail.com> wrote:
> Hi, I'm Yuxuan Shui, an undergraduate student from China. I'm applying
> for GSoC 2014, and here is my proposal:
>
> I found this idea on the ideas page and did some research about it. The
> pack bitmap patchset adds a new .bitmap file for every pack file, which
> contains the reachability information of selected commits. This
> information is used to speed up git fetching and cloning, and produces
> very convincing results.
>
> The goal of my project is to port the pack bitmap implementation in
> core git to libgit2, so users of libgit2 could benefit from this
> optimization as well.
>
> Please let me know if my proposal makes sense, thanks.
>
> P.S. I've submitted my microproject patch[1], but haven't received any
> response yet.
>
> [1]: http://thread.gmane.org/gmane.comp.version-control.git/243854

Could anyone please review my proposal a little bit? Is this project helpful and worth doing? Did I get anything wrong in my proposal? Thanks.

--
Regards
Yuxuan Shui
Proposal: Write git subtree info to .git/config
Has there been any talk about adding a stub for git subtrees in .git/config? The primary benefits would be:

1. Determine which subdirectories of the project were at one time pulled from another repo (where from, and which commit id), without having to infer this by scanning the log.
2. Simplify command syntax by providing a predictable default (i.e. last pulled from, last pushed to), making the repo argument optional.
3. Improve the default commit id for starting split operations, over using --rejoin, which creates blank log entries just so the log scan can find them (afaict). It's a default either way, so it can still always be explicitly specified.

If this information were available in the config, I think additional features could be added as well:

- The command 'git subtree pull' could be made to pull *all* subtrees, similar to the way 'git submodule update' works.
- An option -i (interactive), or -p (prompt), etc. could be added that confirms the defaults read from the config before actually executing the command with implicit arguments, with the ability to modify the arguments before the command actually executes.
- If the current working directory from which the command is run happens to be a subtree specified in the config, the --prefix could even be implied.

None of these ideas would break the way the command currently works, since it can still always take explicit arguments. There's a comment in the documentation about the command that says:

    Unlike submodules, subtrees do not need any special constructions
    (like .gitmodule files or gitlinks) to be present in your repository

It would still be true that subtrees do not *need* any special config settings, but that doesn't mean they are bad, and by having them the command could be improved and made easier to use. I'm happy to contribute the changes myself if this proposal is acceptable.
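To make the idea concrete, a stub of the kind proposed might look like the following. The section name, keys, repository URL, and commit id are all hypothetical; nothing in git subtree reads such a section today.

```ini
# Hypothetical stub recording where a subtree was pulled from
[subtree "vendor/plugin"]
	url = https://example.com/plugin.git       ; last pulled from
	pushurl = git@example.com:me/plugin.git    ; last pushed to
	commit = 1234abc                           ; last merged commit id
```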
Re: Proposal: Write git subtree info to .git/config
John Butterfield <johnb...@gmail.com> writes:
> Has there been any talk about adding a stub for git subtrees in
> .git/config?

I do not think so, and that is probably for a good reason.

A subtree binding can change over time, but .git/config is about recording information that does not change depending on what tree you are looking at, so there is an impedance mismatch---storing that information in .git/config is probably a wrong way to go about it.

It might help to keep track of "In this tree, the tip of that other history is bound as a subtree at this path", which means that information more naturally belongs to each tree, I would think.
Re: Proposal: Write git subtree info to .git/config
> A subtree binding can change over time, but .git/config is about
> recording information that does not change depending on what tree you
> are looking at, so there is an impedance mismatch---storing that
> information in .git/config is probably a wrong way to go about it.

I see. How about a .gitsubtrees config file in the root of a project?

> It might help to keep track of "In this tree, the tip of that other
> history is bound as a subtree at this path", which means that
> information more naturally belongs to each tree, I would think.

Anything in the subdirectory must be part of the contents of the subtree repo. It should not know how it is linked to its parent project; parents should know how their children are fetched. Therefore it cannot live in the subtree.

Subtrees could be nested, so should the config be in the root of the parent subtree? This makes sense to me. Example:

    /
      A/
      B/        # a subtree of (blah)
        X/
        Y/      # a subtree of (yada-yada)
        Z/

So, let's say B has many updates remotely, including pushing and pulling changes to Y. When pulling the changes from B, it would be convenient for it to come with the metadata (subtree repo and commit info) for Y.

So how does that sound: could we store subtree repo and commit id references, per folder, in a .gitsubtrees file in the root of every project? (Project B is technically its own project, so it would pull its own .gitsubtrees in /B/.gitsubtrees)

`John

On Thu, Mar 13, 2014 at 4:36 PM, Junio C Hamano <gits...@pobox.com> wrote:
> John Butterfield <johnb...@gmail.com> writes:
>> Has there been any talk about adding a stub for git subtrees in
>> .git/config?
>
> I do not think so, and that is probably for a good reason. A subtree
> binding can change over time, but .git/config is about recording
> information that does not change depending on what tree you are
> looking at, so there is an impedance mismatch---storing that
> information in .git/config is probably a wrong way to go about it.
>
> It might help to keep track of "In this tree, the tip of that other
> history is bound as a subtree at this path", which means that
> information more naturally belongs to each tree, I would think.
Re: Proposal: Write git subtree info to .git/config
By "per folder" I meant: for each subtree.

On Thu, Mar 13, 2014 at 5:43 PM, John Butterfield <johnb...@gmail.com> wrote:
>> A subtree binding can change over time, but .git/config is about
>> recording information that does not change depending on what tree you
>> are looking at, so there is an impedance mismatch---storing that
>> information in .git/config is probably a wrong way to go about it.
>
> I see. How about a .gitsubtrees config file in the root of a project?
>
>> It might help to keep track of "In this tree, the tip of that other
>> history is bound as a subtree at this path", which means that
>> information more naturally belongs to each tree, I would think.
>
> Anything in the subdirectory must be part of the contents of the
> subtree repo. It should not know how it is linked to its parent
> project; parents should know how their children are fetched. Therefore
> it cannot live in the subtree.
>
> Subtrees could be nested, so should the config be in the root of the
> parent subtree? This makes sense to me. Example:
>
>     /
>       A/
>       B/        # a subtree of (blah)
>         X/
>         Y/      # a subtree of (yada-yada)
>         Z/
>
> So, let's say B has many updates remotely, including pushing and
> pulling changes to Y. When pulling the changes from B, it would be
> convenient for it to come with the metadata (subtree repo and commit
> info) for Y.
>
> So how does that sound: could we store subtree repo and commit id
> references, per folder, in a .gitsubtrees file in the root of every
> project? (Project B is technically its own project, so it would pull
> its own .gitsubtrees in /B/.gitsubtrees)
>
> `John
>
> On Thu, Mar 13, 2014 at 4:36 PM, Junio C Hamano <gits...@pobox.com> wrote:
>> John Butterfield <johnb...@gmail.com> writes:
>>> Has there been any talk about adding a stub for git subtrees in
>>> .git/config?
>>
>> I do not think so, and that is probably for a good reason. A subtree
>> binding can change over time, but .git/config is about recording
>> information that does not change depending on what tree you are
>> looking at, so there is an impedance mismatch---storing that
>> information in .git/config is probably a wrong way to go about it.
>>
>> It might help to keep track of "In this tree, the tip of that other
>> history is bound as a subtree at this path", which means that
>> information more naturally belongs to each tree, I would think.
Re: GSoC proposal: port pack bitmap support to libgit2.
On Wed, Mar 12, 2014 at 04:19:23PM +0800, Yuxuan Shui wrote:

> I'm Yuxuan Shui, an undergraduate student from China. I'm applying for
> GSoC 2014, and here is my proposal:
>
> I found this idea on the ideas page and did some research about it. The
> pack bitmap patchset adds a new .bitmap file for every pack file, which
> contains the reachability information of selected commits. This
> information is used to speed up git fetching and cloning, and produces
> very convincing results.
>
> The goal of my project is to port the pack bitmap implementation in
> core git to libgit2, so users of libgit2 could benefit from this
> optimization as well.
>
> Please let me know if my proposal makes sense, thanks.

You'd want to flesh it out a bit more to show how you're thinking about tackling the problem:

- What are the areas of libgit2 that you will need to touch? Be specific. What's the current state of the packing code? What files and functions will you need to touch?
- What are the challenges you expect to encounter in porting the code?
- Can you give a detailed schedule of the summer's work? What will you work on in each week? What milestones do you expect to hit, and when?

-Peff
GSoC proposal: port pack bitmap support to libgit2.
Hi,

I'm Yuxuan Shui, an undergraduate student from China. I'm applying for GSoC 2014, and here is my proposal:

I found this idea on the ideas page and did some research about it. The pack bitmap patchset adds a new .bitmap file for every pack file, which contains the reachability information of selected commits. This information is used to speed up git fetching and cloning, and produces very convincing results.

The goal of my project is to port the pack bitmap implementation in core git to libgit2, so users of libgit2 could benefit from this optimization as well.

Please let me know if my proposal makes sense, thanks.

P.S. I've submitted my microproject patch[1], but haven't received any response yet.

[1]: http://thread.gmane.org/gmane.comp.version-control.git/243854

--
Regards
Yuxuan Shui
Re: [GSoC14][RFC] Proposal Draft: Refactor tempfile handling
> Currently the linked list of lockfiles only grows, never shrinks. Once
> an object has been linked into the list, there is no way to remove it
> again even after the lock has been released. So if a lock needs to be
> created dynamically at a random place in the code, its memory is
> unavoidably leaked.

Ah yes, I see. I think a good example is git_config_set_multivar_in_file in config.c, which even contains a comment detailing the problem:

    Since lockfile.c keeps a linked list of all created lock_file
    structures, it isn't safe to free(lock). It's better to just leave
    it hanging around.

> But I have a feeling that if we want to use a similar mechanism to
> handle all temporary files (of which there can be more), then it would
> be a good idea to lift this limitation. It will require some care,
> though, to make sure that record removal is done in a way that is
> threadsafe and safe in the event of all expected kinds of process
> death.

It sounds like a threadsafe linked list with an interface to manually remove elements is the solution here; does that sound reasonable? Ensuring thread safety without sacrificing readability is probably more difficult than it sounds, but I don't think it's impossible. I'll add some more details on this to my proposal[1].

Thank you!

- Brian Gesiak

[1] https://www.google-melange.com/gsoc/proposal/review/student/google/gsoc2014/modocache/5629499534213120
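The removal interface under discussion could be sketched as below. This is a hypothetical illustration, not git's lockfile code (which is append-only at this point); in real code, register/remove would also have to be safe against signals and concurrent threads.

```c
#include <assert.h>
#include <stddef.h>

/* A tempfile record that can be taken back out of the global list. */
struct tempfile {
	struct tempfile *prev, *next;
};

static struct tempfile *tempfile_list;

/* Add a record at the head of the list, as lockfile.c does today. */
void tempfile_register(struct tempfile *t)
{
	t->prev = NULL;
	t->next = tempfile_list;
	if (tempfile_list)
		tempfile_list->prev = t;
	tempfile_list = t;
}

/* Unlink the record once its file is committed or rolled back, so the
 * struct can be freed instead of leaking for the process lifetime. */
void tempfile_remove(struct tempfile *t)
{
	if (t->prev)
		t->prev->next = t->next;
	else
		tempfile_list = t->next;
	if (t->next)
		t->next->prev = t->prev;
	t->prev = t->next = NULL;
}

/* Number of still-registered records (what atexit would clean up). */
int tempfile_count(void)
{
	int n = 0;
	for (struct tempfile *t = tempfile_list; t; t = t->next)
		n++;
	return n;
}
```

The doubly linked list makes removal O(1) given the record pointer; the remaining (and harder) part of the project is making the two pointer updates atomic with respect to signal delivery.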
Re: [GSoC14][RFC] Proposal Draft: Refactor tempfile handling
On 03/01/2014 10:04 PM, Brian Gesiak wrote:
> Hello all, My name is Brian Gesiak. I'm a research student at the
> University of Tokyo, and I'm hoping to participate in this year's
> Google Summer of Code by contributing to Git. I'm a longtime user,
> first-time contributor--some of you may have noticed my microproject
> patches.[1][2]
>
> I'd like to gather some information on one of the GSoC ideas posted on
> the ideas page. Namely, I'm interested in refactoring the way tempfiles
> are cleaned up. The ideas page points out that while lock files are
> closed and unlinked[3] when the program exits[4], object pack files
> implement their own brand of temp file creation and deletion. This
> implementation doesn't share the same guarantees as lock files--it is
> possible that the program terminates before the temp file is
> unlinked.[5]
>
> Lock file references are stored in a linked list. When the program
> exits, this list is traversed and each file is closed and unlinked. It
> seems to me that this mechanism is appropriate for temp files in
> general, not just lock files. Thus, my proposal would be to extract
> this logic into a separate module--tempfile.h, perhaps. Lock and object
> files would share the tempfile implementation. That is, both object and
> lock temp files would be stored in a linked list, and all of these
> would be deleted at program exit.
>
> I'm very enthused about this project--I think it has it all:
>
> - Tangible benefits for the end-user
> - Reduced complexity in the codebase
> - Ambitious enough to be interesting
> - Small enough to realistically be completed in a summer
>
> Please let me know if this seems like it would make for an interesting
> proposal, or if perhaps there is something I am overlooking. Any
> feedback at all would be appreciated. Thank you!

Hi Brian,

Thanks for your proposal. I have a technical point that I think your proposal should address:

Currently the linked list of lockfiles only grows, never shrinks. Once an object has been linked into the list, there is no way to remove it again even after the lock has been released. So if a lock needs to be created dynamically at a random place in the code, its memory is unavoidably leaked.

This hasn't been much of a problem in the past because (1) the number of locks acquired/released during a Git invocation is reasonable, and (2) a lock object (even if it is already in the list) can be reused after the lock has been released. So there are many lock callsites that define one static lock instance and use it over and over again.

But I have a feeling that if we want to use a similar mechanism to handle all temporary files (of which there can be more), then it would be a good idea to lift this limitation. It will require some care, though, to make sure that record removal is done in a way that is threadsafe and safe in the event of all expected kinds of process death.

Michael

--
Michael Haggerty
mhag...@alum.mit.edu
http://softwareswirl.blogspot.com/
Re: [GSoC14][RFC] Proposal Draft: Refactor tempfile handling
On Tue, Mar 11, 2014 at 05:27:05PM +0100, Michael Haggerty wrote:

> Thanks for your proposal. I have a technical point that I think your
> proposal should address: Currently the linked list of lockfiles only
> grows, never shrinks. Once an object has been linked into the list,
> there is no way to remove it again even after the lock has been
> released. So if a lock needs to be created dynamically at a random
> place in the code, its memory is unavoidably leaked.

Thanks, I remember thinking about this when I originally conceived of the idea, but I forgot to mention it in the idea writeup. In most cases the potential leaks are finite and small, but object creation and diff tempfiles could both be unbounded. So this is definitely something to consider.

In both cases we have a bounded number of _simultaneous_ tempfiles, so one strategy could be to continue using static objects. But it should not be hard to do it dynamically, and I suspect the resulting API will be a lot easier to comprehend.

-Peff
Re: [GSoC14][RFC] Proposal Draft: Refactor tempfile handling
On Sun, Mar 09, 2014 at 02:04:16AM +0900, Brian Gesiak wrote:

> > Once the logic is extracted into a nice API, there are several other
> > places that can use it, too: ...
>
> I've found the following four areas so far:
>
> 1. lock_file (lockfile.c)
> 2. odb_mkstemp (git-compat-util)
> 3. odb_pack_keep (git-compat-util)
> 4. prepare_temp_file (diff.c)
>
> Tons of files use (1) and (2). (3) is less common, and (4) is only
> used for external diffs.

Yeah, I would expect (1) and (2) to be the most frequent. (3) gets written on every push and fetch, but only for a short period. (4) is also used for diff's textconv, though, like external diffs, they are relatively rare.

In my experience, most of the cruft that gets left is from (2), since a push or fetch will spool to a tmpfile, then verify the results via git index-pack. Any failure there leaves the file in place.

There are a few other potential candidates we can find by grepping for mkstemp. Not all of those might want cleanup, but it's a starting point for investigation.

> > the shallow_XX tempfiles
>
> I'm not sure I was able to find this one. Are you referring to the
> lock files used when fetching, such as in fetch-pack.c?

I mean the xmkstemp from setup_temporary_shallow in shallow.c.

> I'd say the biggest difference between lockfiles and object files is
> that tempfile methods like odb_mkstemp need to know the location of
> the object directory. Aside from that, lockfiles and the external diff
> files appear to be cleaned up at exit, while temporary object files
> tend to have a more finely controlled lifecycle. I'm still
> investigating this aspect of the proposal, though.

The diff tempfiles are true tempfiles; they always go away in the end (though of course we want to clean them up as we finish with them, rather than doing it all at the end). Lockfiles may get committed into place (i.e., via atomic rename) or rolled back (deleted). Object files should generally be hard-linked into place, but there is some extra magic in move_temp_to_file to fall back to renames. Some of that we may be able to get rid of (e.g., we try to avoid doing cross-directory renames at all these days, so the comment there may be out of date).

> One question, though: the idea on the ideas page specifies that
> temporary pack and object files may optionally be cleaned up in case
> of error during program execution. How will users specify their
> preference? I think the API for creating temporary files should allow
> cleanup options to be specified on a per-file basis. That way each
> part of the program that creates tempfiles can specify a different
> config value to determine the cleanup policy.

That probably makes sense. I certainly had a config option in mind. I mentioned above that the most common cruft is leftover packfiles from pushes and fetches. We haven't deleted those historically because the same person often controls both the client and the server, and they would want to possibly do forensics on the packfile sent to the remote, or even rescue objects out of it. But the remote end may simply have rejected the pack by some policy, and has no interest in forensics.

Having a config option for each type of file may be cool, but I don't know how useful it would be in practice. Still, it's certainly worth thinking about and looking into.

-Peff
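The per-file cleanup policy floated in this exchange could be sketched as a flag passed when a tempfile is registered, consulted by the exit handler. The enum values and should_remove() are invented names, not a proposed git API.

```c
#include <assert.h>

/* Hypothetical cleanup policies for different tempfile producers. */
enum cleanup_policy {
	CLEANUP_ALWAYS,		/* e.g. diff tempfiles */
	CLEANUP_ON_ERROR,	/* optional cleanup of temporary packs */
	CLEANUP_NEVER		/* keep received packs for forensics */
};

/* Would be called by the atexit/signal handler for each record. */
int should_remove(enum cleanup_policy policy, int had_error)
{
	switch (policy) {
	case CLEANUP_ALWAYS:
		return 1;
	case CLEANUP_ON_ERROR:
		return had_error;
	default:
		return 0;
	}
}
```

Whether each policy is hard-coded per callsite or driven by a config value is exactly the open question in the message above.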
Re: [GSoC14][RFC] Proposal Draft: Refactor tempfile handling
Excellent, thank you very much for the feedback, Jeff! It was very helpful and encouraging. I've done some more research based on your comments.

> Once the logic is extracted into a nice API, there are several other
> places that can use it, too: ...

I've found the following four areas so far:

1. lock_file (lockfile.c)
2. odb_mkstemp (git-compat-util)
3. odb_pack_keep (git-compat-util)
4. prepare_temp_file (diff.c)

Tons of files use (1) and (2). (3) is less common, and (4) is only used for external diffs.

> the shallow_XX tempfiles

I'm not sure I was able to find this one. Are you referring to the lock files used when fetching, such as in fetch-pack.c?

> What are the mismatches in how lockfiles and object files are handled?
> E.g., how do we finalize them into place? How should the API be
> designed to minimize race conditions (e.g., if we get a signal
> delivered while we are committing or cleaning up a file)?

I'd say the biggest difference between lockfiles and object files is that tempfile methods like odb_mkstemp need to know the location of the object directory. Aside from that, lockfiles and the external diff files appear to be cleaned up at exit, while temporary object files tend to have a more finely controlled lifecycle. I'm still investigating this aspect of the proposal, though.

One question, though: the idea on the ideas page specifies that temporary pack and object files may optionally be cleaned up in case of error during program execution. How will users specify their preference? I think the API for creating temporary files should allow cleanup options to be specified on a per-file basis. That way each part of the program that creates tempfiles can specify a different config value to determine the cleanup policy.

Thanks for all your help so far!
- Brian Gesiak

PS: I'm maintaining a working draft of my proposal here, in case anyone wants to offer any feedback prior to its submission: https://gist.github.com/modocache/9434914

On Tue, Mar 4, 2014 at 7:42 AM, Jeff King <p...@peff.net> wrote:
> On Sun, Mar 02, 2014 at 06:04:39AM +0900, Brian Gesiak wrote:
>> My name is Brian Gesiak. I'm a research student at the University of
>> Tokyo, and I'm hoping to participate in this year's Google Summer of
>> Code by contributing to Git. I'm a longtime user, first-time
>> contributor--some of you may have noticed my microproject
>> patches.[1][2]
>
> Yes, we did notice them. Thanks, and welcome. :)
>
>> The ideas page points out that while lock files are closed and
>> unlinked[3] when the program exits[4], object pack files implement
>> their own brand of temp file creation and deletion. This
>> implementation doesn't share the same guarantees as lock files--it is
>> possible that the program terminates before the temp file is
>> unlinked.[5] Lock file references are stored in a linked list. When
>> the program exits, this list is traversed and each file is closed and
>> unlinked. It seems to me that this mechanism is appropriate for temp
>> files in general, not just lock files. Thus, my proposal would be to
>> extract this logic into a separate module--tempfile.h, perhaps. Lock
>> and object files would share the tempfile implementation. That is,
>> both object and lock temp files would be stored in a linked list, and
>> all of these would be deleted at program exit.
>
> Yes, I think this is definitely the right way to go. We should be able
> to unify the tempfile handling for all of git. Once the logic is
> extracted into a nice API, there are several other places that can use
> it, too:
>
> - the external diff code creates tempfiles and uses its own cleanup
>   routines
> - the shallow_XX tempfiles (these are not cleaned right now, though I
>   sent a patch recently for them to do their own cleanup)
>
> Those are just off the top of my head. There may be other spots, too.
> It is worth thinking in your proposal about some of the things that
> the API will want to handle. What are the mismatches in how lockfiles
> and object files are handled? E.g., how do we finalize them into
> place? How should the API be designed to minimize race conditions
> (e.g., if we get a signal delivered while we are committing or
> cleaning up a file)?
>
>> Please let me know if this seems like it would make for an
>> interesting proposal, or if perhaps there is something I am
>> overlooking. Any feedback at all would be appreciated. Thank you!
>
> You definitely have a grasp of what the project is aiming for, and
> which areas need to be touched.
>
> -Peff
Re: [GSoC14][RFC] Proposal Draft: Refactor tempfile handling
On Sun, Mar 02, 2014 at 06:04:39AM +0900, Brian Gesiak wrote:

> My name is Brian Gesiak. I'm a research student at the University of
> Tokyo, and I'm hoping to participate in this year's Google Summer of
> Code by contributing to Git. I'm a longtime user, first-time
> contributor--some of you may have noticed my microproject
> patches.[1][2]

Yes, we did notice them. Thanks, and welcome. :)

> The ideas page points out that while lock files are closed and
> unlinked[3] when the program exits[4], object pack files implement
> their own brand of temp file creation and deletion. This
> implementation doesn't share the same guarantees as lock files--it is
> possible that the program terminates before the temp file is
> unlinked.[5]
>
> Lock file references are stored in a linked list. When the program
> exits, this list is traversed and each file is closed and unlinked. It
> seems to me that this mechanism is appropriate for temp files in
> general, not just lock files. Thus, my proposal would be to extract
> this logic into a separate module--tempfile.h, perhaps. Lock and
> object files would share the tempfile implementation. That is, both
> object and lock temp files would be stored in a linked list, and all
> of these would be deleted at program exit.

Yes, I think this is definitely the right way to go. We should be able to unify the tempfile handling for all of git. Once the logic is extracted into a nice API, there are several other places that can use it, too:

  - the external diff code creates tempfiles and uses its own cleanup
    routines

  - the shallow_XX tempfiles (these are not cleaned right now, though I
    sent a patch recently for them to do their own cleanup)

Those are just off the top of my head. There may be other spots, too.

It is worth thinking in your proposal about some of the things that the API will want to handle. What are the mismatches in how lockfiles and object files are handled? E.g., how do we finalize them into place? How should the API be designed to minimize race conditions (e.g., if we get a signal delivered while we are committing or cleaning up a file)?

> Please let me know if this seems like it would make for an interesting
> proposal, or if perhaps there is something I am overlooking. Any
> feedback at all would be appreciated. Thank you!

You definitely have a grasp of what the project is aiming for, and which areas need to be touched.

-Peff
[GSoC14][RFC] Proposal Draft: Refactor tempfile handling
Hello all,

My name is Brian Gesiak. I'm a research student at the University of Tokyo, and I'm hoping to participate in this year's Google Summer of Code by contributing to Git. I'm a longtime user, first-time contributor--some of you may have noticed my microproject patches.[1][2]

I'd like to gather some information on one of the GSoC ideas posted on the ideas page. Namely, I'm interested in refactoring the way tempfiles are cleaned up.

The ideas page points out that while lock files are closed and unlinked[3] when the program exits[4], object pack files implement their own brand of temp file creation and deletion. This implementation doesn't share the same guarantees as lock files--it is possible that the program terminates before the temp file is unlinked.[5]

Lock file references are stored in a linked list. When the program exits, this list is traversed and each file is closed and unlinked. It seems to me that this mechanism is appropriate for temp files in general, not just lock files. Thus, my proposal would be to extract this logic into a separate module--tempfile.h, perhaps. Lock and object files would share the tempfile implementation. That is, both object and lock temp files would be stored in a linked list, and all of these would be deleted at program exit.

I'm very enthused about this project--I think it has it all:

- Tangible benefits for the end-user
- Reduced complexity in the codebase
- Ambitious enough to be interesting
- Small enough to realistically be completed in a summer

Please let me know if this seems like it would make for an interesting proposal, or if perhaps there is something I am overlooking. Any feedback at all would be appreciated. Thank you!
- Brian Gesiak

[1] http://thread.gmane.org/gmane.comp.version-control.git/242891
[2] http://thread.gmane.org/gmane.comp.version-control.git/242893
[3] https://github.com/git/git/blob/v1.9.0/lockfile.c#L18
[4] https://github.com/git/git/blob/v1.9.0/lockfile.c#L143
[5] https://github.com/git/git/blob/v1.9.0/pack-write.c#L350
Re: [Proposal] Clonable scripts
Hi,

On Tue, Sep 10, 2013 at 12:18 AM, Ramkumar Ramachandra artag...@gmail.com wrote:
> Niels Basjes wrote:
>> As we all know the hooks (in .git/hooks) are not cloned along with
>> the code of a project. Now this is a correct approach for the
>> scripts that do stuff like emailing the people responsible for
>> releases or submitting the commit to a CI system.
>
> More often than not, maintainers come with these hooks and they keep
> them private.

Yes.

>> Initially I wanted to propose introducing fully clonable
>> (pre-commit) hook scripts. However I can imagine that a malicious
>> opensource coder can create a github repo and try to hack the
>> computer of a contributor via those scripts. So having such scripts
>> is a 'bad idea'.
>
> I think it's a good idea, since the contributor can look through the
> scripts.

What I meant to say is that having fully functional unrestricted scripts that are cloned is a bad idea. Having restricted cloned scripts is, to me, a good idea (or at least, that is what I propose here).

>> 3) For the regular hooks this language is also supported, and when
>> located in the (not cloned!) .git/hooks directory they are just as
>> powerful as a normal script (i.e. can control CI, send emails,
>> etc.).
>
> I'm confused now; how can .git/hooks be as powerful as .githooks? The
> former users should consider uploading their code on GitHub.

The way I envisioned it is that the scripting language in .git/hooks is "pick any language you like", with the builtin language as a new addition. In .githooks (which is under version control in the code base and cloned) it is the same builtin language, yet constrained in a sandbox.

> Which reminds me that we need to have GitTogethers. Thanks for this!

You're welcome.

--
Best regards / Met vriendelijke groeten,

Niels Basjes
Re: [Proposal] Clonable scripts
On 09/10/2013 02:18 AM, Niels Basjes wrote:

> As we all know the hooks (in .git/hooks) are not cloned along with the
> code of a project. Now this is a correct approach for the scripts that
> do stuff like emailing the people responsible for releases or
> submitting the commit to a CI system.
>
> For several other things it makes a lot of sense to give the developer
> immediate feedback. Things like the format of the commit message (i.e.
> it must start with an issue tracker id) or compliance with a coding
> standard.
>
> Initially I wanted to propose introducing fully clonable (pre-commit)
> hook scripts. However I can imagine that a malicious opensource coder
> can create a github repo and try to hack the computer of a contributor
> via those scripts. So having such scripts is a 'bad idea'.
>
> If those scripts were, however, written in a language that is built
> into the git program, and the scripts were run in such a way that they
> can only interact with the files in the local git (and _nothing_
> outside of that), this would be solved. Also, having a builtin
> scripting language means that this would run on all operating systems
> (yes, even Windows).
>
> So I propose the following new feature:
> 1) A scripting language is put inside git. Perhaps a version of python
>    or ruby or go or ... (no need for a 'new' language)
> 2) If a project contains a folder called .githooks in the root of the
>    code base then the rules/scripts that are present there are
>    executed ONLY on the system doing the actual commit. These scripts
>    are run in such a limited way that they can only read the files in
>    the repository, they cannot do any networking/write to disk/etc,
>    and they can only do a limited set of actions against the current
>    operation at hand (i.e. do checks, parse messages, etc).
> 3) For the regular hooks this language is also supported, and when
>    located in the (not cloned!) .git/hooks directory they are just as
>    powerful as a normal script (i.e. can control CI, send emails,
>    etc.).
> Like I said, this is just a proposal and I would like to know what you
> guys think.

I am not in favour of any idea like this. It will end in some sort of compromise (in both senses of the word!). It has to be voluntary, but we can make it easier. I suggest something like this:

- some special directory can have normal hook files, but it's just a placeholder.

- each hook code file comes with some metadata at the top, say githook name, hook name, version, remote-name. I'll use these examples: pre-commit crlf-check 1.1 origin

- on a clone/pull, if there is a change to any of these code files when compared to the previous HEAD, and if the program is running interactively, then you can ask and set up these hooks.

The purpose of the remote name in the stored metadata is that we don't want to bother updating when we pull from some other repo, like when merging a feature branch. The purpose of the version number is so you can do some intelligent things, even silently upgrade under certain conditions.

All we're doing is making things easier compared to what you can already do even now (which is completely manual and instructions-based). I don't think anything more intrusive or forced is wise. And to people who say it is OK: I'm going to seriously wonder if you work for the NSA (directly or indirectly). Sadly, that is not meant to be a joke question; such is life now.
Re: [Proposal] Clonable scripts
On Mon, 09 Sep 2013 22:48:42 +, Niels Basjes wrote:

> ... However I can imagine that a malicious opensource coder can create
> a github repo and try to hack the computer of a contributor via those
> scripts. So having such scripts is a 'bad idea'.

Given that half the repos out there are cloned to 'make install' in them... it's still a bad idea.

> If those scripts were, however, written in a language that is built
> into the git program, and the scripts were run in such a way that they
> can only interact with the files in the local git (and _nothing_
> outside of that), this would be solved.

I still think this is a nightmare of maintenance. You'd need a restricted version of a language that doesn't allow access outside the repo (and no TCP either), and someone will always miss some module... Not that it wouldn't be cool, yet.

> ... Like I said, this is just a proposal and I would like to know what
> you guys think.

I think there are generally two use cases:

- Many people working on repos in an organization. Give them a wrapper script that does the clone (and also knows the clone URL already), that will set up hooks and configuration as needed.

- github-style cooperation. Add a "make hooks" target to your Makefile that sets up the hooks your project seems to want. After all, this is for the developers to pre-check what they will submit, so it is in their own interest to have (and cross-read) the hooks.

Andreas

--
"Totally trivial. Famous last words."
From: Linus Torvalds torvalds@*.org
Date: Fri, 22 Jan 2010 07:29:21 -0800
[Proposal] Clonable scripts
Hi,

As we all know the hooks (in .git/hooks) are not cloned along with the code of a project. Now this is a correct approach for the scripts that do stuff like emailing the people responsible for releases or submitting the commit to a CI system.

For several other things it makes a lot of sense to give the developer immediate feedback. Things like the format of the commit message (i.e. it must start with an issue tracker id) or compliance with a coding standard.

Initially I wanted to propose introducing fully clonable (pre-commit) hook scripts. However I can imagine that a malicious opensource coder can create a github repo and try to hack the computer of a contributor via those scripts. So having such scripts is a 'bad idea'.

If those scripts were, however, written in a language that is built into the git program, and the scripts were run in such a way that they can only interact with the files in the local git (and _nothing_ outside of that), this would be solved. Also, having a builtin scripting language means that this would run on all operating systems (yes, even Windows).

So I propose the following new feature:

1) A scripting language is put inside git. Perhaps a version of python or ruby or go or ... (no need for a 'new' language)

2) If a project contains a folder called .githooks in the root of the code base then the rules/scripts that are present there are executed ONLY on the system doing the actual commit. These scripts are run in such a limited way that they can only read the files in the repository, they cannot do any networking/write to disk/etc, and they can only do a limited set of actions against the current operation at hand (i.e. do checks, parse messages, etc).

3) For the regular hooks this language is also supported, and when located in the (not cloned!) .git/hooks directory they are just as powerful as a normal script (i.e. can control CI, send emails, etc.).

Like I said, this is just a proposal and I would like to know what you guys think.
--
Best regards / Met vriendelijke groeten,

Niels Basjes
Re: [Proposal] Clonable scripts
On 9 September 2013 13:48, Niels Basjes ni...@basjes.nl wrote:

> If those scripts were, however, written in a language that is built
> into the git program, and the scripts were run in such a way that they
> can only interact with the files in the local git (and _nothing_
> outside of that), this would be solved.

That sounds interesting.

> Also, having a builtin scripting language means that this would run on
> all operating systems (yes, even Windows).

This would be *very* helpful. It's a total pain trying to get hooks working across different OSes.

> So I propose the following new feature:
> 1) A scripting language is put inside git. Perhaps a version of python
>    or ruby or go or ... (no need for a 'new' language)

That sounds nice but ...

> 2) If a project contains a folder called .githooks in the root of the
>    code base then the rules/scripts that are present there are
>    executed ONLY on the system doing the actual commit. These scripts
>    are run in such a limited way that they can only read the files in
>    the repository, they cannot do any networking/write to disk/etc,
>    and they can only do a limited set of actions against the current
>    operation at hand (i.e. do checks, parse messages, etc).

... how would you prevent Ruby/Python/Go/$GeneralProgLang from executing arbitrary code?

> Like I said, this is just a proposal and I would like to know what you
> guys think.

I love the idea but I'm not sure how feasible it is. I think you would be forced to copy an existing language and somehow make it secure (seems like a maintenance nightmare) or to create your own language (potentially a lot of work). But perhaps something more declarative might be usable?
Re: [Proposal] Clonable scripts
On Mon, Sep 9, 2013 at 11:13 PM, Hilco Wijbenga hilco.wijbe...@gmail.com wrote:
> On 9 September 2013 13:48, Niels Basjes ni...@basjes.nl wrote:
>> So I propose the following new feature:
>> 1) A scripting language is put inside git. Perhaps a version of
>>    python or ruby or go or ... (no need for a 'new' language)
>
> That sounds nice but ...
>
>> 2) If a project contains a folder called .githooks in the root of the
>>    code base then the rules/scripts that are present there are
>>    executed ONLY on the system doing the actual commit. These scripts
>>    are run in such a limited way that they can only read the files in
>>    the repository, they cannot do any networking/write to disk/etc,
>>    and they can only do a limited set of actions against the current
>>    operation at hand (i.e. do checks, parse messages, etc).
>
> ... how would you prevent Ruby/Python/Go/$GeneralProgLang from
> executing arbitrary code?

Some kind of sandbox?

>> Like I said, this is just a proposal and I would like to know what
>> you guys think.
>
> I love the idea but I'm not sure how feasible it is. I think you would
> be forced to copy an existing language and somehow make it secure
> (seems like a maintenance nightmare) or to create your own language
> (potentially a lot of work). But perhaps something more declarative
> might be usable?

As far as I'm concerned it should be the 'best suitable' language for the task at hand.

--
Best regards / Met vriendelijke groeten,

Niels Basjes
Re: [Proposal] Clonable scripts
Niels Basjes wrote:

> As we all know the hooks (in .git/hooks) are not cloned along with the
> code of a project. Now this is a correct approach for the scripts that
> do stuff like emailing the people responsible for releases or
> submitting the commit to a CI system.

More often than not, maintainers come with these hooks and they keep them private.

> For several other things it makes a lot of sense to give the developer
> immediate feedback. Things like the format of the commit message (i.e.
> it must start with an issue tracker id) or compliance with a coding
> standard.

i.e. tracker ID. Compliance is simply a request. The developer must be able to pick it up from surrounding style.

> Initially I wanted to propose introducing fully clonable (pre-commit)
> hook scripts. However I can imagine that a malicious opensource coder
> can create a github repo and try to hack the computer of a contributor
> via those scripts. So having such scripts is a 'bad idea'.

I think it's a good idea, since the contributor can look through the scripts.

> If those scripts were, however, written in a language that is built
> into the git program, and the scripts were run in such a way that they
> can only interact with the files in the local git (and _nothing_
> outside of that), this would be solved.

GNU make.

> Also, having a builtin scripting language means that this would run on
> all operating systems (yes, even Windows).

kbuild tends to get complicated.

> So I propose the following new feature:
> 1) A scripting language is put inside git. Perhaps a version of python
>    or ruby or go or ... (no need for a 'new' language)

make + go sounds like a good alternative.

> 2) If a project contains a folder called .githooks in the root of the
>    code base then the rules/scripts that are present there are
>    executed ONLY on the system doing the actual commit. These scripts
>    are run in such a limited way that they can only read the files in
>    the repository, they cannot do any networking/write to disk/etc,
>    and they can only do a limited set of actions against the current
>    operation at hand (i.e. do checks, parse messages, etc).

Submodules and url.url.insteadOf come in handy here.

> 3) For the regular hooks this language is also supported, and when
>    located in the (not cloned!) .git/hooks directory they are just as
>    powerful as a normal script (i.e. can control CI, send emails,
>    etc.).

I'm confused now; how can .git/hooks be as powerful as .githooks? The former users should consider uploading their code on GitHub.

> Like I said, this is just a proposal and I would like to know what you
> guys think.
>
> Best regards / Met vriendelijke groeten,

Which reminds me that we need to have GitTogethers. Thanks for this!
Re: A naive proposal for preventing loose object explosions
mf...@codeaurora.org writes:

> Object lookups should likely not get any slower than if repack were
> not run, and the extra new pack might actually help find some objects
> quicker.

In general, having an extra pack, only to keep objects that you know are available in other packs, will make _all_ object accesses, not just the ones that are contained in that extra pack, slower.

Instead of mmapping all the .idx files for all the available packfiles, we could build a table that records, for each packed object, from which packfile at what offset the data is available to optimize the access, but obviously building that in-core table will take time, so it may not be a good trade-off to do so at runtime (a precomputed super-.idx that we can mmap at runtime might be a good way forward if that turns out to be the case).

> Does this sound like it would work?

Sorry, but it is unclear what problem you are trying to solve. Is it that you do not like that "repack -A" ejects unreferenced objects and makes them loose, of which you may have many?

The loosen_unused_packed_objects() function used by "repack -A" calls the force_object_loose() function (actually, it is the sole caller of the function). If you tweak the latter to stream to a single new "graveyard" packfile and mark it as kept until expiry, would it solve the issue the same way but with much smaller impact?

There already is an infrastructure available to open a single output packfile and send multiple objects to it in bulk-checkin.c, and I am wondering if you can take advantage of the framework. The existing interface to it assumes that the object data is coming from a file descriptor (the interface was built to support bulk-checkin of many objects in an empty repository), and it needs refactoring to allow stream_to_pack() to take a different kind of data source in the form of a stateful callback function, though.
Re: A naive proposal for preventing loose object explosions
On Friday, September 06, 2013 11:19:02 am Junio C Hamano wrote:
> mf...@codeaurora.org writes:
>> Object lookups should likely not get any slower than if repack were
>> not run, and the extra new pack might actually help find some
>> objects quicker.
>
> In general, having an extra pack, only to keep objects that you know
> are available in other packs, will make _all_ object accesses, not
> just the ones that are contained in that extra pack, slower.

My assumption was that if the new pack, with all the consolidated reachable objects in it, happens to be searched first, it would actually speed things up. And if it is searched last, then the objects weren't in the other packs, so how could it have made it slower? It seems this would only slow down the missing-object path?

But it sounds like all the index files are mmapped up front? Then yes, I can see how it would slow things down. However, it is only one extra (hopefully now well optimized) pack. My base assumption was that even if it does slow things down, it would likely be unmeasurable and a price worth paying to avoid an extreme penalty.

> Instead of mmapping all the .idx files for all the available
> packfiles, we could build a table that records, for each packed
> object, from which packfile at what offset the data is available to
> optimize the access, but obviously building that in-core table will
> take time, so it may not be a good trade-off to do so at runtime (a
> precomputed super-.idx that we can mmap at runtime might be a good
> way forward if that turns out to be the case).
>
>> Does this sound like it would work?
>
> Sorry, but it is unclear what problem you are trying to solve.

I think you guessed it below; I am trying to prevent loose object explosions by keeping unreachable objects around in packs (instead of loose) until expiry. With the current way that pack-objects works, this is the best I could come up with (I said naive). :(

Today git-repack calls git pack-objects like this:

  git pack-objects --keep-true-parents --honor-pack-keep --non-empty \
      --all --reflog $args </dev/null "$PACKTMP"

This has no mechanism to place unreachable objects in a pack. If git pack-objects supported an option which streamed them to a separate file (as you suggest below), that would likely be the main piece needed to avoid the heavy-handed approach I was suggesting.

The problem is how to define the interface for this? How do we get the filename of the new unreachable packfile? Today the name of the new packfile is sent to stdout; would we just tack on another name? That seems like it would break some assumptions? Maybe it would be OK if it only did that when an --unreachable flag was added? Then git-repack could be enhanced to understand that flag and the extra filenames it outputs?

> Is it that you do not like that "repack -A" ejects unreferenced
> objects and makes them loose, of which you may have many?

Yes, several times a week we have people pushing the kernel to wrong projects; this leads to 4M loose objects. :( Without a solution for this regular problem, we are very scared to move our repos off of SSDs. This leads to hour-plus-long fetches.

> The loosen_unused_packed_objects() function used by "repack -A" calls
> the force_object_loose() function (actually, it is the sole caller of
> the function). If you tweak the latter to stream to a single new
> "graveyard" packfile and mark it as kept until expiry, would it solve
> the issue the same way but with much smaller impact?

Yes.

> There already is an infrastructure available to open a single output
> packfile and send multiple objects to it in bulk-checkin.c, and I am
> wondering if you can take advantage of the framework.
> The existing interface to it assumes that the object data is coming
> from a file descriptor (the interface was built to support
> bulk-checkin of many objects in an empty repository), and it needs
> refactoring to allow stream_to_pack() to take a different kind of data
> source in the form of a stateful callback function, though.

That feels beyond what I could currently dedicate the time to do. Like I said, my solution is heavy-handed but it felt simple enough for me to try. I can spare the extra disk space and I am not convinced the performance hit would be bad. I would, of course, be delighted if someone else were to do what you suggest, but I get that it's my itch...

-Martin

--
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
A naive proposal for preventing loose object explosions
I am imagining what I consider to be a naive approach to preventing loose unreachable object explosions. It may seem a bit heavy-handed at first, but every conversation so far about this issue seems to have died, so I am looking for a simple incremental improvement to what we have today. I theorize that this approach will provide the same protections (good and bad) against races as using "git-repack -A -d" and "git-prune --expire time" regularly will today.

1a) Add a --prune-packed option to git-repack to force a call to git prune-packed, without having to specify the -d option to git-repack.

1b) Add a --keep marker option to git-repack which will create a keep file with marker in it for existing pack files which were repacked (not for the new pack).

1c) Now instead of running:

  git-repack -A -d

run:

  git-repack --prune-packed --keep 'prune-when-expired'

This should effectively keep a duplicate copy of all old packfiles around, but the new pack file will not have unreferenced objects in it. This is similar to having unreachable loose objects left around, but it also keeps around extra copy(ies) of reachable objects, wasting some disk space.

While this will normally consume more disk space in pack files, it will not explode loose objects, which will likely save a lot of space when such explosions would have occurred. Of course, this should also prevent the severe performance downsides of these explosions. Object lookups should likely not get any slower than if repack were not run, and the extra new pack might actually help find some objects quicker.

Safety with respect to unreachable object race conditions should be the same as using "git repack -A -d", since at least one copy of every object should be kept around during this run?

Then:

2a) Add support for passing in a list of pack files to git-repack. This list will then be used as the original existing list instead of finding all packfiles without keeps.
2b) Add an --expire-marked marker option to git-prune which will find any pack files with a .keep with marker in it, and evaluate whether it meets the --expire time. If so, it will also call:

  git-repack -a -d expired-pack-files...

This should repack any reachable objects from the expired pack files into a single new pack file. This may again cause some reachable object duplication (likely with the same performance effects as the first git-repack phase above), but unreachable objects from the expired pack files will now have been pruned, as they would have been if they had originally been turned into loose objects.

3) Finally, on the next repack cycle, the currently duplicated reachable objects should likely get fully reconsolidated into a single copy.

Does this sound like it would work? I may attempt to construct this for internal use (since it is a bit hacky). It feels like it could be done mostly with some simple shell modding/wrapping (feels less scary than messing with the core C tools). I wonder if I am missing some obvious flaw in this approach?

Thanks for any insights,

-Martin
[PATCH] repack: rewrite the shell script in C (squashing proposal)
This patch is meant to be squashed into bb4335a21441a0 (repack: rewrite the shell script in C); I'll do so when rerolling the series. For reviewing I'll just send this patch.

* Remove comments, which likely get out of date (authorship is kept in
  git anyway)
* Rename get_pack_filenames to get_non_kept_pack_filenames
* Catch the return value of unlink and fail as the shell version did
* Beauty fixes to remove_temporary_files as Junio proposed
* Install signal handling after the static variables packdir and
  packtmp are set
* Remove adding the empty string to the buffer
* Fix the rollback mechanism (wrong variable name)

Signed-off-by: Stefan Beller <stefanbel...@googlemail.com>
---
 builtin/repack.c | 78 ++++++++++++++++++++++----------------------------
 1 file changed, 36 insertions(+), 42 deletions(-)

diff --git a/builtin/repack.c b/builtin/repack.c
index 1f13e0d..e0d1f17 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -1,8 +1,3 @@
-/*
- * The shell version was written by Linus Torvalds (2005) and many others.
- * This is a translation into C by Stefan Beller (2013)
- */
-
 #include "builtin.h"
 #include "cache.h"
 #include "dir.h"
@@ -13,9 +8,8 @@
 #include "string-list.h"
 #include "argv-array.h"
 
-/* enabled by default since 22c79eab (2008-06-25) */
 static int delta_base_offset = 1;
-char *packdir;
+static char *packdir, *packtmp;
 
 static const char *const git_repack_usage[] = {
 	N_("git repack [options]"),
@@ -41,18 +35,16 @@ static void remove_temporary_files(void)
 	DIR *dir;
 	struct dirent *e;
 
-	/* .git/objects/pack */
-	strbuf_addstr(&buf, get_object_directory());
-	strbuf_addstr(&buf, "/pack");
-	dir = opendir(buf.buf);
-	if (!dir) {
-		strbuf_release(&buf);
+	dir = opendir(packdir);
+	if (!dir)
 		return;
-	}
 
-	/* .git/objects/pack/.tmp-$$-pack-* */
+	strbuf_addstr(&buf, packdir);
+
+	/* dirlen holds the length of the path before the file name */
 	dirlen = buf.len + 1;
-	strbuf_addf(&buf, "/.tmp-%d-pack-", (int)getpid());
+	strbuf_addf(&buf, "%s", packtmp);
+	/* prefixlen holds the length of the prefix */
 	prefixlen = buf.len - dirlen;
 	while ((e = readdir(dir))) {
@@ -73,11 +65,16 @@ static void remove_pack_on_signal(int signo)
 	raise(signo);
 }
 
-static void get_pack_filenames(struct string_list *fname_list)
+/*
+ * Adds all packs hex strings to the fname list, which do not
+ * have a corresponding .keep file.
+ */
+static void get_non_kept_pack_filenames(struct string_list *fname_list)
 {
 	DIR *dir;
 	struct dirent *e;
 	char *fname;
+	size_t len;
 
 	if (!(dir = opendir(packdir)))
 		return;
@@ -86,7 +83,7 @@ static void get_pack_filenames(struct string_list *fname_list)
 		if (suffixcmp(e->d_name, ".pack"))
 			continue;
 
-		size_t len = strlen(e->d_name) - strlen(".pack");
+		len = strlen(e->d_name) - strlen(".pack");
 		fname = xmemdupz(e->d_name, len);
 
 		if (!file_exists(mkpath("%s/%s.keep", packdir, fname)))
@@ -95,14 +92,14 @@
 	closedir(dir);
 }
 
-static void remove_redundant_pack(const char *path, const char *sha1)
+static void remove_redundant_pack(const char *path_prefix, const char *hex)
 {
 	const char *exts[] = {".pack", ".idx", ".keep"};
 	int i;
 	struct strbuf buf = STRBUF_INIT;
 	size_t plen;
 
-	strbuf_addf(&buf, "%s/%s", path, sha1);
+	strbuf_addf(&buf, "%s/%s", path_prefix, hex);
 	plen = buf.len;
 
 	for (i = 0; i < ARRAY_SIZE(exts); i++) {
@@ -115,15 +112,14 @@ static void remove_redundant_pack(const char *path, const char *sha1)
 int cmd_repack(int argc, const char **argv, const char *prefix)
 {
 	const char *exts[2] = {".idx", ".pack"};
-	char *packtmp;
 	struct child_process cmd;
 	struct string_list_item *item;
 	struct argv_array cmd_args = ARGV_ARRAY_INIT;
 	struct string_list names = STRING_LIST_INIT_DUP;
-	struct string_list rollback = STRING_LIST_INIT_DUP;
+	struct string_list rollback = STRING_LIST_INIT_NODUP;
 	struct string_list existing_packs = STRING_LIST_INIT_DUP;
 	struct strbuf line = STRBUF_INIT;
-	int count_packs, ext, ret;
+	int nr_packs, ext, ret, failed;
 	FILE *out;
 
 	/* variables to be filled by option parsing */
@@ -173,11 +169,11 @@
 	argc = parse_options(argc, argv, prefix, builtin_repack_options,
 			     git_repack_usage, 0);
 
-	sigchain_push_common(remove_pack_on_signal);
-
 	packdir = mkpathdup("%s/pack", get_object_directory());
 	packtmp = mkpathdup("%s/.tmp-%d-pack", packdir, (int)getpid());
 
+	sigchain_push_common(remove_pack_on_signal);
+
Re: [PATCH] repack: rewrite the shell script in C (squashing proposal)
Stefan Beller <stefanbel...@googlemail.com> writes:

> @@ -41,18 +35,16 @@ static void remove_temporary_files(void)
>  	DIR *dir;
>  	struct dirent *e;
>
> +	dir = opendir(packdir);
> +	if (!dir)
>  		return;
>
> +	strbuf_addstr(&buf, packdir);
> +
> +	/* dirlen holds the length of the path before the file name */
>  	dirlen = buf.len + 1;
> +	strbuf_addf(&buf, "%s", packtmp);
> +	/* prefixlen holds the length of the prefix */

Thanks to the name of the variable being self-describing, this comment does not add much value. But it misses the whole point of my suggestion in the earlier message, which was to phrase these like so:

	/* Point at the slash at the end of ".../objects/pack/" */
	dirlen = strlen(packdir) + 1;

	/* Point at the dash at the end of ".../.tmp-%d-pack-" */
	prefixlen = buf.len - dirlen;

to clarify what the writer considers the prefix to be, which may be quite different from what the readers think the prefix is. In

	.tmp-2342-pack-0d8beaa5b76e824c9869f0d1f1b19ec7acf4982f.pack

is the prefix ".tmp-2342-", ".tmp-2342-pack", or ".tmp-2342-pack-"?

> int cmd_repack(int argc, const char **argv, const char *prefix)
> {
> 	...
> 	packdir = mkpathdup("%s/pack", get_object_directory());
> 	packtmp = mkpathdup("%s/.tmp-%d-pack", packdir, (int)getpid());
>
> +	sigchain_push_common(remove_pack_on_signal);
> +
> 	argv_array_push(&cmd_args, "pack-objects");
> 	argv_array_push(&cmd_args, "--keep-true-parents");
> 	argv_array_push(&cmd_args, "--honor-pack-keep");
> 	...
> +			rollback_failure.items[i].string,
> +			rollback_failure.items[i].string);
> 	}
> 	exit(1);
> }

The scripted version uses

	trap 'rm -f "$PACKTMP"-*' 0 1 2 3 15

so remove_temporary_files() needs to be called before exiting from the program even without getting killed by a signal.

Thanks.
Re: Proposal: sharing .git/config
Jeff King wrote:

> I don't think you can avoid the 3-step problem and retain the safety
> in the general case. Forgetting implementation details for a minute,
> you have either a 1-step system:
>
>   1. Fetch and start using config from the remote.
>
> which is subject to fetching and executing malicious config, or:
>
>   1. Fetch config from remote.
>   2. Inspect it.
>   3. Integrate it into the current config.

I don't understand your emphasis on step 2. Isn't the configuration written by me? Why would it be malicious?

I've just started thinking about how to design something that will allow us to share configuration elegantly [1]. Essentially, the metadata repository will consist of *.layout files, one for each repository to clone, each containing the .git/config to write after cloning that repository. So, a git.layout might look like:

	[layout]
		directory = git
	[remote "origin"]
		url = git://github.com/git/git
	[remote "ram"]
		url = g...@github.com:artagnon/git
	[remote "junio"]
		url = git://github.com/gitster/git

As you can see, [layout] is a special section which tells our fetcher where to place the repository. Everything else is meant to be inserted into the repository's .git/config.

However, I can foresee a scaling problem: when I ask for a specific directory like a/b/c to be populated (the equivalent of "repo sync a/b/c"), the fetcher will have to parse the layout.directory variable of all the .layout files, and this can be slow. So maybe we should have a special _manifest.layout listing all the paths?

Further, I see this as a way to work with projects that would otherwise require nested submodules, like the Android project.

What do you think?

[1]: https://github.com/artagnon/src.layout
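For what it's worth, since a .layout file in this proposal is plain git-config syntax, "git config -f" can already parse it. Here is a rough sketch of a fetcher for a single layout file (apply_layout is a made-up name, and error handling is omitted):

```shell
#!/bin/sh
# Sketch of a fetcher for one .layout file (hypothetical format from
# this proposal).  Clones to layout.directory, then copies every
# non-[layout] entry into the clone's .git/config.
apply_layout () {
	layout=$1

	dir=$(git config -f "$layout" layout.directory)
	url=$(git config -f "$layout" remote.origin.url)

	git clone -q "$url" "$dir"

	# --list emits key=value pairs; skip the [layout] section and
	# insert the rest into the new repository's configuration.
	git config -f "$layout" --list |
	grep -v '^layout\.' |
	while IFS='=' read -r key value
	do
		(cd "$dir" && git config "$key" "$value")
	done
}
```

A _manifest.layout, as suggested above, would then just let the tool map a requested path to the right .layout file without parsing all of them.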