Re: Problem with --shallow-submodules option
Hi,

Thanks for the clarification, it makes sense now.

Thanks,
Istvan

On 30 June 2016 at 22:57, Stefan Beller wrote:
> [...]

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Problem with --shallow-submodules option
On Thu, Jun 30, 2016 at 6:27 AM, Istvan Zakar wrote:
> Hello,
>
> Thanks for your answers. I tested it after the changes were made on
> the git server, and it seems to be working. But another issue came
> up.
>
> We have quite a lot of submodules in our project, so I did a comparison:
>
> If I do a clone with these parameters:
> --jobs 20 --recurse-submodules
>
> The clone takes ~53 seconds, and the total size of the folder is around 2 GB.
>
> If I add the shallow-submodules option, the size of the folder will be
> a bit below 1 GB, so the size decreased as I expected, but the time of
> the clone itself increased to 90 seconds. It seems the last step of
> the command, checking out the submodules, is executed one-by-one and
> not in parallel, so it seems the jobs parameter has no effect at this
> step.
>
> Is it intentional, or is there some option I missed?

It was intentional at the time of submitting the patches.
The checkout phase is a bit complicated, as it combines the
newly cloned submodules and the submodules to incrementally
fetch into one bucket and treats them the same.

For submodules that were fetched incrementally you may run into problems
when combining that with the local state (e.g. rebase or merge configured in
`submodule.<name>.update` or passed on the command line), which requires
human interaction (resolving the merge conflict), which we want to present
to the user one at a time.

It is not quite clear when to stop on behalf of the user; see:
15ffb7cde48b73b3d5ce259443db7d2e0ba13750 (submodule update: continue
when a checkout fails)
877449c136539cf8b9b4ed9cfe33a796b7b93f93 (git-submodule.sh: clarify
the "should we die now" logic)

So we want to die as soon as we see a merge conflict or other
error that is likely to require some human interaction.
To do that properly we need either complicated logic or to update
just one submodule at a time.

For initial checkouts we know that there will be no merge conflicts, i.e.
it will be a "checkout -f" (with an implicit must_die_on_failure=no).
So we could run all checkouts of submodules in parallel, too. We'd
just need to write the patch for that.

As the cloning is already done in parallel, we can hook into the initial
checkout there easily. I'd build that on top of [1], creating a similar
commit.
In the successful case of `update_clone_task_finished` (the case with
`!result` -> return 0;) we would need to add the checkout command to
the queue instead of just finishing.

[1] https://github.com/gitster/git/commit/665b35eccd39fefd714cb5c332277a6b94fd9386
Re: Problem with --shallow-submodules option
Hello,

Thanks for your answers. I tested it after the changes were made on
the git server, and it seems to be working. But another issue came
up.

We have quite a lot of submodules in our project, so I did a comparison:

If I do a clone with these parameters:
--jobs 20 --recurse-submodules

The clone takes ~53 seconds, and the total size of the folder is around 2 GB.

If I add the shallow-submodules option, the size of the folder will be
a bit below 1 GB, so the size decreased as I expected, but the time of
the clone itself increased to 90 seconds. It seems the last step of
the command, checking out the submodules, is executed one-by-one and
not in parallel, so it seems the jobs parameter has no effect at this
step.

Is it intentional, or is there some option I missed?

I'm using git 2.9.0 on the client side.

Thanks,
Istvan

ps: if I update the submodules with the --depth 1 parameter in parallel
using xargs it takes about 18 seconds, so it's a workaround for this
issue, but it would be nice to do it with a single command.

On 22 June 2016 at 17:31, Fredrik Gustafsson wrote:
> [...]
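The xargs workaround from the postscript could look roughly like the sketch below. The exact command Istvan used is not given in the thread, so this is a guess: the function name `update_submodules_shallow_parallel` and the default of 8 jobs are hypothetical, and it assumes the function is run from the top of an already cloned superproject whose submodule paths contain no whitespace.

```shell
# Hypothetical helper: shallow-update all submodules of the superproject in
# the current directory, several at a time.
# Usage: update_submodules_shallow_parallel [jobs]
update_submodules_shallow_parallel() {
    njobs=${1:-8}

    # Register the submodules from .gitmodules in .git/config first, serially,
    # so the parallel workers below do not have to initialise anything.
    git submodule init || return 1

    # List the submodule paths recorded in .gitmodules ("key value" lines),
    # keep only the value, and hand each path to its own
    # "git submodule update" process, at most $njobs at a time.
    git config --file .gitmodules --get-regexp '^submodule\..*\.path$' |
        cut -d' ' -f2- |
        xargs -P "$njobs" -n 1 git submodule update --depth 1 --
}
```

For example, `update_submodules_shallow_parallel 20` after a plain `git clone` of the superproject would mirror the `--jobs 20` setting used in the timings above. As with `--shallow-submodules`, each update still depends on the server allowing a fetch of the recorded sha1 when it is not the tip of the submodule's branch.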
Re: Problem with --shallow-submodules option
On Mon, Jun 20, 2016 at 01:06:39PM, Istvan Zakar wrote:
> I'm working on a relatively big project with many submodules. During
> cloning for testing I tried to decrease the amount of data that needs to be
> fetched from the server by using the --shallow-submodules option in the clone
> command. It seems to check out the tip of the remote repo, and if it's not
> the commit registered in the superproject the submodule update fails
> (obviously). Can I somehow tell it to fetch that exact commit I need for my
> superproject?

Maybe. http://stackoverflow.com/questions/2144406/git-shallow-submodules
gives a good overview of this problem.

git fetches a branch and is shallow from that branch, which might be
a different sha1 than the one the submodule points to (as you say). This
is/was one of the drawbacks of this method. However, since git 2.8,
git will try to fetch the sha1 directly (and not the branch). So then it
will work if (!) the server supports direct access to sha1. This was
previously not allowed due to security concerns (if I recall correctly).

So the answer is: yes, this will work if you have a recent version of git
and support on the server side for doing this. Unfortunately I'm not
sure which git version is needed on the server side for this to work.

--
Fredrik Gustafsson

phone: +46 733-608274
e-mail: iv...@iveqy.com
website: http://www.iveqy.com
Re: Problem with --shallow-submodules option
On Mon, Jun 20, 2016 at 11:32 PM, Istvan Zakar wrote:
> Hi,
>
> Thanks for the answer.
> So it means that it is a setting on the server side which can be
> activated? (I guess it depends on the version of the server)
> I did some reading on the topic. Are you talking about this setting
> "uploadpack.allowReachableSHA1InWant", or did I misunderstand what I
> read?

No, that's exactly what I meant; sorry for not spelling that out.

Thanks,
Stefan
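Enabling the setting confirmed above could be sketched as follows. This is a minimal illustration, assuming shell access to the server and a submodule repository served from a directory such as `/srv/git/sub.git`; both that path and the function name `allow_reachable_sha1_fetch` are hypothetical. Hosted services may not expose this setting at all.

```shell
# Hypothetical helper: let clients request, by sha1, any commit that is
# reachable from a ref in the repository served from <repo_dir>.
# Usage: allow_reachable_sha1_fetch <repo_dir>
allow_reachable_sha1_fetch() {
    git -C "$1" config uploadpack.allowReachableSHA1InWant true
}
```

It would be called once per served submodule repository, for example `allow_reachable_sha1_fetch /srv/git/sub.git`. The setting is read by upload-pack on the serving side, so it has to live in the served repository's config (or the server's system or global config), not in the client's clone.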
Re: Problem with --shallow-submodules option
Hi,

Thanks for the answer.
So it means that it is a setting on the server side which can be
activated? (I guess it depends on the version of the server)
I did some reading on the topic. Are you talking about this setting
"uploadpack.allowReachableSHA1InWant", or did I misunderstand what I
read?

Thanks,
Istvan

On 20 June 2016 at 19:45, Stefan Beller wrote:
> [...]
Re: Problem with --shallow-submodules option
On Mon, Jun 20, 2016 at 6:06 AM, Istvan Zakar wrote:
> Hello,
>
> I'm working on a relatively big project with many submodules. During
> cloning for testing I tried to decrease the amount of data that needs to be
> fetched from the server by using the --shallow-submodules option in the clone
> command. It seems to check out the tip of the remote repo, and if it's not
> the commit registered in the superproject the submodule update fails
> (obviously).

Yes, that is broken, as the depth of a submodule is counted from its own HEAD,
not from the superproject's sha1 as it should be.

So it does

    git clone --depth=1

    if HEAD != recorded gitlink sha1,
        git fetch

    git checkout

> Can I somehow tell it to fetch that exact commit I need for my
> superproject?

Some servers support fetching by direct sha1, which is what we make use
of here; then it sort-of works.

If the server doesn't support the capability to fetch an arbitrary sha1,
the submodule command fails, with a message such as

    error: no such remote ref $sha1
    Fetched in submodule path '', but it did not contain
    $sha1. Direct fetching of that commit failed.

So if it breaks for you now, I would suggest not using that switch; I
don't think there is a quick workaround.

Thanks,
Stefan
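The clone, fetch and checkout sequence outlined above can be sketched for a single submodule roughly as follows. This is an illustration only, not what git-submodule.sh literally runs: the function name `shallow_checkout_recorded` and its three arguments (submodule URL, target path, recorded sha1) are hypothetical, and the remote is assumed to be called `origin`. The direct fetch succeeds only if the server permits requesting a commit by sha1.

```shell
# Hypothetical helper: shallow-clone one submodule, then make sure the
# commit recorded in the superproject (the gitlink sha1) is checked out.
# Usage: shallow_checkout_recorded <url> <path> <sha1>
shallow_checkout_recorded() {
    url=$1
    path=$2
    sha1=$3

    # The depth is counted from the remote HEAD, not from the recorded sha1.
    git clone --depth=1 "$url" "$path" || return 1

    # If the cloned tip is not the recorded commit, ask the server for that
    # commit directly; this is the step that fails on servers without support.
    if [ "$(git -C "$path" rev-parse HEAD)" != "$sha1" ]; then
        git -C "$path" fetch --depth=1 origin "$sha1" || return 1
    fi

    git -C "$path" checkout "$sha1"
}
```

When the recorded commit happens to be the tip of the remote HEAD, the fetch step is skipped entirely, which is why the option can appear to work for some submodules and fail for others.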