Re: [CMake] Parallel builds do not work correctly when using "cmake -E copy" to copy files
On 2008-02-06 22:37-0500 Bill Hoffman wrote:

> No, file-level depends are done mostly at build time. This is a performance issue. Some generators like VS IDE do file-level depends by themselves. With the makefiles cmake computes the depends, but at build time, not cmake time. The custom command stuff (output, input) is known at cmake time, and may be enough for what you want.

Probably, since it is usually there where the build-system developer makes mistakes in the dependencies.

> But if you have a file foo.c with #include <foo.h>, cmake does not know that foo.c depends on foo.h until build time.

Right, but hopefully those automatically generated depends would be ok.

Alan
__
Alan W. Irwin
Astronomical research affiliation with Department of Physics and Astronomy, University of Victoria (astrowww.phys.uvic.ca). Programming affiliations with the FreeEOS equation-of-state implementation for stellar interiors (freeeos.sf.net); PLplot scientific plotting software package (plplot.org); the libLASi project (unifont.org/lasi); the Loads of Linux Links project (loll.sf.net); and the Linux Brochure Project (lbproject.sf.net).
__
Linux-powered Science
__
___
CMake mailing list CMake@cmake.org http://www.cmake.org/mailman/listinfo/cmake
Re: [CMake] Parallel builds do not work correctly when using "cmake -E copy" to copy files
On 2008-02-06 21:04-0500 Brad King wrote: Alan W. Irwin wrote: On 2007-12-14 09:53-0800 Alan W. Irwin wrote: On 2007-12-14 10:32-0500 Brad King wrote: CMake employs a 2-level make recursion system that is independent of the directory structure. The first level never builds anything...it just evaluates target-level dependencies with phony targets. That determines the order in which targets must be built. The second level is the build.make for each target. This is where file-level dependencies are evaluated. In your example the file1...fileN rules are showing up in target1's build.make and target2's build.make but they should never be evaluated in the second target. They are pulled in through the additional_file rule's dependencies on them (see below), but they should always be up to date if target2 doesn't build until after target1 finishes. Then only the additional_file rule will be invoked. However if there is no dependency from target2->target1 then both build.make files may be built simultaneously and you get race conditions causing the double evaluations. CMake traces through the dependencies of custom commands in each target. When it is constructing target2 it doesn't know that target1 will also provide rules for the files. If you place the targets in different directories it would not be able to make this extra connection, but then the build would not work correctly unless you add the target-level dependency. Any further explanation here will just duplicate my previous message so I'll stop. That's fine. Your combined explanation now makes sense and completely confirms my working hypothesis that the make recursion system of CMake is responsible for the parallel build issues I was encountering. I hope I can work around these PLplot parallel build issues (note the double copy issue was only the most obvious one) by using extra target dependencies. 
The problem is that parallel build issues tend to appear and disappear depending on load, the N level (for -j N), and hardware. Thus, even if a whole flock of PLplot developers confirm success for parallel builds, there could be some subtle dependency issue left that we have missed, and some user down the road is going to come up with a combination of load, N, and hardware that triggers the parallel build problem because of that dependency issue. As a PLplot developer, I don't like being in such an uncertain position!

I thought it important to resurrect this two-month-old thread because today I _finally_ got success (at least no obvious issues, see comment below) with parallel builds of PLplot on my particular platform. That's the good news. The bad news is it took so much effort. PLplot is not that big a piece of software, but there are a large number of different components with complex dependencies between them. As a result, I made several attempts over those two months to get parallel builds to work, and all of them failed miserably. This last successful effort of getting "make -j N" to work for many different N values took at least several days of isolating the problem by enabling/disabling various PLplot components until I was finally able to find and fix the last two dependency issues that showed up on my system. Even worse news is that I caught the last problem only by accident. That problem only showed up intermittently for N = 4 for a very specific PLplot configuration. N=2 and N=8 never showed any problems for that configuration for my two-processor hardware! So from that experience it is unlikely I caught all issues. To help to sort out such difficult dependency issues with CMake (which affect parallel builds on Unix systems and I understand also certain kinds of builds on Windows), I have a feature request I would like to discuss here before I make a formal feature request on the Kitware bug system.
I already made one for this: http://public.kitware.com/Bug/view.php?id=6285 That is great that you are considering automatically putting in target depends if two targets depend on the same file. That new feature would address the original issue that started this thread, and I am all in favour of this feature for that reason. However, during my dependency hell I discovered other issues with the PLplot depends such as missing dependencies between custom commands. Those missing dependencies didn't matter for the non-parallel build case because the order of the custom commands was deliberately chosen (back in our autotools days and simply copied to our CMake build system without much thought) so that the files were built in the correct order, but of course that doesn't happen for parallel builds. So some sort of output that emphasizes targets or files without many depends (which mean they are suspects for missing dependencies) is needed as well. Bill's idea of adding file depends to the graphviz output file would probably satisfy that need since isolated files/targets would really stand out. Alan
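[Editorial sketch, not part of the original message.] The missing dependencies between custom commands described above can be illustrated with a small CMakeLists.txt fragment. This is only a sketch under stated assumptions: the file names (input.txt, stage1.txt, stage2.txt) and the target name "stages" are hypothetical and do not come from PLplot. In a serial build the listing order can hide a missing DEPENDS, whereas under "make -j N" the second rule may start before the first has finished.

```cmake
cmake_minimum_required(VERSION 3.10)
project(custom_command_chain_demo NONE)

# Hypothetical file names, chosen only for illustration.
set(stage1 ${CMAKE_CURRENT_BINARY_DIR}/stage1.txt)
set(stage2 ${CMAKE_CURRENT_BINARY_DIR}/stage2.txt)

# First rule: produce stage1.txt from a source file.
add_custom_command(
  OUTPUT ${stage1}
  COMMAND ${CMAKE_COMMAND} -E copy
          ${CMAKE_CURRENT_SOURCE_DIR}/input.txt ${stage1}
  DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/input.txt
)

# Second rule: consumes stage1.txt. The DEPENDS line below is the kind of
# file-level dependency that is easy to forget when the serial build
# happens to run the rules in the right order anyway.
add_custom_command(
  OUTPUT ${stage2}
  COMMAND ${CMAKE_COMMAND} -E copy ${stage1} ${stage2}
  DEPENDS ${stage1}
)

# One target drives both rules; with the DEPENDS chain in place the
# order is enforced by make rather than by the listing order.
add_custom_target(stages ALL DEPENDS ${stage1} ${stage2})
```

If the second DEPENDS is dropped, the serial build may still succeed by accident, which is why this class of mistake tends to surface only in parallel builds.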
Re: [CMake] Parallel builds do not work correctly when using "cmake -E copy" to copy files
Alan W. Irwin wrote:

> My first interpretation was "that" referred to graphviz, but in fact the file was produced at cmake time, and it was a simple matter to process it by hand using the "dot" command-line tool (even though I had never heard of that tool or graphviz before). "gv" has errors for both the ps and pdf results, but I think that is because the latest gv is extra careful about non-standard ps and pdf files. xpdf could understand the pdf output, but I have to say the result is black with dependency lines to a frightening extent. I can send the pdf file to Brad and/or you off-list if either of you is interested in being frightened by the PLplot dependencies as well. :-) Seriously, I am fairly impressed with the graphviz result, and adding in the file depends would add a lot of value to the result. If your "that" refers to file depends instead of graphviz, I don't understand your comment since surely file depend information is available at cmake time?

No, file-level depends are done mostly at build time. This is a performance issue. Some generators like VS IDE do file-level depends by themselves. With the makefiles cmake computes the depends, but at build time, not cmake time. The custom command stuff (output, input) is known at cmake time, and may be enough for what you want. But if you have a file foo.c with #include <foo.h>, cmake does not know that foo.c depends on foo.h until build time.

-Bill
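[Editorial sketch, not part of the original message.] The distinction Bill draws between dependencies known at cmake time and those discovered at build time can be shown in a short fragment. The names used here (generated.h.in, generated.h, the target "foo") are assumptions made for illustration; only foo.c and foo.h appear in the thread.

```cmake
cmake_minimum_required(VERSION 3.10)
project(depends_demo C)

# Known at cmake time: the custom command states its output and its
# input explicitly, so CMake can record this edge when it generates
# the build files.
add_custom_command(
  OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/generated.h
  COMMAND ${CMAKE_COMMAND} -E copy
          ${CMAKE_CURRENT_SOURCE_DIR}/generated.h.in
          ${CMAKE_CURRENT_BINARY_DIR}/generated.h
  DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/generated.h.in
)

# Not known at cmake time: nothing in this file says that foo.c includes
# foo.h. With the Makefile generators that edge is found by scanning the
# sources while the build runs.
add_executable(foo foo.c ${CMAKE_CURRENT_BINARY_DIR}/generated.h)
target_include_directories(foo PRIVATE ${CMAKE_CURRENT_BINARY_DIR})
```

Listing generated.h among the sources of foo is what attaches the custom command to that target; the foo.c to foo.h relationship never appears in the CMakeLists.txt at all.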
Re: [CMake] Parallel builds do not work correctly when using "cmake -E copy" to copy files
On 2008-02-06 21:05-0500 Bill Hoffman wrote: Could we have a cmake command-line option to evaluate/diagnose the complete list of file and target dependencies as understood by cmake? You could start with a print out of complete target dependency chains and file dependency chains as cmake understands them. As part of that printout it would be useful to highlight files or targets that are built with few dependencies since that might be a sign of missing dependencies. And also highlight chains of file depends that include files that are part of other chains of file depends. You could put in some error analysis as well (in case two targets which do not target-depend on each other file-depend on the same file, for example.) Anyhow, as I went through this dependency hell for PLplot I kept wishing for such a diagnostic tool, and I think it would be useful for others as well that are dealing with projects like PLplot with complex dependency chains spread over quite a few different directories. What do you think? You could try this: cmake --graphviz=[file] = Generate graphviz of dependencies. It will only show the target level stuff. It would be another project to get the file level depend stuff to show up. The problem is that is done at build time and not at cmake time. My first interpretation was "that" referred to graphviz, but in fact the file was produced at cmake time, and it was a simple matter to process it by hand using the "dot" command-line tool (even though I had never heard of that tool or graphviz before). "gv" has errors for both the ps and pdf results, but I think that is because the latest gv is extra careful about non-standard ps and pdf files. xpdf could understand the pdf output, but I have to say the result is black with dependency lines to a frightening extent. I can send the pdf file to Brad and/or you off-list if either of you is interested in being frightened by the PLplot dependencies as well. 
:-) Seriously, I am fairly impressed with the graphviz result, and adding in the file depends would add a lot of value to the result. If your "that" refers to file depends instead of graphviz, I don't understand your comment since surely file depend information is available at cmake time? Alan
Re: [CMake] Parallel builds do not work correctly when using "cmake -E copy" to copy files
Alan W. Irwin wrote: On 2007-12-14 09:53-0800 Alan W. Irwin wrote: On 2007-12-14 10:32-0500 Brad King wrote: CMake employs a 2-level make recursion system that is independent of the directory structure. The first level never builds anything...it just evaluates target-level dependencies with phony targets. That determines the order in which targets must be built. The second level is the build.make for each target. This is where file-level dependencies are evaluated. In your example the file1...fileN rules are showing up in target1's build.make and target2's build.make but they should never be evaluated in the second target. They are pulled in through the additional_file rule's dependencies on them (see below), but they should always be up to date if target2 doesn't build until after target1 finishes. Then only the additional_file rule will be invoked. However if there is no dependency from target2->target1 then both build.make files may be built simultaneously and you get race conditions causing the double evaluations. CMake traces through the dependencies of custom commands in each target. When it is constructing target2 it doesn't know that target1 will also provide rules for the files. If you place the targets in different directories it would not be able to make this extra connection, but then the build would not work correctly unless you add the target-level dependency. Any further explanation here will just duplicate my previous message so I'll stop. That's fine. Your combined explanation now makes sense and completely confirms my working hypothesis that the make recursion system of CMake is responsible for the parallel build issues I was encountering. I hope I can work around these PLplot parallel build issues (note the double copy issue was only the most obvious one) by using extra target dependencies. The problem is that parallel build issues tend to appear and disappear depending on load, the N level (for -j N), and hardware. 
Thus, even if a whole flock of PLplot developers confirm success for parallel builds, there could be some subtle dependency issue left that we have missed, and some user down the road is going to come up with a combination of load, N, and hardware that triggers the parallel build problem because of that dependency issue. As a PLplot developer, I don't like being in such an uncertain position!

I thought it important to resurrect this two-month-old thread because today I _finally_ got success (at least no obvious issues, see comment below) with parallel builds of PLplot on my particular platform. That's the good news. The bad news is it took so much effort. PLplot is not that big a piece of software, but there are a large number of different components with complex dependencies between them. As a result, I made several attempts over those two months to get parallel builds to work, and all of them failed miserably. This last successful effort of getting "make -j N" to work for many different N values took at least several days of isolating the problem by enabling/disabling various PLplot components until I was finally able to find and fix the last two dependency issues that showed up on my system. Even worse news is that I caught the last problem only by accident. That problem only showed up intermittently for N = 4 for a very specific PLplot configuration. N=2 and N=8 never showed any problems for that configuration for my two-processor hardware! So from that experience it is unlikely I caught all issues. To help to sort out such difficult dependency issues with CMake (which affect parallel builds on Unix systems and I understand also certain kinds of builds on Windows), I have a feature request I would like to discuss here before I make a formal feature request on the Kitware bug system. Could we have a cmake command-line option to evaluate/diagnose the complete list of file and target dependencies as understood by cmake? 
You could start with a print out of complete target dependency chains and file dependency chains as cmake understands them. As part of that printout it would be useful to highlight files or targets that are built with few dependencies since that might be a sign of missing dependencies. And also highlight chains of file depends that include files that are part of other chains of file depends. You could put in some error analysis as well (in case two targets which do not target-depend on each other file-depend on the same file, for example.) Anyhow, as I went through this dependency hell for PLplot I kept wishing for such a diagnostic tool, and I think it would be useful for others as well that are dealing with projects like PLplot with complex dependency chains spread over quite a few different directories. What do you think? You could try this: cmake --graphviz=[file] = Generate graphviz of dependencies. It will only show the target level stuff. It would be another project to get the file level depend stuff to
Re: [CMake] Parallel builds do not work correctly when using "cmake -E copy" to copy files
Alan W. Irwin wrote: On 2007-12-14 09:53-0800 Alan W. Irwin wrote: On 2007-12-14 10:32-0500 Brad King wrote: CMake employs a 2-level make recursion system that is independent of the directory structure. The first level never builds anything...it just evaluates target-level dependencies with phony targets. That determines the order in which targets must be built. The second level is the build.make for each target. This is where file-level dependencies are evaluated. In your example the file1...fileN rules are showing up in target1's build.make and target2's build.make but they should never be evaluated in the second target. They are pulled in through the additional_file rule's dependencies on them (see below), but they should always be up to date if target2 doesn't build until after target1 finishes. Then only the additional_file rule will be invoked. However if there is no dependency from target2->target1 then both build.make files may be built simultaneously and you get race conditions causing the double evaluations. CMake traces through the dependencies of custom commands in each target. When it is constructing target2 it doesn't know that target1 will also provide rules for the files. If you place the targets in different directories it would not be able to make this extra connection, but then the build would not work correctly unless you add the target-level dependency. Any further explanation here will just duplicate my previous message so I'll stop. That's fine. Your combined explanation now makes sense and completely confirms my working hypothesis that the make recursion system of CMake is responsible for the parallel build issues I was encountering. I hope I can work around these PLplot parallel build issues (note the double copy issue was only the most obvious one) by using extra target dependencies. The problem is that parallel build issues tend to appear and disappear depending on load, the N level (for -j N), and hardware. 
Thus, even if a whole flock of PLplot developers confirm success for parallel builds, there could be some subtle dependency issue left that we have missed, and some user down the road is going to come up with a combination of load, N, and hardware that triggers the parallel build problem because of that dependency issue. As a PLplot developer, I don't like being in such an uncertain position!

I thought it important to resurrect this two-month-old thread because today I _finally_ got success (at least no obvious issues, see comment below) with parallel builds of PLplot on my particular platform. That's the good news. The bad news is it took so much effort. PLplot is not that big a piece of software, but there are a large number of different components with complex dependencies between them. As a result, I made several attempts over those two months to get parallel builds to work, and all of them failed miserably. This last successful effort of getting "make -j N" to work for many different N values took at least several days of isolating the problem by enabling/disabling various PLplot components until I was finally able to find and fix the last two dependency issues that showed up on my system. Even worse news is that I caught the last problem only by accident. That problem only showed up intermittently for N = 4 for a very specific PLplot configuration. N=2 and N=8 never showed any problems for that configuration for my two-processor hardware! So from that experience it is unlikely I caught all issues. To help to sort out such difficult dependency issues with CMake (which affect parallel builds on Unix systems and I understand also certain kinds of builds on Windows), I have a feature request I would like to discuss here before I make a formal feature request on the Kitware bug system.

I already made one for this: http://public.kitware.com/Bug/view.php?id=6285

-Brad
Re: [CMake] Parallel builds do not work correctly when using "cmake -E copy" to copy files
On 2007-12-14 09:53-0800 Alan W. Irwin wrote: On 2007-12-14 10:32-0500 Brad King wrote: CMake employs a 2-level make recursion system that is independent of the directory structure. The first level never builds anything...it just evaluates target-level dependencies with phony targets. That determines the order in which targets must be built. The second level is the build.make for each target. This is where file-level dependencies are evaluated. In your example the file1...fileN rules are showing up in target1's build.make and target2's build.make but they should never be evaluated in the second target. They are pulled in through the additional_file rule's dependencies on them (see below), but they should always be up to date if target2 doesn't build until after target1 finishes. Then only the additional_file rule will be invoked. However if there is no dependency from target2->target1 then both build.make files may be built simultaneously and you get race conditions causing the double evaluations. CMake traces through the dependencies of custom commands in each target. When it is constructing target2 it doesn't know that target1 will also provide rules for the files. If you place the targets in different directories it would not be able to make this extra connection, but then the build would not work correctly unless you add the target-level dependency. Any further explanation here will just duplicate my previous message so I'll stop. That's fine. Your combined explanation now makes sense and completely confirms my working hypothesis that the make recursion system of CMake is responsible for the parallel build issues I was encountering. I hope I can work around these PLplot parallel build issues (note the double copy issue was only the most obvious one) by using extra target dependencies. The problem is that parallel build issues tend to appear and disappear depending on load, the N level (for -j N), and hardware. 
Thus, even if a whole flock of PLplot developers confirm success for parallel builds, there could be some subtle dependency issue left that we have missed, and some user down the road is going to come up with a combination of load, N, and hardware that triggers the parallel build problem because of that dependency issue. As a PLplot developer, I don't like being in such an uncertain position!

I thought it important to resurrect this two-month-old thread because today I _finally_ got success (at least no obvious issues, see comment below) with parallel builds of PLplot on my particular platform. That's the good news. The bad news is it took so much effort. PLplot is not that big a piece of software, but there are a large number of different components with complex dependencies between them. As a result, I made several attempts over those two months to get parallel builds to work, and all of them failed miserably. This last successful effort of getting "make -j N" to work for many different N values took at least several days of isolating the problem by enabling/disabling various PLplot components until I was finally able to find and fix the last two dependency issues that showed up on my system. Even worse news is that I caught the last problem only by accident. That problem only showed up intermittently for N = 4 for a very specific PLplot configuration. N=2 and N=8 never showed any problems for that configuration for my two-processor hardware! So from that experience it is unlikely I caught all issues. To help to sort out such difficult dependency issues with CMake (which affect parallel builds on Unix systems and I understand also certain kinds of builds on Windows), I have a feature request I would like to discuss here before I make a formal feature request on the Kitware bug system. Could we have a cmake command-line option to evaluate/diagnose the complete list of file and target dependencies as understood by cmake? 
You could start with a print out of complete target dependency chains and file dependency chains as cmake understands them. As part of that printout it would be useful to highlight files or targets that are built with few dependencies since that might be a sign of missing dependencies. And also highlight chains of file depends that include files that are part of other chains of file depends. You could put in some error analysis as well (in case two targets which do not target-depend on each other file-depend on the same file, for example.) Anyhow, as I went through this dependency hell for PLplot I kept wishing for such a diagnostic tool, and I think it would be useful for others as well that are dealing with projects like PLplot with complex dependency chains spread over quite a few different directories. What do you think? Alan __ Alan W. Irwin Astronomical research affiliation with Department of Physics and Astronomy, University of Victoria (astrowww.phys.uvic.ca). Programming affiliations with the FreeEOS equation-of-state implementation f
Re: [CMake] Parallel builds do not work correctly when using "cmake -E copy" to copy files
Alan W. Irwin wrote: So let me rephrase the question. Are the CMake developers happy with the present state of the dependencies system or are you considering some major changes there because of such issues as the difficulties in getting parallel builds to work properly for projects like PLplot which (necessarily) have complicated chains of dependencies? I personally don't have any problems with the current state. I do parallel builds of big projects all the time. There will be no major changes. If you want to submit a feature request to the bug tracker you're welcome to do so. -Brad
Re: [CMake] Parallel builds do not work correctly when using "cmake -E copy" to copy files
On 2007-12-15 12:57-0500 Brad King wrote: Alan W. Irwin wrote: Well, it turns out I had to add four different target dependencies to the CMake-based PLplot build system to get rid of the parallel build problems I was having on my Core Duo box. One of them was pretty subtle so I missed it in my first review of the dependencies. Nevertheless, these changes were not as extensive as I thought they would be so there is some hope that I didn't miss anything that will show up as strange parallel build problems for PLplot on other machines. Great, I'm glad you got it working. Well, I thought so, but my previous test was without the (docbook) documentation build. Now that I have included that, the parallel build errors out. For the last few hours I have been going through the complicated dependencies in our documentation build, but I just cannot see what is causing the trouble. Perhaps if I sleep on it, it will become obvious tomorrow. Is that complete rework actually going to happen for 2.6.x or is it currently just a gleam in the CMake developers' eyes? To what message are you referring? I was sure I remembered a discussion of reworking the CMake depends system on this list in the past year, but I have been unable to find it so perhaps I was misremembering (or perhaps my searching skills are not good enough). So let me rephrase the question. Are the CMake developers happy with the present state of the dependencies system or are you considering some major changes there because of such issues as the difficulties in getting parallel builds to work properly for projects like PLplot which (necessarily) have complicated chains of dependencies? Alan
Re: [CMake] Parallel builds do not work correctly when using "cmake -E copy" to copy files
Alan W. Irwin wrote: Well, it turns out I had to add four different target dependencies to the CMake-based PLplot build system to get rid of the parallel build problems I was having on my Core Duo box. One of them was pretty subtle so I missed it in my first review of the dependencies. Nevertheless, these changes were not as extensive as I thought they would be so there is some hope that I didn't miss anything that will show up as strange parallel build problems for PLplot on other machines. Great, I'm glad you got it working. Is that complete rework actually going to happen for 2.6.x or is it currently just a gleam in the CMake developers' eyes? To what message are you referring? -Brad
Re: [CMake] Parallel builds do not work correctly when using "cmake -E copy" to copy files
On 2007-12-14 09:53-0800 Alan W. Irwin wrote: Obviously, CMake 2.4.x users are stuck with these file dependency issues and their workarounds, but for obvious reasons and especially for the parallel build case I hope the complete rework of the CMake dependency system that has been mentioned previously on list will remove these limitations. Well, it turns out I had to add four different target dependencies to the CMake-based PLplot build system to get rid of the parallel build problems I was having on my Core Duo box. One of them was pretty subtle so I missed it in my first review of the dependencies. Nevertheless, these changes were not as extensive as I thought they would be so there is some hope that I didn't miss anything that will show up as strange parallel build problems for PLplot on other machines. I am still interested in the answer to the question below. Alan Is that complete rework actually going to happen for 2.6.x or is it currently just a gleam in the CMake developers' eyes?
Re: [CMake] Parallel builds do not work correctly when using "cmake -E copy" to copy files
On 2007-12-14 12:49-0500 Bill Hoffman wrote: You might also want to consider Visual Studio builds. It will build two targets at the same time if there is no dependency between them, and would have the same issue. Currently, we have had no reports about such problems. However, our Windows developers (and users) tend to use just the core of PLplot, mostly because that was all that was available in the past for our previous home-brew build system for Windows, and installing the extra libraries needed for the rest of PLplot (additional language interfaces and additional plot device and plot file drivers) can be an issue for Windows users. Thus, it is quite possible our Windows developers have so far fortuitously skated by the issue, and that testing of a complete PLplot build on Windows would show similar dependency issues to what I am getting now with parallel builds of a complete PLplot under Linux. Hopefully, a complete dependency review and deploying the appropriate workarounds will make all these PLplot parallel build (and potentially Windows build) problems go away, but I am definitely concerned the review might miss some complex target- or file-dependency chains that may only cause intermittent and difficult-to-reproduce parallel build problems. Alan
Re: [CMake] Parallel builds do not work correctly when using "cmake -E copy" to copy files
On 2007-12-14 10:32-0500 Brad King wrote: Alan W. Irwin wrote: I am struggling with understanding the recursive make system that CMake normally employs CMake employs a 2-level make recursion system that is independent of the directory structure. The first level never builds anything...it just evaluates target-level dependencies with phony targets. That determines the order in which targets must be built. The second level is the build.make for each target. This is where file-level dependencies are evaluated. In your example the file1...fileN rules are showing up in target1's build.make and target2's build.make but they should never be evaluated in the second target. They are pulled in through the additional_file rule's dependencies on them (see below), but they should always be up to date if target2 doesn't build until after target1 finishes. Then only the additional_file rule will be invoked. However if there is no dependency from target2->target1 then both build.make files may be built simultaneously and you get race conditions causing the double evaluations. CMake traces through the dependencies of custom commands in each target. When it is constructing target2 it doesn't know that target1 will also provide rules for the files. If you place the targets in different directories it would not be able to make this extra connection, but then the build would not work correctly unless you add the target-level dependency. Any further explanation here will just duplicate my previous message so I'll stop. That's fine. Your combined explanation now makes sense and completely confirms my working hypothesis that the make recursion system of CMake is responsible for the parallel build issues I was encountering. I hope I can work around these PLplot parallel build issues (note the double copy issue was only the most obvious one) by using extra target dependencies. 
The problem is that parallel build issues tend to appear and disappear depending on load, the N level (for -j N), and hardware. Thus, even if a whole flock of PLplot developers confirm success for parallel builds, there could be some subtle dependency issue left that we have missed, and some user down the road is going to come up with a combination of load, N, and hardware that triggers the parallel build problem because of that dependency issue. As a PLplot developer, I don't like being in such an uncertain position! I wonder if really large projects such as KDE have attempted to deal with parallel build issues for CMake or whether they have just given up on parallel builds because the symptoms can be so intermittent and non-reproducible. I was well aware of the CMake file dependency limitation for separate directories before, but I did not realize that limitation extends to file dependencies in the _same_ directory when using parallel builds. I personally think these limitations of your 2-level make recursion system are pretty ugly since they require CMake users to deploy nonintuitive (at least from the make perspective) additional target-level dependencies as a workaround for these issues. The problem is especially pernicious for the parallel builds case since the symptoms are inherently difficult to reproduce. Obviously, CMake 2.4.x users are stuck with these file dependency issues and their workarounds, but for obvious reasons and especially for the parallel build case I hope the complete rework of the CMake dependency system that has been mentioned previously on list will remove these limitations. Is that complete rework actually going to happen for 2.6.x or is it currently just a gleam in the CMake developer's eyes? Alan __ Alan W. Irwin Astronomical research affiliation with Department of Physics and Astronomy, University of Victoria (astrowww.phys.uvic.ca). 
Re: [CMake] Parallel builds do not work correctly when using "cmake -E copy" to copy files
You might also want to consider visual studio builds. It will build two targets at the same time if there is no dependency between them, and would have the same issue. -Bill ___ CMake mailing list CMake@cmake.org http://www.cmake.org/mailman/listinfo/cmake
Re: [CMake] Parallel builds do not work correctly when using "cmake -E copy" to copy files
Alan W. Irwin wrote:
> I am struggling with understanding the recursive make system that
> CMake normally employs

CMake employs a 2-level make recursion system that is independent of the directory structure. The first level never builds anything...it just evaluates target-level dependencies with phony targets. That determines the order in which targets must be built. The second level is the build.make for each target. This is where file-level dependencies are evaluated.

In your example the file1...fileN rules are showing up in target1's build.make and target2's build.make but they should never be evaluated in the second target. They are pulled in through the additional_file rule's dependencies on them (see below), but they should always be up to date if target2 doesn't build until after target1 finishes. Then only the additional_file rule will be invoked. However if there is no dependency from target2->target1 then both build.make files may be built simultaneously and you get race conditions causing the double evaluations.

CMake traces through the dependencies of custom commands in each target. When it is constructing target2 it doesn't know that target1 will also provide rules for the files. If you place the targets in different directories it would not be able to make this extra connection, but then the build would not work correctly unless you add the target-level dependency.

Any further explanation here will just duplicate my previous message so I'll stop.

-Brad
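The arrangement described in this message can be sketched in CMake terms roughly as follows. This is only an illustrative fragment for an existing CMakeLists.txt, not the PLplot build system: the names target1, target2, file1 to file3 (standing in for file1...fileN) and additional_file follow the abstract example used in this thread, and the echo and copy commands are placeholders chosen for the sketch.

```cmake
# Hypothetical sketch; all names are taken from the thread's abstract example.

# One custom command per generated file (file1..file3 stand in for file1...fileN).
SET(GENERATED_FILES)
FOREACH(f file1 file2 file3)
  ADD_CUSTOM_COMMAND(
    OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/${f}
    COMMAND echo ${f} > ${CMAKE_CURRENT_BINARY_DIR}/${f}
    )
  LIST(APPEND GENERATED_FILES ${CMAKE_CURRENT_BINARY_DIR}/${f})
ENDFOREACH(f)

# additional_file file-depends on the generated files.
ADD_CUSTOM_COMMAND(
  OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/additional_file
  COMMAND ${CMAKE_COMMAND} -E copy
    ${CMAKE_CURRENT_BINARY_DIR}/file1
    ${CMAKE_CURRENT_BINARY_DIR}/additional_file
  DEPENDS ${GENERATED_FILES}
  )

# target1 file-depends directly on the generated files.
ADD_CUSTOM_TARGET(target1 ALL DEPENDS ${GENERATED_FILES})

# target2 file-depends only on additional_file, but the rules for the
# generated files are still pulled into its build.make through the
# DEPENDS of the additional_file command.
ADD_CUSTOM_TARGET(target2 ALL
  DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/additional_file
  )

# Target-level dependency: target2 is not started until target1 has
# finished, so the two build.make files never generate the same file
# at the same time.
ADD_DEPENDENCIES(target2 target1)
```

With the last line removed, a parallel build is free to run both build.make files at once, which is the race described above.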
Re: [CMake] Parallel builds do not work correctly when using "cmake -E copy" to copy files
On 2007-12-13 18:45-0800 Alan W. Irwin wrote:

> Brad, I am struggling with understanding the recursive make system that CMake normally employs so I am having trouble following the complete Makefile logic that my simple example creates. However, CMakeFiles/tclIndex_examples_tcl2.dir/build.make generated by my simple CMake example seems to follow the above OPTION A scenario. Indeed, if I execute that Makefile directly from the command line, e.g.,
>
>   make -f CMakeFiles/tclIndex_examples_tcl2.dir/build.make \
>     CMakeFiles/tclIndex_examples_tcl2.dir/clean
>   make -j 2 -f CMakeFiles/tclIndex_examples_tcl2.dir/build.make \
>     CMakeFiles/tclIndex_examples_tcl2.dir/build
>
> there are never double copy problems, while if I run
>
>   make clean
>   make -j 2
>
> there are always double copy problems. (You should try this for yourself to be sure you can replicate my experience.) So my current working hypothesis is there is a parallel build issue for OPTION A that CMake artificially introduces when it recursively invokes make (i.e., the result of the above "make -j 2" command).

That last sentence was poorly written. Replace it with the following:

So my current working hypothesis is there is a parallel build issue for OPTION A that CMake artificially introduces through its generated recursive make system. That generated recursive make system is invoked with the above "make -j 2" command, but bypassed with the "make -j 2 -f CMakeFiles/..." command above.

Alan
Re: [CMake] Parallel builds do not work correctly when using "cmake -E copy" to copy files
On 2007-12-13 19:15-0500 Brad King wrote: Alan W. Irwin wrote: So the rule seems to be that parallel builds do not work if there are two or more separate custom targets that file depend directly or indirectly (via some custom command file dependency chain) on the same output files. Another way of summarizing these results is that file depends must be minimized and/or custom target depends maximized in order for parallel builds to work properly. This is correct. My guess is I should be able to work around this CMake issue by appropriate changes to the PLplot build system although I have a number of these parallel build issues and the copy problem was only the most obvious. I do regard this as a CMake issue. Normally, the shoe is on the other foot, and the build system developer is desperately trying to make sure that all the CMake file depends are obviously in place locally rather than depending on a long easily-broken chain of dependencies to do it for them in a minimalist way. So the big question is whether CMake can be modified so that minimalist file depends and/or maximal (and unintuitive) target depends are not required in order for parallel builds to work properly. I don't see how CMake can automatically fix this. If two targets think they know how to build the same file how is CMake supposed to know which one is the correct target to build first? I am not completely convinced by that reasoning. Let me create an abstract case in Makefile terms that we can discuss (at first) strictly from the GNU make point of view. 
Putting my simple test case in Makefile terms we have the following rules:

  all: target1 target2

  file1:
          custom command to create file1
  ...
  fileN:
          custom command to create fileN

  target1: file1 file2 ... fileN

  additional_file: file1 file2 ... fileN
          custom command to create additional_file

There are two alternatives for the target2 dependencies. Either

  target2: additional_file              (OPTION A)

or

  target2: target1 additional_file      (OPTION B)

all, target1, and target2 are phony targets that do not correspond to actual files. OPTION A is what my simple example (and current PLplot) uses. OPTION B is one fixup you discussed where you made target2 depend on target1.

I know OPTION A always works for serial builds. The reason is each target knows independently how to build what it needs. target1 file-depends directly on the files from a variety of file1 through fileN custom rules. So it knows how to build exactly what it needs. target2 (with OPTION A) file-depends on additional_file which file depends on the file1-fileN rules. So target2 knows how to build exactly what it needs as well. Thus, if you remove target1, target2 would build without problems and vice versa.

Of course once you introduce parallel builds it gets complicated, and I would appreciate your input on that. I have always assumed that if one processor was busy creating file1 through the target1 chain of dependencies the make command kept track that a build of file1 was in progress so the other processor would not attempt to duplicate that build regardless of whether it needed it via the target1 or target2 dependency chains. Indeed, OPTION A works well for parallel builds (see below where I non-recursively invoke the appropriate Makefile that is generated by CMake).

You claim that OPTION B must be used for parallel builds (at least if I have understood you correctly and if my mental model of how cmake dependencies translate to Makefile dependencies is correct).
I just don't see the necessity of OPTION B for parallel builds for non-recursive Makefiles, but I am willing to be educated. :-) If GNU make does parallel builds without problems for non-recursive OPTION A (which appears to be the case, see below), then my concern is that CMake is introducing some additional make issues via the make recursion that it normally employs that screw up parallel builds and OPTION B is simply a workaround that bypasses that recursion issue.

Brad, I am struggling with understanding the recursive make system that CMake normally employs so I am having trouble following the complete Makefile logic that my simple example creates. However, CMakeFiles/tclIndex_examples_tcl2.dir/build.make generated by my simple CMake example seems to follow the above OPTION A scenario. Indeed, if I execute that Makefile directly from the command line, e.g.,

  make -f CMakeFiles/tclIndex_examples_tcl2.dir/build.make \
    CMakeFiles/tclIndex_examples_tcl2.dir/clean
  make -j 2 -f CMakeFiles/tclIndex_examples_tcl2.dir/build.make \
    CMakeFiles/tclIndex_examples_tcl2.dir/build

there are never double copy problems, while if I run

  make clean
  make -j 2

there are always double copy problems. (You should try this for yourself to be sure you can replicate my experience.) So my current working hypothesis is there is a parallel build issue for OPTION A that CMake artificially introduces when it recursively invokes make (i.e., the res
Re: [CMake] Parallel builds do not work correctly when using "cmake -E copy" to copy files
Alan W. Irwin wrote:
> So the rule seems to be that parallel builds do not work if there are two
> or more separate custom targets that file depend directly or indirectly
> (via some custom command file dependency chain) on the same output files.
>
> Another way of summarizing these results is that file depends must be
> minimized and/or custom target depends maximized in order for parallel
> builds to work properly.

This is correct.

> My guess is I should be able to work around this CMake issue by appropriate
> changes to the PLplot build system although I have a number of these
> parallel build issues and the copy problem was only the most obvious.
>
> I do regard this as a CMake issue. Normally, the shoe is on the other
> foot, and the build system developer is desperately trying to make sure
> that all the CMake file depends are obviously in place locally rather than
> depending on a long easily-broken chain of dependencies to do it for them
> in a minimalist way. So the big question is whether CMake can be modified
> so that minimalist file depends and/or maximal (and unintuitive) target
> depends are not required in order for parallel builds to work properly.

I don't see how CMake can automatically fix this. If two targets think they know how to build the same file how is CMake supposed to know which one is the correct target to build first? Consider this:

1.) Split your example into two separate directories
2.) Put the first-level custom commands in a target in one dir
3.) Put the second-level custom commands in a target in the other dir
4.) Try to build

Without explicit dependence between the two targets there is no way the target in the second directory can know it must wait for the target in the first directory to build. It doesn't even know about the first-level custom command rules or that the input to one of its second-level custom commands is a generated file.
The second directory's target will try to build and complain that input files are missing. This problem arises because CMake does not track file-level dependencies globally. It actually can't because some target build environments like the VS and Xcode IDEs do not provide this capability. File-level dependencies must be divided into targets. In your case the target-level dependency that must be added is a logical high-level statement of dependence: "I need all the 'tcl_examples' to be ready before I use them for anything" -Brad ___ CMake mailing list CMake@cmake.org http://www.cmake.org/mailman/listinfo/cmake
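The two-directory experiment in this message might look like the following for the second directory. This is a sketch under stated assumptions rather than anything taken from PLplot: the target name first_level, the path first/generated.txt, and the file derived.txt are hypothetical, and the first directory is assumed to be added to the build before this one.

```cmake
# Hypothetical CMakeLists.txt for the second directory.
# Assumed: a sibling directory defines a custom target named first_level
# whose custom command writes ${CMAKE_BINARY_DIR}/first/generated.txt.
SET(FIRST_LEVEL_OUTPUT ${CMAKE_BINARY_DIR}/first/generated.txt)

# Second-level custom command: consumes the file generated elsewhere.
# This directory has no rule for FIRST_LEVEL_OUTPUT; it only sees a path.
ADD_CUSTOM_COMMAND(
  OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/derived.txt
  COMMAND ${CMAKE_COMMAND} -E copy
    ${FIRST_LEVEL_OUTPUT}
    ${CMAKE_CURRENT_BINARY_DIR}/derived.txt
  DEPENDS ${FIRST_LEVEL_OUTPUT}
  )

ADD_CUSTOM_TARGET(second_level ALL
  DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/derived.txt
  )

# The high-level statement of dependence: first_level must be complete
# before second_level starts. Without it, second_level may be built first
# and fail because its input file does not exist yet.
ADD_DEPENDENCIES(second_level first_level)
```

The file-level DEPENDS still lets make notice when the input changes, while the ADD_DEPENDENCIES line supplies the ordering that the second directory cannot infer on its own.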
Re: [CMake] Parallel builds do not work correctly when using "cmake -E copy" to copy files
On 2007-12-13 17:07-0500 Brad King wrote:

> Alan W. Irwin wrote:
> > So just keeping narrowly focussed on that fragment there is only one "ALL" custom target and ADD_DEPENDENCIES would not help since it only works on targets. Thus, I doubt there is anything locally wrong with dependencies there. It is possible some other dependency is making a dependency pattern that triggers the bug, but I should know more about that when I have a simpler example that triggers the bug (or not).
>
> I was able to reproduce the problem with the code below. It is fixed by uncommenting the ADD_DEPENDENCIES line. You must be putting those output files into another target that does not depend on the tcl_examples target. Perhaps the make_documentation target?
>
> -Brad
>
> PROJECT(FOO)
> FOREACH(f f1 f2 f3 f4 f5 f6 f7 f8 f9)
>   LIST(APPEND DEPS ${FOO_BINARY_DIR}/${f})
>   ADD_CUSTOM_COMMAND(OUTPUT ${FOO_BINARY_DIR}/${f}
>     COMMAND echo ${f} > ${FOO_BINARY_DIR}/${f}
>     )
> ENDFOREACH(f)
> ADD_CUSTOM_TARGET(examples ALL DEPENDS ${DEPS})
> ADD_CUSTOM_TARGET(examples2 ALL DEPENDS ${DEPS})
> #ADD_DEPENDENCIES(examples2 examples)

Good example, Brad!

Working from the PLplot case, I came up with another simple test case (complete tarball attached including the required small files to be copied for those who want to play with it). In the PLplot case (and also the attached simple test case) there is an additional custom command that file depends on the copied files. In addition there is a custom target that depends on the additional custom command output file, and a custom target that depends on the copied files.

So the rule seems to be that parallel builds do not work if there are two or more separate custom targets that file depend directly or indirectly (via some custom command file dependency chain) on the same output files.

Another way of summarizing these results is that file depends must be minimized and/or custom target depends maximized in order for parallel builds to work properly.
My guess is I should be able to work around this CMake issue by appropriate changes to the PLplot build system although I have a number of these parallel build issues and the copy problem was only the most obvious.

I do regard this as a CMake issue. Normally, the shoe is on the other foot, and the build system developer is desperately trying to make sure that all the CMake file depends are obviously in place locally rather than depending on a long easily-broken chain of dependencies to do it for them in a minimalist way. So the big question is whether CMake can be modified so that minimalist file depends and/or maximal (and unintuitive) target depends are not required in order for parallel builds to work properly.

Alan

test_parallel.tar.gz
Description: complete CMake test case for bad parallel builds
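One way to read "file depends minimized, custom target depends maximized" is sketched below as a variant of Brad's test case. It is an assumption about how the rule could be applied, not a change anyone in the thread reports making: only one target file-depends on the generated files, and the second target reaches them purely through a target-level dependency. The echo command in examples2 is a placeholder for whatever that target really does.

```cmake
# Hypothetical variant of the test case quoted above.
SET(DEPS)
FOREACH(f f1 f2 f3)
  ADD_CUSTOM_COMMAND(
    OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/${f}
    COMMAND echo ${f} > ${CMAKE_CURRENT_BINARY_DIR}/${f}
    )
  LIST(APPEND DEPS ${CMAKE_CURRENT_BINARY_DIR}/${f})
ENDFOREACH(f)

# Only this target file-depends on the generated files, so the rules
# that create them are attached to a single target.
ADD_CUSTOM_TARGET(examples ALL DEPENDS ${DEPS})

# No file-level DEPENDS on the shared outputs here. The target-level
# dependency below guarantees the files exist before this command runs.
ADD_CUSTOM_TARGET(examples2 ALL
  COMMAND ${CMAKE_COMMAND} -E echo "examples are ready"
  )
ADD_DEPENDENCIES(examples2 examples)
```

The trade-off is that examples2 no longer tracks the individual files: its command runs on every build of the target instead of only when an input changes.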
Re: [CMake] Parallel builds do not work correctly when using "cmake -E copy" to copy files
Alan W. Irwin wrote:
> So just keeping narrowly focussed on that fragment there is only one "ALL"
> custom target and ADD_DEPENDENCIES would not help since it only works on
> targets. Thus, I doubt there is anything locally wrong with dependencies
> there. It is possible some other dependency is making a dependency pattern
> that triggers the bug, but I should know more about that when I have a
> simpler example that triggers the bug (or not).

I was able to reproduce the problem with the code below. It is fixed by uncommenting the ADD_DEPENDENCIES line. You must be putting those output files into another target that does not depend on the tcl_examples target. Perhaps the make_documentation target?

-Brad

PROJECT(FOO)
FOREACH(f f1 f2 f3 f4 f5 f6 f7 f8 f9)
  LIST(APPEND DEPS ${FOO_BINARY_DIR}/${f})
  ADD_CUSTOM_COMMAND(OUTPUT ${FOO_BINARY_DIR}/${f}
    COMMAND echo ${f} > ${FOO_BINARY_DIR}/${f}
    )
ENDFOREACH(f)
ADD_CUSTOM_TARGET(examples ALL DEPENDS ${DEPS})
ADD_CUSTOM_TARGET(examples2 ALL DEPENDS ${DEPS})
#ADD_DEPENDENCIES(examples2 examples)
Re: [CMake] Parallel builds do not work correctly when using "cmake -E copy" to copy files
Alan W. Irwin wrote: > It was good to hear that make -j N normally works with CMake. Yes indeed. I frequently run make -j70 across a 35-host dual-CPU cluster using distcc, and every time I've updated CMake's files, it's correctly rebuilt the makefiles before continuing. http://www.cmake.org/mailman/listinfo/cmake
Re: [CMake] Parallel builds do not work correctly when using "cmake -E copy" to copy files
On 2007-12-13 15:39-0500 Bill Hoffman wrote:

> Alan W. Irwin wrote:
> > My obvious next step is to try and make a simple CMake example that reliably reproduces the bug, but this is such an important bug (at least for those with access to multiprocessors who want to use parallel builds) that I thought the above result was worth reporting immediately since it tends to point the finger at something CMake is doing rather than some bug in GNU make.
>
> We use -j N builds all the time at Kitware for VTK, ParaView and CMake. It is, however, possible to create input to CMake that will not work in a parallel environment. A simple example would be the best way to figure out if there is a way around the issue you are having. One thing you might want to look at is the ADD_DEPENDENCIES command, and make sure that your custom targets are built in some order. From your email, I am not exactly sure what targets are involved and what files are created at what time.

It was good to hear that make -j N normally works with CMake.

To answer your question, from the CMake language fragment in the first e-mail on this issue, there is a custom command (with OUTPUT signature with full pathname) for each file to be copied, and then an overall "ALL" custom target that file depends (with full pathname) on those OUTPUT files.

So just keeping narrowly focussed on that fragment there is only one "ALL" custom target and ADD_DEPENDENCIES would not help since it only works on targets. Thus, I doubt there is anything locally wrong with dependencies there. It is possible some other dependency is making a dependency pattern that triggers the bug, but I should know more about that when I have a simpler example that triggers the bug (or not).

Alan
Re: [CMake] Parallel builds do not work correctly when using "cmake -E copy" to copy files
Alan W. Irwin wrote:
> My obvious next step is to try and make a simple CMake example that reliably reproduces the bug, but this is such an important bug (at least for those with access to multiprocessors who want to use parallel builds) that I thought the above result was worth reporting immediately since it tends to point the finger at something CMake is doing rather than some bug in GNU make.

We use -j N builds all the time at Kitware for VTK, ParaView and CMake. It is, however, possible to create input to CMake that will not work in a parallel environment. A simple example would be the best way to figure out if there is a way around the issue you are having. One thing you might want to look at is the ADD_DEPENDENCIES command, and make sure that your custom targets are built in some order. From your email, I am not exactly sure what targets are involved and what files are created at what time.

-Bill
Re: [CMake] Parallel builds do not work correctly when using "cmake -E copy" to copy files
On 2007-12-12 17:10-0800 Alan W. Irwin wrote:

> A set of custom rules to copy files from the source tree to the build tree is screwing up for parallel builds on Debian testing with cmake 2.4.7.

Here is part of the "make -j 2" output:

  make -f examples/tcl/CMakeFiles/tclIndex_examples_tcl.dir/build.make examples/tcl/CMakeFiles/tclIndex_examples_tcl.dir/build
  [...]
  make[2]: *** [examples/tcl/x05] Error 1
  make[2]: Leaving directory `/home/software/plplot_cvs/HEAD/build_dir'
  /usr/bin/cmake -E cmake_progress_report /home/software/plplot_cvs/HEAD/build_dir/CMakeFiles
  make[1]: *** [examples/tcl/CMakeFiles/tclIndex_examples_tcl.dir/all] Error 2
  make[1]: *** Waiting for unfinished jobs....

Note, the above make command is generated recursively from the overall "make -j 2" command. I have no idea how the -j option was propagated in that case, but I assume it was via some Makefile variable.

Now here is the really strange part. If I directly run

  make -f examples/tcl/CMakeFiles/tclIndex_examples_tcl.dir/build.make \
    examples/tcl/CMakeFiles/tclIndex_examples_tcl.dir/clean
  make -j 2 -f examples/tcl/CMakeFiles/tclIndex_examples_tcl.dir/build.make \
    examples/tcl/CMakeFiles/tclIndex_examples_tcl.dir/build

from the command line there are no extra copying operations and no errors. Also, all other attempts to reproduce this bug from a hand-crafted Makefile have failed. However, the double copy is reliably reproduced in the build tree by

  make clean
  make -j 2

and intermittently that above sequence produces other "*** Waiting for unfinished jobs..." errors as well, but those are more complicated than the simple double copy errors I have documented so I won't go into details.

In sum, so far it appears I need CMake-generated Makefiles that are recursively run with "make -j 2" in order to see parallel build errors. Thus, my working hypothesis is these bad parallel build results are due to some CMake error (a Makefile variable that is set or propagated incorrectly?) in the way it does make recursion.
My obvious next step is to try and make a simple CMake example that reliably reproduces the bug, but this is such an important bug (at least for those with access to multiprocessors who want to use parallel builds) that I thought the above result was worth reporting immediately since it tends to point the finger at something CMake is doing rather than some bug in GNU make.

Alan
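For readers without the original fragment at hand, the copy pattern under discussion (one custom command with a full-path OUTPUT per copied file, plus a single "ALL" custom target that file-depends on those outputs) could look roughly like this. It is a reconstruction from the description in this thread, not the PLplot source: the file names and the target name copy_examples are placeholders, and an out-of-source build is assumed so that source and destination differ.

```cmake
# Hypothetical sketch of the copy pattern; file names are placeholders.
SET(COPIED_FILES)
FOREACH(f example_a.tcl example_b.tcl)
  # One rule per file: copy it from the source tree to the build tree.
  ADD_CUSTOM_COMMAND(
    OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/${f}
    COMMAND ${CMAKE_COMMAND} -E copy
      ${CMAKE_CURRENT_SOURCE_DIR}/${f}
      ${CMAKE_CURRENT_BINARY_DIR}/${f}
    DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/${f}
    )
  LIST(APPEND COPIED_FILES ${CMAKE_CURRENT_BINARY_DIR}/${f})
ENDFOREACH(f)

# A single "ALL" custom target that file-depends on every copied file.
ADD_CUSTOM_TARGET(copy_examples ALL DEPENDS ${COPIED_FILES})
```

On its own this fragment has only one target, so the double copy reported here would require some other target that also file-depends, directly or indirectly, on the same copied files.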
Re: [CMake] Parallel builds do not work correctly when using "cmake -E copy" to copy files
On 2007-12-12 17:10-0800 Alan W. Irwin wrote:

> A set of custom rules to copy files from the source tree to the build tree is screwing up for parallel builds on Debian testing with cmake 2.4.7. The parallel builds are done with "make -j 2" on a core duo system (Intel E6550 2.33 MHz).

Before anybody else spots that, I have to say that system sure would be slow! I meant GHz, of course. :-)

Alan