Hey Matt,

We already require java/mvn/protoc/cmake/forrest (forrest is hopefully on its way out with the move of docs to APT).
Why not do a maven-plugin to do that?

Colin already has something to simplify all the cmake calls from the builds using a maven-plugin (https://issues.apache.org/jira/browse/HADOOP-8887). We could do the same with protoc, thus simplifying the POMs.

The saveVersion.sh seems like another prime candidate for a maven plugin, and in this case it would not require external tools.

Does this make sense?

Thx

On Wed, Nov 21, 2012 at 11:15 AM, Matt Foley <ma...@apache.org> wrote:

> This discussion started in
> HADOOP-8924 <https://issues.apache.org/jira/browse/HADOOP-8924>,
> where it was proposed to replace the build-time utility "saveVersion.sh"
> with a python script. This would require Python as a build-time
> dependency. Here's the background:
>
> Those of us involved in the branch-1-win port of Hadoop to Windows without
> use of Cygwin, have faced the issue of frequent use of shell scripts
> throughout the system, both in build time (eg, the utility
> "saveVersion.sh"), and run time (config files like "hadoop-env.sh" and
> the start/stop scripts in "bin/*"). Similar usages exist throughout the
> Hadoop stack, in all projects.
>
> The vast majority of these shell scripts do not do anything platform
> specific; they can be expressed in a posix-conforming way. Therefore, it
> seems to us that it makes sense to start using a cross-platform scripting
> language, such as python, in place of shell for these purposes. For those
> rare occasions where platform-specific functionality really is needed,
> python also supports quite a lot of platform-specific functionality on both
> Linux and Windows; but where that is inadequate, one could still
> conditionally invoke a platform-specific module written in shell (for
> Linux/*nix) or powershell or bat (for Windows).
>
> The primary motive for moving to a cross-platform scripting language is
> maintainability.
> The alternative would be to maintain two complete suites
> of scripts, one for Linux and one for Windows (and perhaps others in the
> future). We want to avoid the need to update dual modules in two different
> languages when functionality changes, especially given that many Linux
> developers are not familiar with powershell or bat, and many Windows
> developers are not familiar with shell or bash.
>
> Regarding the choice of python:
>
> - There are already a few instances of python usage in Hadoop, such as
>   the utility (currently broken) "relnotes.py", and massive usage of
>   python in the examples/ and contrib/ directories.
> - Python is also used in Bigtop build-time.
> - The Python language is available for free on essentially all
>   platforms, under an Apache-compatible license
>   <http://www.apache.org/legal/resolved.html>.
> - It is supported in Eclipse and similar IDEs.
> - Most importantly, it is widely accepted as a reasonably good OO
>   scripting language, and it is easily learned by anyone who already knows
>   shell or perl, or other common scripting languages.
> - On the Tiobe index of programming language popularity
>   <http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html>,
>   which seeks to measure the relative number of software engineers who
>   know and use each language, Python far exceeds Perl and Ruby. The only
>   more well-known scripting languages are PHP and Visual Basic, neither
>   of which seems a prime candidate for this use.
>
> For build-time usage, I think we should immediately approve python as a
> build-time dependency, and allow people who are motivated to do so, to open
> jiras for migrating existing build-time shell scripts to python.
>
> For run-time, there is likely to be a lot more discussion. Lots of folks,
> including me, aren't real happy with use of active scripts for
> configuration, and various others, including I believe some of the Bigtop
> folks, have issues with the way the start/stop scripts work.
> Nevertheless,
> all those scripts exist today and are widely used. And they present an
> impediment to porting to Windows-without-cygwin.
>
> Nothing about run-time use of scripts has changed significantly over the
> past three years, and I don't think we should hold up the Windows port
> while we have a huge discussion about issues that veer dangerously into
> religious/aesthetic domains. It would be fun to have that discussion, but I
> don't want this decision to be dependent on it!
>
> So I propose that we go ahead and also approve python as a run-time
> dependency, and allow the inclusion of python scripts in place of current
> shell-based functionality. The unpleasant alternative is to spawn a bunch
> of powershell scripts in parallel to the current shell scripts, with a very
> negative impact on maintainability. The Windows port must, after all, be
> allowed to proceed.
>
> Let's have a discussion, and then I'll put both issues, separately, to a
> vote (unless we miraculously achieve consensus without a vote :-)
>
> I also encourage members of the other Hadoop-related projects, to carry
> this discussion into those forums. It would be very cool to agree on a
> whole-stack solution for the scripting problem.
>
> Best regards,
> --Matt
>

--
Alejandro