Ditto...

On Wed, Nov 21, 2012 at 01:14PM, Matt Foley wrote:
> Cos,
> Please see in-line.
> 
> On Wed, Nov 21, 2012 at 12:00 PM, Konstantin Boudnik <c...@apache.org> wrote:
> 
> > I like Alejandro's idea about Maven for a few reasons:
> >   - bringing in a scripting environment which is known for its
> >   inter-version idiosyncrasies just because Windows can't handle trivial
> >   shell scripting looks like an overkill to me
> 
> Excuse me?  Can we at least try not to belittle other people's platforms on
> a public Apache forum?  There's nothing trivial about implementing shell on
> Windows, as cygwin regrettably proved.

Belittle? Hardly ;) Because we all know very well why shell is so awkward to
implement on any Windows system.

> >   - relative to above, there's a chance that Python's pre-requisites used
> >   in Hadoop might get into a conflict with some other components in the
> >   stack.  This will be a nightmare for the integrator projects i.e. Bigtop
> 
> Said Bigtop project actually uses python, does it not?

It does, Matt. My main concern is that at some point Hadoop's Python might
suddenly be a different version than the one in Bigtop, and all hell will
break loose compatibility-wise. What would be the solution then?

> >   - Maven is de-facto standard for Java stacks
> >
> 
> Sure -- except for when Ant was the de-facto standard for Java stacks.  And

Arguable. Yet beside the point.

> let's remember what maven and ant are/were the de-facto standard for:
>  Doing builds.  Not scripting everything that needs scripting.

Arguable as well, due to the very definition of a build system.

> >   - Maven has built-in scripting language (Groovy) if some plugins aren't
> >     sufficient for achieving whatever goals
> 
> Are you proposing Groovy as a better scripting language than Python?

I am proposing that Groovy is a better language than Python, in part because
it goes far beyond scripting and doesn't have permanent runtime backward
compatibility issues. When was the last time the JDK had backward
compatibility problems?

> > Addressing Matt's later point about non-Mavenized Hadoop-1 line: it uses
> > Maven stuff such as deploy/install via custom ant tasks. Same approach
> > would work for saveVersion.sh and others, I am sure.
> 
> Current ant scripts in Hadoop seem to use maven only for artifact
> management via the maven repository.  If I'm missing something, please
> point it out.  The ant build task currently calls out to saveVersion.sh.
> Having it call out to maven, which then calls out to a plug-in and/or a
> Groovy script, doesn't sound like an improvement to me.  And it's a way

At least it is guaranteed to work everywhere. And all we need in this case is
an extra jar file that can be pulled down through the same ivy/maven
dependency mechanism.

In the case of Python you'd have to make sure that you have the right version
of the interpreter and runtime. And you would have to do that manually or
express it as an extra requirement in a system maintenance DSL.
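
To make that concrete, here is a minimal sketch of the kind of interpreter
check each Python build script would need to carry. The minimum version (2.6)
and the name check_python are assumptions for illustration only; nothing like
this exists in the Hadoop tree as far as this thread says:

```python
import sys

# Hypothetical minimum version; not taken from any Hadoop build file.
MIN_PYTHON = (2, 6)


def check_python(minimum=MIN_PYTHON, current=None):
    """Return True if `current` (major, minor) is at least `minimum`.

    When `current` is omitted, the running interpreter's version is used.
    """
    if current is None:
        current = sys.version_info[:2]
    return tuple(current) >= tuple(minimum)


if __name__ == "__main__":
    # Fail the build early with a clear message instead of a syntax error later.
    if not check_python():
        sys.stderr.write("Python %d.%d or newer is required\n" % MIN_PYTHON)
        sys.exit(1)
```

Note that a check like this only detects a too-old interpreter; it does not
resolve a conflict with another component that pins a different version.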

> different use of maven than currently in the Hadoop-1 line, not a
> continuation of established practice.

The main point of my argument, in fewer than 100 words: adding Python, which
is inconsistent across Linux distros and has a history of backward
incompatibilities (2.6 vs 2.5, 3.0 vs earlier, etc.), doesn't seem to be
justified by the benefit of a somewhat easier build on Windows.

Perhaps we can do a more formal benefit analysis by simply comparing the
number of Hadoop installations on MS Windows vs. Unix.

Cos

> > On Wed, Nov 21, 2012 at 11:25AM, Alejandro Abdelnur wrote:
> > > Hey Matt,
> > >
> > > We already require java/mvn/protoc/cmake/forrest (forrest is hopefully on
> > > its way out with the move of docs to APT)
> > >
> > > Why not do a maven-plugin to do that?
> > >
> > > Colin already has something to simplify all the cmake calls from the
> > > builds using a maven-plugin
> > > (https://issues.apache.org/jira/browse/HADOOP-8887)
> > >
> > > We could do the same with protoc, thus simplifying the POMs.
> > >
> > > The saveVersion.sh seems like another prime candidate for a maven plugin,
> > > and in this case it would not require external tools.
> > >
> > > Does this make sense?
> > >
> > > Thx
> > >
> > > On Wed, Nov 21, 2012 at 11:15 AM, Matt Foley <ma...@apache.org> wrote:
> > >
> > > > This discussion started in HADOOP-8924
> > > > <https://issues.apache.org/jira/browse/HADOOP-8924>, where it was
> > > > proposed to replace the build-time utility "saveVersion.sh" with a
> > > > python script.  This would require Python as a build-time
> > > > dependency.  Here's the background:
> > > >
> > > > Those of us involved in the branch-1-win port of Hadoop to Windows
> > > > without use of Cygwin, have faced the issue of frequent use of shell
> > > > scripts throughout the system, both in build time (eg, the utility
> > > > "saveVersion.sh"), and run time (config files like "hadoop-env.sh"
> > > > and the start/stop scripts in "bin/*" ).  Similar usages exist
> > > > throughout the Hadoop stack, in all projects.
> > > >
> > > > The vast majority of these shell scripts do not do anything platform
> > > > specific; they can be expressed in a posix-conforming way.  Therefore,
> > > > it seems to us that it makes sense to start using a cross-platform
> > > > scripting language, such as python, in place of shell for these
> > > > purposes.  For those rare occasions where platform-specific
> > > > functionality really is needed, python also supports quite a lot of
> > > > platform-specific functionality on both Linux and Windows; but where
> > > > that is inadequate, one could still conditionally invoke a
> > > > platform-specific module written in shell (for Linux/*nix) or
> > > > powershell or bat (for Windows).
> > > >
> > > > The primary motive for moving to a cross-platform scripting language is
> > > > maintainability.  The alternative would be to maintain two complete
> > > > suites of scripts, one for Linux and one for Windows (and perhaps
> > > > others in the future).  We want to avoid the need to update dual
> > > > modules in two different languages when functionality changes,
> > > > especially given that many Linux developers are not familiar with
> > > > powershell or bat, and many Windows developers are not familiar with
> > > > shell or bash.
> > > >
> > > > Regarding the choice of python:
> > > >
> > > >    - There are already a few instances of python usage in Hadoop, such
> > > >    as the utility (currently broken) "relnotes.py", and massive usage
> > > >    of python in the examples/ and contrib/ directories.
> > > >    - Python is also used in Bigtop build-time.
> > > >    - The Python language is available for free on essentially all
> > > >    platforms, under an Apache-compatible license
> > > >    <http://www.apache.org/legal/resolved.html>.
> > > >    - It is supported in Eclipse and similar IDEs.
> > > >    - Most importantly, it is widely accepted as a reasonably good OO
> > > >    scripting language, and it is easily learned by anyone who already
> > > >    knows shell or perl, or other common scripting languages.
> > > >    - On the Tiobe index of programming language popularity
> > > >    <http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html>,
> > > >    which seeks to measure the relative number of software engineers who
> > > >    know and use each language, Python far exceeds Perl and Ruby.  The
> > > >    only more well-known scripting languages are PHP and Visual Basic,
> > > >    neither of which seems a prime candidate for this use.
> > > >
> > > > For build-time usage, I think we should immediately approve python as a
> > > > build-time dependency, and allow people who are motivated to do so, to
> > > > open jiras for migrating existing build-time shell scripts to python.
> > > >
> > > > For run-time, there is likely to be a lot more discussion.  Lots of
> > > > folks, including me, aren't real happy with use of active scripts for
> > > > configuration, and various others, including I believe some of the
> > > > Bigtop folks, have issues with the way the start/stop scripts work.
> > > > Nevertheless, all those scripts exist today and are widely used.  And
> > > > they present an impediment to porting to Windows-without-cygwin.
> > > >
> > > > Nothing about run-time use of scripts has changed significantly over
> > > > the past three years, and I don't think we should hold up the Windows
> > > > port while we have a huge discussion about issues that veer
> > > > dangerously into religious/aesthetic domains. It would be fun to have
> > > > that discussion, but I don't want this decision to be dependent on it!
> > > >
> > > > So I propose that we go ahead and also approve python as a run-time
> > > > dependency, and allow the inclusion of python scripts in place of
> > > > current shell-based functionality.  The unpleasant alternative is to
> > > > spawn a bunch of powershell scripts in parallel to the current shell
> > > > scripts, with a very negative impact on maintainability.  The Windows
> > > > port must, after all, be allowed to proceed.
> > > >
> > > > Let's have a discussion, and then I'll put both issues, separately, to
> > > > a vote (unless we miraculously achieve consensus without a vote :-)
> > > >
> > > > I also encourage members of the other Hadoop-related projects, to carry
> > > > this discussion into those forums.  It would be very cool to agree on a
> > > > whole-stack solution for the scripting problem.
> > > >
> > > > Best regards,
> > > > --Matt
> > > >
> > >
> > >
> > >
> > > --
> > > Alejandro
> >
