+1, +1, +1 (non-binding)
Supporting Comments:
Build-time scripts: Using a platform-independent language such as Python (or
Maven in certain cases) will greatly help reduce build breaks and improve
build-script maintainability.
Run-time scripts: Most run-time scripts are visible to end users: they are run
either by an admin, to start or stop a Hadoop cluster (hadoop-daemons), or by
developers submitting a job (hadoop.cmd). There seem to be two types of script
files:
- Scripts intended for a cluster admin or an IT admin:
  - It is desirable to use a common set of Python scripts that work
    across all platforms. However, in a Windows enterprise environment, IT
    admins won't like having to run Python scripts to start/stop a cluster. So
    for these, there should be a PowerShell wrapper that accepts the right
    parameters and passes them down to the Python script. Hopefully, the
    PowerShell layer can be a simple pass-through. This way the Python script
    is like any other Java code hidden behind a well-known API surface. IT
    admins can't debug or modify it easily, but this is fine, since for scripts
    like these there is no requirement that IT admins be able to easily view or
    modify the underlying code.
  - For Windows-specific things not supported natively by Python, such as
    setting ACLs or starting/stopping Windows services, it should be possible
    to refactor the code appropriately. But a little bit of PowerShell/cmd for
    these call-outs would be unavoidable.
- Scripts intended for developers/cluster users:
  - Most of these scripts (e.g. hadoop.cmd) would sit behind another API
    surface such as WebHDFS, ODBC, JDBC, Templeton, etc. So the advantage of
    having a common script across platforms outweighs the advantage of using
    cmd/PowerShell as a native Windows feature. Again, it should also be
    possible to provide simple PowerShell wrappers for a Windows environment.
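To make the wrapper idea above concrete, here is a minimal sketch of a shared
Python entry point that a thin PowerShell or shell wrapper could pass its
arguments to unchanged. Everything in it is an assumption for illustration,
not existing Hadoop code: the function names, the --dry-run flag, the idea
that the daemon is registered as a Windows service under the same name, and
the use of "sc" for the Windows call-out.

```python
import argparse
import platform
import subprocess
import sys


def build_command(action, daemon, system=None):
    """Return the argv list for starting/stopping a daemon (hypothetical)."""
    system = system or platform.system()
    if system == "Windows":
        # Assumption: the daemon is registered as a Windows service with the
        # same name, so the call-out goes to the built-in "sc" tool.
        return ["sc", action, daemon]
    # Assumption: elsewhere we delegate to hadoop-daemon.sh on the PATH.
    return ["hadoop-daemon.sh", action, daemon]


def parse_args(argv):
    """Parse the arguments a wrapper would pass through unchanged."""
    parser = argparse.ArgumentParser(description="Start or stop a daemon (sketch).")
    parser.add_argument("action", choices=["start", "stop"])
    parser.add_argument("daemon")
    parser.add_argument("--dry-run", action="store_true",
                        help="print the command instead of running it")
    return parser.parse_args(argv)


def main(argv=None):
    """Build the platform-specific command, then print it or run it."""
    args = parse_args(sys.argv[1:] if argv is None else argv)
    command = build_command(args.action, args.daemon)
    if args.dry_run:
        print(" ".join(command))
        return 0
    # Return the child's exit code so a wrapper can propagate it.
    return subprocess.call(command)
```

With this shape, the platform-specific part is confined to build_command, and
a wrapper only needs to forward its parameters, for example by calling
main(["start", "namenode", "--dry-run"]).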
Thanks, Mahadevan.
-----Original Message-----
From: Ivan Mitic [mailto:[email protected]]
Sent: Thursday, November 29, 2012 3:41 PM
To: [email protected]; [email protected]
Subject: RE: [VOTE] introduce Python as build-time and run-time dependency for
Hadoop and throughout Hadoop stack
+1, +1, +1 (some comments inline)
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Matt
Foley
Sent: Saturday, November 24, 2012 12:13 PM
To: [email protected]
Subject: [VOTE] introduce Python as build-time and run-time dependency for
Hadoop and throughout Hadoop stack
For discussion, please see previous thread "[PROPOSAL] introduce Python as
build-time and run-time dependency for Hadoop and throughout Hadoop stack".
This vote consists of three separate items:
1. Contributors shall be allowed to use Python as a platform-independent
scripting language for build-time tasks, and add Python as a build-time
dependency.
Please vote +1, 0, -1.
2. Contributors shall be encouraged to use Maven tasks in combination with
either plug-ins or Groovy scripts to do cross-platform build-time tasks, even
under ant in Hadoop-1.
Please vote +1, 0, -1.
>>> I believe 1 & 2 in combination make total sense. I ported a few scripts to
>>> Python, and so far it has proven to be up to the task and to satisfy the
>>> cross-platform requirements. In my opinion, it is also important to agree on
>>> the version, as I've run into some breaking changes in version 3+.
3. Contributors shall be allowed to use Python as a platform-independent
scripting language for run-time tasks, and add Python as a run-time dependency.
>>> This is a great aspirational goal! Maintaining two sets of scripts would be
>>> a real challenge.
Please vote +1, 0, -1.
Note that voting -1 on #1 and +1 on #2 essentially REQUIRES contributors to use
Maven plug-ins or Groovy as the only means of cross-platform build-time tasks,
or to simply continue using platform-dependent scripts as is being done today.
Vote closes at 12:30pm PST on Saturday 1 December.
---------
Personally, my vote is +1, +1, +1.
I think #2 is preferable to #1, but still has many unknowns in it, and until
those are worked out I don't want to delay moving to cross-platform scripts for
build-time tasks.
Best regards,
--Matt