Today, Spark's CLI is spread across 20+ scripts in bin/ and sbin/. This 
fragmented interface makes discovery more difficult for users and maintenance 
more difficult for contributors.

I would like to propose adding a new command line tool named spark that unifies 
Spark's command line interface. It would be a pure Python script that sticks to 
the standard library and acts simply (at least at first) as a dispatcher. It 
would offer users a single entry point with a clear --help that makes it easy 
to discover and understand what's available.

The commands I would like to implement are as follows:

spark
    connect
        start
        status
        stop
    history
        start
        status
        stop
    master
        start
        status
        stop
    pipelines
    python
    scala
    sql
    submit
    thrift
        start
        status
        stop
    worker[s]
        decommission
        run
        start
        status
        stop

These commands all dispatch to existing scripts. The unified Python CLI would 
only implement sub-commands itself when the underlying script does not already 
offer its own handling of CLI arguments. For example, pipelines accepts various 
commands, but those are handled already by its own CLI so the unified CLI 
doesn't need to implement them. connect, on the other hand, ships start and 
stop as separate scripts, so the unified CLI needs to implement those 
sub-commands itself and route the arguments to the appropriate script.
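
As a rough illustration of the dispatch behavior described above, here is a minimal sketch using only the standard library. It is not the prototype. The routing table, the helper names build_parser and resolve, and the exact mapping of commands to scripts are assumptions made for this example, and only a few commands are shown.

```python
"""Sketch of a dispatcher-style `spark` CLI (illustrative only)."""
import argparse
import os

# Hypothetical routing table: (command, sub-command) -> script under SPARK_HOME.
# A sub-command of None means the target script parses its own arguments,
# so the dispatcher forwards everything untouched.
ROUTES = {
    ("connect", "start"): "sbin/start-connect-server.sh",
    ("connect", "stop"): "sbin/stop-connect-server.sh",
    ("pipelines", None): "bin/spark-pipelines",
    ("submit", None): "bin/spark-submit",
}


def build_parser():
    """Build the top-level parser, with one sub-parser per command."""
    parser = argparse.ArgumentParser(
        prog="spark",
        description="Unified entry point for Spark's command line tools.",
    )
    commands = parser.add_subparsers(dest="command", required=True)
    for command in sorted({c for c, _ in ROUTES}):
        actions = sorted(a for c, a in ROUTES if c == command and a is not None)
        # Pass-through commands skip argparse's own -h/--help so that the
        # flag can be forwarded to the underlying script instead.
        sub = commands.add_parser(command, add_help=bool(actions))
        if actions:
            sub.add_argument("action", choices=actions)
    return parser


def resolve(argv, spark_home):
    """Return the argv list of the existing script that should be run."""
    args, passthrough = build_parser().parse_known_args(argv)
    script = ROUTES[(args.command, getattr(args, "action", None))]
    return [os.path.join(spark_home, script), *passthrough]


def main(argv):
    """Replace the current process with the resolved script."""
    command = resolve(argv, os.environ["SPARK_HOME"])
    os.execv(command[0], command)
```

With this layout, `spark connect start ...` is routed by the dispatcher itself, while `spark submit ...` hands every argument to the existing script unchanged. Adding a command would mean adding one entry to the table.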

There are several things still to hash out, like:

- Packaging of this unified CLI so it's placed on the PATH correctly.
- The exact naming and structure of the commands.
- Whether we eventually want to fold any Bash scripts directly into this new 
  Python CLI.

But before we get into that, I first want to see if this idea is attractive to 
the community.

Shall I flesh this out into a ticket and publish a working prototype, or does 
the idea have some critical flaw?

Nick
