Re: [DISCUSS] Update Roadmap

Amos Elberg Wed, 06 Apr 2016 19:33:07 -0700

A few suggestions for the roadmap:

1. Increase unit test coverage.  I suggest we set thresholds -- say, 70% for 
0.6, 85% for 0.7, and aim for 95% before 1.0.


2. Language support.  Right now, interpreters essentially have to be written 
in Java, or at least have Java wrappers.  This is because the current design 
has each interpreter class call a static method when the class is loaded, to 
register the interpreter with Zeppelin.  In the long term, relying on static 
methods for registration will inevitably be a source of architectural problems.  
(People have been saying that the feature should be removed entirely from Java 
since 1998.)  In the short term, if we fix this, then it would be easy for 
people to write interpreters in other JVM languages, such as Scala, Clojure, 
Python (via Jython), Elixir (via whatever the Elixir JVM converter is called), 
Groovy, etc.  
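
A rough, language-neutral sketch of the idea in point 2, written in Python 
only for brevity.  It assumes the host builds a registry from a declarative 
descriptor instead of relying on each class registering itself at load time.  
InterpreterRegistry, load_interpreters and EchoInterpreter are hypothetical 
names for illustration, not Zeppelin APIs.

```python
from typing import Callable, Dict, List


class InterpreterRegistry:
    """Host-owned registry: interpreters never register themselves."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[[], object]] = {}

    def register(self, name: str, factory: Callable[[], object]) -> None:
        # Reject duplicates so two interpreters cannot silently shadow each other.
        if name in self._factories:
            raise ValueError(f"interpreter already registered: {name}")
        self._factories[name] = factory

    def names(self) -> List[str]:
        return sorted(self._factories)

    def create(self, name: str) -> object:
        return self._factories[name]()


class EchoInterpreter:
    """Stand-in interpreter (hypothetical); it has no registration code at all."""

    def interpret(self, code: str) -> str:
        return code


def load_interpreters(registry: InterpreterRegistry,
                      descriptor: Dict[str, Callable[[], object]]) -> None:
    # The host walks a descriptor (here a dict; it could be a file) and
    # performs the registration itself.
    for name, factory in descriptor.items():
        registry.register(name, factory)


registry = InterpreterRegistry()
load_interpreters(registry, {"echo": EchoInterpreter})
print(registry.names())
print(registry.create("echo").interpret("1 + 1"))
```

Because the interpreter class carries no registration logic, it could in 
principle be written in any language the host can instantiate.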

3. Remove Spark-under-zeppelin-home.  Many, many, many of our issues, 
including many CI issues, trace back to the old system of installing Spark 
under Zeppelin-home.  This is essentially a legacy thing from when Zeppelin 
was a PR submitted as an add-on to Spark.  Right now, it doesn't buy us 
anything -- but it does complicate the build process, create dependency 
conflicts, and lead to user support issues.  

I suggest we deprecate this ASAP, and remove it entirely before 0.7, or 0.8 at 
the latest.  

4. Drop support for Spark before 1.3, or better yet before 1.4.  Jeff 
Steinmetz suggested this the other day.  It would simplify CI and the build 
process, as well as maintenance as Spark heads toward 2.0.  I can't imagine 
more than a tiny number of people who use Zeppelin are using it with Spark 
1.2, or even 1.3. 

5.  Reform the configuration system.  Right now, Zeppelin configuration is set 
in:  
        - ZeppelinConfiguration.java (developers must edit)
        - The XML configuration (administrator must edit)
        - The env configuration file (administrator must edit)
        - Multiple JSON files such as interpreter.json (edited through the 
          interface)

The result is kind of a mish-mash, and it creates user support issues when 
people enter conflicting configurations or configurations in the wrong place.

It's also a developer issue because we haven't defined what takes precedence 
over what.  

I suggest we introduce a single component in the architecture that acts as the 
arbiter for all configuration issues -- when any class needs to read or change 
configuration, it goes through that one place.  Then we can figure out how we 
want to present configuration to the users. 
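
A minimal sketch of such an arbiter, in Python for brevity.  It assumes one 
possible precedence order (defaults, then XML, then env, then JSON, with later 
layers winning); that order is an assumption made for illustration, since the 
precedence has not been defined.  ConfigArbiter and the layer names are 
hypothetical, and the port key is used purely as an example.

```python
from typing import Any, Dict, List, Optional, Tuple


class ConfigArbiter:
    """Single access point for configuration with an explicit precedence order."""

    def __init__(self, precedence: List[str]) -> None:
        # Later names in `precedence` override earlier ones.
        self._order = list(precedence)
        self._layers: Dict[str, Dict[str, Any]] = {name: {} for name in precedence}

    def load(self, layer: str, values: Dict[str, Any]) -> None:
        if layer not in self._layers:
            raise KeyError(f"unknown configuration layer: {layer}")
        self._layers[layer].update(values)

    def explain(self, key: str, default: Any = None) -> Tuple[Any, Optional[str]]:
        """Return (value, winning_layer) so support can see where a value came from."""
        for name in reversed(self._order):
            if key in self._layers[name]:
                return self._layers[name][key], name
        return default, None

    def get(self, key: str, default: Any = None) -> Any:
        value, _ = self.explain(key, default)
        return value

    def conflicts(self) -> Dict[str, Dict[str, Any]]:
        """Keys that are set to different values in more than one layer."""
        seen: Dict[str, Dict[str, Any]] = {}
        for name in self._order:
            for key, value in self._layers[name].items():
                seen.setdefault(key, {})[name] = value
        return {key: by_layer for key, by_layer in seen.items()
                if len({repr(v) for v in by_layer.values()}) > 1}


config = ConfigArbiter(["defaults", "xml", "env", "json"])
config.load("defaults", {"zeppelin.server.port": 8080})
config.load("xml", {"zeppelin.server.port": 8080})
config.load("env", {"zeppelin.server.port": 9090})
print(config.explain("zeppelin.server.port"))
print(config.conflicts())
```

Reporting the winning layer and any conflicting layers is aimed at the support 
problem described above, where people enter conflicting settings or put them 
in the wrong place.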

6.  Disable most interpreters other than Spark-related (and MD) by default.   
At this point, we have so many interpreters that building all of them by 
default complicates the build cycle and, well, just isn't necessary.

On Monday, February 29, 2016 8:04:09 AM EDT Prasad Wagle wrote:
> This is a great list.
> 
> In the enterprise ready section, what do you think about adding "High
> Availability and Disaster Recovery"? We can start with updating the
> documentation with best practices and scripts for a cold standby solution
> and work towards active-active
> <https://www.ibm.com/developerworks/community/blogs/RohitShetty/entry/high_availability_cold_warm_hot?lang=en>
> solution.
> 
> Another suggestion is to store meta-data for notes like creator, last
> updated (time and user) and number of views. We can show this information
> in the top level page in a table format with ability to sort by any column.
> 
> On Mon, Feb 29, 2016 at 7:15 AM, Benjamin Kim <[email protected]> wrote:
> > I concur with this suggestion. In the enterprise, management would like
> > scheduled runs to be tracked, monitored, and, for mission-critical jobs,
> > given SLA constraints. Alerts and notifications that include error details
> > are crucial for DevOps to respond. If Zeppelin notebooks can be executed
> > by a third-party scheduling application, such as Oozie, then this
> > requirement can be satisfied even if there are no immediate plans for a
> > built-in one.
> > 
> > On Feb 29, 2016, at 1:17 AM, Eran Witkon <[email protected]> wrote:
> > 
> > @Vinayak Agrawal I would suggest adding the ability to connect Zeppelin
> > to existing scheduling/workflow tools such as
> > https://oozie.apache.org/. This requires better hooks and status
> > reporting but doesn't make Zeppelin an ETL/scheduler tool by itself.
> > 
> > 
> > On Mon, Feb 29, 2016 at 10:21 AM Vinayak Agrawal <[email protected]> wrote:
> >> Moon,
> >> The new roadmap looks very promising. I am very happy to see security in
> >> the list.
> >> I have some suggestions regarding Enterprise Ready features:
> >> 
> >> 1. Job Scheduler - Can this be improved?
> >> Currently the scheduler can be used with Cron expression or a pre-set
> >> time. But in an enterprise solution, a notebook might be one piece of the
> >> workflow. Can we look towards the functionality of scheduling notebooks
> >> based on other notebooks finishing their job successfully?
> >> This requirement would arise in any ETL workflow, where all the
> >> downstream users wait for the ETL notebook to finish successfully. Only
> >> after that, other business oriented notebooks can be executed.
> >> 
> >> 2. Importing a notebook - Is there a current requirement or future plan
> >> to implement a feature that allows import-notebook-from-github? This
> >> would allow users to share notebooks seamlessly.
> >> 
> >> Thanks
> >> Vinayak
> >> 
> >> On Sun, Feb 28, 2016 at 11:22 PM, moon soo Lee <[email protected]> wrote:
> >>> Zhong Wang,
> >>> Right, Folder support would be quite useful. Thanks for the opinion.
> >>> Hope I can finish the work pr-190
> >>> <https://github.com/apache/incubator-zeppelin/pull/190>.
> >>> 
> >>> 
> >>> Sourav,
> >>> Regarding concurrent running, Zeppelin itself does not prevent
> >>> paragraphs/queries from running concurrently. Each interpreter can
> >>> implement its own scheduling policy. For example, the SparkSQL
> >>> interpreter and ShellInterpreter can already run paragraphs/queries
> >>> concurrently.
> >>> 
> >>> SparkInterpreter is implemented with a FIFO scheduler because of the
> >>> nature of the Scala compiler. That's why users cannot run multiple
> >>> paragraphs concurrently when they work with SparkInterpreter.
> >>> But as Zhong Wang mentioned, pr-703 gives each notebook a separate
> >>> Scala compiler, so paragraphs can run concurrently as long as they are in
> >>> different notebooks.
> >>> Thanks for the feedback!
> >>> 
> >>> Best,
> >>> moon
> >> 
> >>> On Sat, Feb 27, 2016 at 8:59 PM Zhong Wang <[email protected]> wrote:
> >>>> Sourav: I think this newly merged PR can help you
> >>>> https://github.com/apache/incubator-zeppelin/pull/703#issuecomment-185582537
> >>>> 
> >>>> On Sat, Feb 27, 2016 at 1:46 PM, Sourav Mazumder <[email protected]> wrote:
> >>>>> Hi Moon,
> >>>>> 
> >>>>> This looks great.
> >>>>> 
> >>>>> My only suggestion would be to include a PR/feature - Support for
> >>>>> Running Concurrent paragraphs/queries in Zeppelin.
> >>>>> 
> >>>>> Right now, if more than one user tries to run paragraphs in multiple
> >>>>> notebooks concurrently through a single Zeppelin instance (and a single
> >>>>> interpreter instance), performance is very slow. In that scenario a
> >>>>> queue evidently builds up within the Zeppelin process and the
> >>>>> interpreter process, because the time taken to move the status from
> >>>>> start to pending and from pending to running is very high compared to
> >>>>> the actual running time of a paragraph.
> >>>>> 
> >>>>> Without this, multi-tenancy support would be meaningless, as no one
> >>>>> can practically use it in a situation where multiple users are trying
> >>>>> to connect to the same instance of Zeppelin (and the related
> >>>>> interpreter). A possible solution would be to spawn a separate instance
> >>>>> of the same interpreter at every notebook/user level.
> >>>>> 
> >>>>> Regards,
> >>>>> Sourav
> >>>> 
> >>>>> On Sat, Feb 27, 2016 at 12:48 PM, moon soo Lee <[email protected]> wrote:
> >>>>>> Hi Zeppelin users and developers,
> >>>>>> 
> >>>>>> The roadmap we have published at
> >>>>>> https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Roadmap
> >>>>>> is almost 9 months old, and it no longer reflects where the community
> >>>>>> is going. It's time to update.
> >>>>>> 
> >>>>>> Based on the mailing list, JIRA issues, pull requests, feedback from
> >>>>>> users, conferences and meetings, I could summarize the major interests
> >>>>>> of users and developers in 7 categories: Enterprise ready, Usability
> >>>>>> improvement, Pluggability, Documentation, Backend integration,
> >>>>>> Notebook storage, and Visualization.
> >>>>>> 
> >>>>>> And I could list related subjects under each category.
> >>>>>> 
> >>>>>>    - Enterprise ready
> >>>>>>       - Authentication
> >>>>>>          - Shiro authentication ZEPPELIN-548
> >>>>>>            <https://issues.apache.org/jira/browse/ZEPPELIN-548>
> >>>>>>       - Authorization
> >>>>>>          - Notebook authorization PR-681
> >>>>>>            <https://github.com/apache/incubator-zeppelin/pull/681>
> >>>>>>       - Security
> >>>>>>       - Multi-tenancy
> >>>>>>       - Stability
> >>>>>>    - Usability Improvement
> >>>>>>       - UX improvement
> >>>>>>       - Better Table data support
> >>>>>>          - Download data as csv, etc
> >>>>>>            PR-725 <https://github.com/apache/incubator-zeppelin/pull/725>,
> >>>>>>            PR-714 <https://github.com/apache/incubator-zeppelin/pull/714>,
> >>>>>>            PR-6 <https://github.com/apache/incubator-zeppelin/pull/6>,
> >>>>>>            PR-89 <https://github.com/apache/incubator-zeppelin/pull/89>
> >>>>>>          - Featureful table data display (pagination, etc)
> >>>>>>    - Pluggability ZEPPELIN-533
> >>>>>>      <https://issues.apache.org/jira/browse/ZEPPELIN-533>
> >>>>>>       - Pluggable visualization
> >>>>>>       - Dynamic Interpreter, notebook, visualization loading
> >>>>>>       - Repository and registry for pluggable components
> >>>>>>    - Improve documentation
> >>>>>>       - Improve contents and readability
> >>>>>>       - more tutorials, examples
> >>>>>>    - Interpreter
> >>>>>>       - Generic JDBC Interpreter
> >>>>>>       - (spark)R Interpreter
> >>>>>>       - Cluster manager for interpreter (Proposal
> >>>>>>         <https://cwiki.apache.org/confluence/display/ZEPPELIN/Cluster+Manager+Proposal>)
> >>>>>>       - more interpreters
> >>>>>>    - Notebook storage
> >>>>>>       - Versioning ZEPPELIN-540
> >>>>>>         <http://issues.apache.org/jira/browse/ZEPPELIN-540>
> >>>>>>       - more notebook storages
> >>>>>>    - Visualization
> >>>>>>       - More visualizations
> >>>>>>         PR-152 <https://github.com/apache/incubator-zeppelin/pull/152>,
> >>>>>>         PR-728 <https://github.com/apache/incubator-zeppelin/pull/728>,
> >>>>>>         PR-336 <https://github.com/apache/incubator-zeppelin/pull/336>,
> >>>>>>         PR-321 <https://github.com/apache/incubator-zeppelin/pull/321>
> >>>>>>       - Customize graph (show/hide label, color, etc)
> >>>>>> 
> >>>>>> It will help anyone quickly grasp the overall interests and direction
> >>>>>> of the project. And based on this roadmap, we can discuss and
> >>>>>> re-define the scope of the next release, 0.6.0, and its schedule.
> >>>>>> 
> >>>>>> What do you think? Any feedback would be appreciated.
> >>>>>> 
> >>>>>> Thanks,
> >>>>>> moon
> >> 
> >> --
> >> Vinayak Agrawal
> >> 
> >> 
> >> "To Strive, To Seek, To Find and Not to Yield!"
> >> ~Lord Alfred Tennyson
