Re: [DISCUSS] Update Roadmap

Jeff Steinmetz Wed, 06 Apr 2016 22:50:49 -0700

Comments:

Regarding #2: Language support.  It would be great to see more Scala (once I was 
up to speed with Scala, I never wanted to look back at Java).
Regarding #4: Drop old Spark support.  Seems like low-hanging fruit: low impact 
& high reward.
Regarding #5: Configuration files. We could take a cue from other great open 
source (Apache-licensed) projects, like Elasticsearch, and migrate to .yml files 
instead of verbose XML files, leaving environment variables for per-machine 
settings & global settings related to the Java runtime, JVM memory configs, and 
directory paths such as [FOO]_HOME.
An alternative to .yml is HOCON.  The Play Framework and Spark Job Server make 
use of easy-to-read HOCON-style files. HOCON is a JSON superset.
https://github.com/typesafehub/config/blob/master/HOCON.md


Typesafe licenses their entire config library under the Apache license, and 
it uses plain Java with no dependencies:
https://github.com/typesafehub/config
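
As a rough sketch of how a settings file and environment variables could be 
layered: the precedence order (defaults, then file, then environment), the key 
names, and the ZEPPELIN_ prefix mapping below are all assumptions made for 
illustration, not existing Zeppelin behavior. Parsing the .yml or HOCON file is 
left out; the sketch starts from an already-parsed dictionary.

```python
import os

# Hypothetical built-in defaults. These key names are illustrative only,
# not Zeppelin's actual configuration keys.
DEFAULTS = {"server.port": "8080", "notebook.dir": "notebook"}


def env_name(key, prefix="ZEPPELIN_"):
    """Map a dotted key to an env var name (assumed naming convention)."""
    return prefix + key.upper().replace(".", "_")


def resolve(file_settings, environ=None):
    """Layer settings: defaults, then the config file, then environment variables."""
    environ = os.environ if environ is None else environ
    merged = {**DEFAULTS, **file_settings}
    for key in list(merged):
        override = environ.get(env_name(key))
        if override is not None:
            merged[key] = override  # per-machine value wins over the file
    return merged


# Values that would come from a parsed .yml or HOCON file (parsing not shown).
file_settings = {"server.port": "9090"}
# Stand-in for os.environ so the example is reproducible.
env = {"ZEPPELIN_NOTEBOOK_DIR": "/var/lib/zeppelin/notebook"}

settings = resolve(file_settings, env)
print(settings)
```

Keeping the merge in one function means there is a single place that decides 
which source wins, whatever order is eventually agreed on.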


Regarding #6: Excluding the more esoteric interpreters by default seems 
reasonable.

Addition:  Create a common installer that also bundles a service manager 
upstart script for Debian or CentOS (not sure about Windows).  Install via a 
Debian package with a simple `dpkg -i` command.
Addition:  Build tools.  Does anybody have history with Gradle?  Is a switch 
from Maven to Gradle worth it?  I admit I am not an XML fan, and I realize this 
is not a simple task.  Gradle may make it easier to organize the builds if 
interpreters ever become plugins.  Each plugin could have its own build.gradle 
file.

"Improve documentation" is always a big yes.


Regards,
Jeff Steinmetz








On 4/6/16, 7:32 PM, "Amos Elberg" <amos.elb...@gmail.com> wrote:

>A few suggestions for the roadmap:
>
>1. Increase unit test coverage.  I suggest we set thresholds -- say, 70% for 
>0.6, 85% for 0.7, and aim for 95% before 1.0.
>
>2. Language support.  Right now, interpreters essentially have to be written 
>in Java, or at least have java wrappers.  This is because the current design 
>has each interpreter class call a `static class` method when the class is 
>loaded, to register the Interpreter with zeppelin.  In the long term, using 
>static class methods will inevitably be a source of architectural problems.  
>(People have been saying that the feature should be removed entirely from Java 
>since 1998.)  In the short term, if we fix this, then it would be easy for 
>people to write interpreters in other JVM languages, such as Scala, Clojure, 
>Python (by Jython), Elixir (by whatever the Elixir JVM converter is called), 
>Groovy, etc.
>
>3. Remove Spark-under-zeppelin-home.  Many, many, many of our issues, 
>including many CI issues, trace back to the old system of installing Spark 
>under Zeppelin-home.  This is essentially a legacy thing from when Zeppelin 
>was a PR submitted as an add-on to Spark.  Right now, it doesn't buy us 
>anything -- but it does complicate the build process, create dependency 
>conflicts, and lead to user support issues.  
>
>I suggest we deprecate this ASAP, and remove it entirely before 0.7, or 0.8 at 
>the latest.  
>
>4. Drop support for Spark before 1.3, or better yet before 1.4.  Jeff 
>Steinmetz suggested this the other day.  It would simplify CI and the build 
>process, as well as maintenance as Spark heads toward 2.0.  I can't imagine 
>more than a tiny number of people who use zeppelin are using it with Spark 
>1.2, or even 1.3. 
>
>5.  Reform the configuration system.  Right now, Zeppelin configuration is set 
>in:  
>       - ZeppelinConfiguration.java (developers must edit)
>       - The xml configuration (administrator must edit)
>       - The env configuration file (administrator must edit)
>       - Multiple json files such as interpreter.json (edited through the 
>interface)
>
>The result is kind of a mish-mash, and it creates user support issues when 
>people enter conflicting configurations or configurations in the wrong place.
>
>It's also a developer issue because we haven't defined what takes precedence 
>over what.  
>
>I suggest we introduce a part of the architecture which acts as an arbitrator 
>for all configuration issues -- when any class needs to access or change 
>configuration, it can go through one place.  Then we can figure out how we 
>want to present configuration to the users. 
>
>6.  Disable most interpreters other than Spark-related (and MD) by default.   
>At this point, we've proliferated so many interpreters, that it complicates 
>the build cycle and, well, just isn't necessary.
>
>On Monday, February 29, 2016 8:04:09 AM EDT Prasad Wagle wrote:
>> This is a great list.
>> 
>> In the enterprise ready section, what do you think about adding "High
>> Availability and Disaster Recovery"? We can start with updating the
>> documentation with best practices and scripts for a cold standby solution
>> and work towards an active-active
>> <https://www.ibm.com/developerworks/community/blogs/RohitShetty/entry/high_availability_cold_warm_hot?lang=en>
>> solution.
>> 
>> Another suggestion is to store meta-data for notes like creator, last
>> updated (time and user) and number of views. We can show this information
>> in the top level page in a table format with ability to sort by any column.
>> 
>> On Mon, Feb 29, 2016 at 7:15 AM, Benjamin Kim <bbuil...@gmail.com> wrote:
>> > I concur with this suggestion. In the enterprise, management would like to
>> > see scheduled runs to be tracked, monitored, and given SLA constraints for
>> > the mission critical. Alerts and notifications are crucial for DevOps to
>> > respond with error clarification within it. If the Zeppelin notebooks can
>> > be executed by a third party scheduling application, such as Oozie, then
>> > this requirement can be satisfied if there are no immediate plans for a
>> > built-in one.
>> > 
>> > On Feb 29, 2016, at 1:17 AM, Eran Witkon <eranwit...@gmail.com> wrote:
>> > 
>> > @Vinayak Agrawal I would suggest adding the ability to connect zeppelin
>> > to existing scheduling tools\workflow tools such as
>> > https://oozie.apache.org/. This requires better hooks and status
>> > reporting but doesn't make Zeppelin an ETL\scheduler tool by itself.
>> > 
>> > 
>> > On Mon, Feb 29, 2016 at 10:21 AM Vinayak Agrawal <
>> > 
>> > vinayakagrawa...@gmail.com> wrote:
>> >> Moon,
>> >> The new roadmap looks very promising. I am very happy to see security in
>> >> the list.
>> >> I have some suggestions regarding Enterprise Ready features:
>> >> 
>> >> 1. Job Scheduler - Can this be improved?
>> >> Currently the scheduler can be used with Cron expression or a pre-set
>> >> time. But in an enterprise solution, a notebook might be one piece of the
>> >> workflow. Can we look towards the functionality of scheduling notebooks
>> >> based on other notebooks finishing their job successfully?
>> >> This requirement would arise in any ETL workflow, where all the
>> >> downstream users wait for the ETL notebook to finish successfully. Only
>> >> after that, other business oriented notebooks can be executed.
>> >> 
>> >> 2. Importing a notebook - Is there a current requirement or future plan
>> >> to implement a feature that allows import-notebook-from-github? This
>> >> would
>> >> allow users to share notebooks seamlessly.
>> >> 
>> >> Thanks
>> >> Vinayak
>> >> 
>> >> On Sun, Feb 28, 2016 at 11:22 PM, moon soo Lee <m...@apache.org> wrote:
>> >>> Zhong Wang,
>> >>> Right, Folder support would be quite useful. Thanks for the opinion.
>> >> 
>> >> Hope i can finish the work pr-190
>> >> 
>> >>> <https://github.com/apache/incubator-zeppelin/pull/190>.
>> >>> 
>> >>> 
>> >>> Sourav,
>> >>> Regarding concurrent running, Zeppelin doesn't prevent
>> >>> paragraphs/queries from running concurrently. An interpreter can
>> >>> implement its own scheduling policy. For example, the SparkSQL
>> >>> interpreter and ShellInterpreter can already run paragraphs/queries
>> >>> concurrently.
>> >>> 
>> >>> SparkInterpreter is implemented with a FIFO scheduler because of the
>> >>> nature of the Scala compiler. That's why users cannot run multiple
>> >>> paragraphs concurrently when they work with SparkInterpreter.
>> >>> But as Zhong Wang mentioned, pr-703 gives each notebook a
>> >>> separate Scala compiler, so paragraphs can run concurrently as long as
>> >>> they're in different notebooks.
>> >>> Thanks for the feedback!
>> >>> 
>> >>> Best,
>> >>> moon
>> >> 
>> >> On Sat, Feb 27, 2016 at 8:59 PM Zhong Wang <wangzhong....@gmail.com>
>> >> 
>> >>> wrote:
>> >> Sourav: I think this newly merged PR can help you
>> >> 
>> >>>> https://github.com/apache/incubator-zeppelin/pull/703#issuecomment-185582537
>> >>>> 
>> >>>> On Sat, Feb 27, 2016 at 1:46 PM, Sourav Mazumder <
>> >>> 
>> >>>> sourav.mazumde...@gmail.com> wrote:
>> >>> Hi Moon,
>> >>> 
>> >>>>> This looks great.
>> >>>>> 
>> >>>>> My only suggestion would be to include a PR/feature - Support for
>> >>>>> Running Concurrent paragraphs/queries in Zeppelin.
>> >>>>> 
>> >>>>> Right now if more than one user tries to run paragraphs in multiple
>> >>>>> notebooks concurrently through a single Zeppelin instance (and single
>> >>>>> interpreter instance) the performance is very slow. It is obvious that
>> >>>>> the
>> >>>>> queue gets built up within the zeppelin process and interpreter
>> >>>>> process in
>> >>>>> that scenario as the time taken to move the status from start to
>> >>>>> pending
>> >>>>> and pending to running is very high compared to the actual running
>> >>>>> time of
>> >>>>> a paragraph.
>> >>>>> 
>> >>>>> Without this the multi tenancy support would be meaningless as no one
>> >>>>> can practically use it in a situation where multiple users are trying
>> >>>>> to
>> >>>>> connect to the same instance of Zeppelin (and the related
>> >>>>> interpreter). A
>> >>>>> possible solution would be to spawn separate instance of the same
>> >>>>> interpreter at every notebook/user level.
>> >>>>> 
>> >>>>> Regards,
>> >>>>> Sourav
>> >>>> 
>> >>>> On Sat, Feb 27, 2016 at 12:48 PM, moon soo Lee <m...@apache.org> wrote:
>> >>>> 
>> >>>> Hi Zeppelin users and developers,
>> >>>> 
>> >>>>>> The roadmap we have published at
>> >>>>>> https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Roadmap
>> >>>>>> is almost 9 months old, and it no longer reflects where the community
>> >>>>>> is going. It's time to update.
>> >>>>>> 
>> >>>>>> Based on the mailing list, JIRA issues, pull requests, feedback from
>> >>>>>> users, conferences, and meetings, I could summarize the major interests
>> >>>>>> of users and developers in 7 categories: Enterprise ready, Usability
>> >>>>>> improvement, Pluggability, Documentation, Backend integration,
>> >>>>>> Notebook storage, and Visualization.
>> >>>>>> 
>> >>>>>> And I could list related subjects under each category.
>> >>>>>> 
>> >>>>>>    - Enterprise ready
>> >>>>>>       - Authentication
>> >>>>>>          - Shiro authentication ZEPPELIN-548
>> >>>>>>          <https://issues.apache.org/jira/browse/ZEPPELIN-548>
>> >>>>>>       - Authorization
>> >>>>>>          - Notebook authorization PR-681
>> >>>>>>          <https://github.com/apache/incubator-zeppelin/pull/681>
>> >>>>>>       - Security
>> >>>>>>       - Multi-tenancy
>> >>>>>>       - Stability
>> >>>>>>    - Usability Improvement
>> >>>>>>       - UX improvement
>> >>>>>>       - Better Table data support
>> >>>>>>          - Download data as csv, etc PR-725
>> >>>>>>          <https://github.com/apache/incubator-zeppelin/pull/725>,
>> >>>>>>          PR-714
>> >>>>>>          <https://github.com/apache/incubator-zeppelin/pull/714>,
>> >>>>>>          PR-6 <https://github.com/apache/incubator-zeppelin/pull/6>,
>> >>>>>>          PR-89 <https://github.com/apache/incubator-zeppelin/pull/89>
>> >>>>>>          - Featureful table data display (pagination, etc)
>> >>>>>>    - Pluggability ZEPPELIN-533
>> >>>>>>    <https://issues.apache.org/jira/browse/ZEPPELIN-533>
>> >>>>>>       - Pluggable visualization
>> >>>>>>       - Dynamic Interpreter, notebook, visualization loading
>> >>>>>>       - Repository and registry for pluggable components
>> >>>>>>    - Improve documentation
>> >>>>>>       - Improve contents and readability
>> >>>>>>       - more tutorials, examples
>> >>>>>>    - Interpreter
>> >>>>>>       - Generic JDBC Interpreter
>> >>>>>>       - (spark)R Interpreter
>> >>>>>>       - Cluster manager for interpreter (Proposal
>> >>>>>>       <https://cwiki.apache.org/confluence/display/ZEPPELIN/Cluster+Manager+Proposal> )
>> >>>>>>       - more interpreters
>> >>>>>>    - Notebook storage
>> >>>>>>       - Versioning ZEPPELIN-540
>> >>>>>>       <http://issues.apache.org/jira/browse/ZEPPELIN-540>
>> >>>>>>       - more notebook storages
>> >>>>>>    - Visualization
>> >>>>>>       - More visualizations PR-152
>> >>>>>>       <https://github.com/apache/incubator-zeppelin/pull/152>, PR-728
>> >>>>>>       <https://github.com/apache/incubator-zeppelin/pull/728>, PR-336
>> >>>>>>       <https://github.com/apache/incubator-zeppelin/pull/336>, PR-321
>> >>>>>>       <https://github.com/apache/incubator-zeppelin/pull/321>
>> >>>>>>       - Customize graph (show/hide label, color, etc)
>> >>>>>> 
>> >>>>>> It will help anyone quickly grasp the overall interests and direction
>> >>>>>> of the project. And based on this roadmap, we can discuss and re-define
>> >>>>>> the scope of the next release, 0.6.0, and its schedule.
>> >>>>>> 
>> >>>>>> What do you think? Any feedback would be appreciated.
>> >>>>>> 
>> >>>>>> Thanks,
>> >>>>>> moon
>> >> 
>> >> --
>> >> Vinayak Agrawal
>> >> 
>> >> 
>> >> "To Strive, To Seek, To Find and Not to Yield!"
>> >> ~Lord Alfred Tennyson
>
>
