Comments:

Regarding #2: Language support.
It would be great to see more Scala (once I was up to speed with Scala, I never wanted to look back at Java).

Regarding #3: Drop old Spark support.
This seems like low-hanging fruit: low impact and high reward.

Regarding #5: Configuration files.
We could take a cue from other great open source (Apache license) projects, like ElasticSearch, and migrate from verbose XML files to .yml files. Environment variables would remain for per-machine settings and for global settings related to the Java runtime, such as JVM memory configs and directory paths like [FOO]_HOME.

An alternative to .yml is HOCON. The Play Framework and Spark Job Server make use of easy-to-read HOCON-style files; HOCON is a JSON superset.
https://github.com/typesafehub/config/blob/master/HOCON.md
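To illustrate the split suggested above (application settings in a .yml or HOCON file, per-machine values in environment variables), here is a minimal Python sketch. It is not Zeppelin code: the setting keys, the `ENV_KEYS` tuple and the `resolve_settings` helper are all hypothetical, and the dictionary merely stands in for whatever a real .yml or HOCON parser would return.

```python
import os

# Hypothetical application settings, as they might look after parsing a
# .yml or HOCON file. The keys are illustrative, not real Zeppelin keys.
FILE_SETTINGS = {
    "server": {"port": 8080, "address": "0.0.0.0"},
    "notebook": {"dir": "notebook"},
}

# Per-machine values that would stay in environment variables
# (names chosen for illustration only).
ENV_KEYS = ("ZEPPELIN_HOME", "JAVA_HOME")


def flatten(tree, prefix=""):
    """Turn nested settings into dotted keys, e.g. server.port."""
    flat = {}
    for key, value in tree.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{path}."))
        else:
            flat[path] = value
    return flat


def resolve_settings(file_settings, environ=os.environ):
    """Combine file-based settings with per-machine environment values."""
    settings = flatten(file_settings)
    for name in ENV_KEYS:
        if name in environ:
            # Environment values live under their own "env." namespace.
            settings[f"env.{name}"] = environ[name]
    return settings
```

Keeping the two sources in separate namespaces (plain dotted keys versus the `env.` prefix) is one way to sidestep the question of which source wins when both define the same name.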
Typesafe licenses its entire config library under the Apache license, and the library is plain Java with no dependencies:
https://github.com/typesafehub/config

Regarding #6: Excluding the more esoteric interpreters by default seems reasonable.

Addition: Create a common installer that also bundles a service manager upstart script for Debian or CentOS (not sure about Windows). Install via Debian package with a simple `dpkg -i` command.

Addition: Build tools. Does anybody have history with Gradle? Is a switch from Maven to Gradle worth it? I admit I am not an XML fan, and I realize this is not a simple task. Gradle may make it easier to organize the builds if interpreters ever become plugins; each plugin could have its own build.gradle file.

"Improve documentation" is always a big yes.

Regards,
Jeff Steinmetz

On 4/6/16, 7:32 PM, "Amos Elberg" <amos.elb...@gmail.com> wrote:

>A few suggestions for the roadmap:
>
>1. Increase unit test coverage. I suggest we set thresholds -- say, 70% for
>0.6, 85% for 0.7, and aim for 95% before 1.0.
>
>2. Language support. Right now, interpreters essentially have to be written
>in Java, or at least have Java wrappers. This is because the current design
>has each interpreter class call a `static class` method when the class is
>loaded, to register the Interpreter with Zeppelin. In the long term, using
>static class methods will inevitably be a source of architectural problems.
>(People have been saying that the feature should be removed entirely from Java
>since 1998.) In the short term, if we fix this, then it would be easy for
>people to write interpreters in other JVM languages, such as Scala, Clojure,
>Python (by Jython), Elixir (by whatever the Elixir JVM converter is called),
>Groovy, etc.
>
>3. Remove Spark-under-zeppelin-home. Many, many, many of our issues,
>including many CI issues, trace back to the old system of installing Spark
>under Zeppelin-home.
>This is essentially a legacy thing from when Zeppelin
>was a PR submitted as an add-on to Spark. Right now, it doesn't buy us
>anything -- but it does complicate the build process, create dependency
>conflicts, and lead to user support issues.
>
>I suggest we deprecate this ASAP, and remove it entirely before 0.7, or 0.8 at
>the latest.
>
>4. Drop support for Spark before 1.3, or better yet before 1.4. Jeff
>Steinmetz suggested this the other day. It would simplify CI and the build
>process, as well as maintenance as Spark heads toward 2.0. I can't imagine
>more than a tiny number of people who use Zeppelin are using it with Spark
>1.2, or even 1.3.
>
>5. Reform the configuration system. Right now, Zeppelin configuration is set
>in:
>  - ZeppelinConfiguration.java (developers must edit)
>  - The xml configuration (administrator must edit)
>  - The env configuration file (administrator must edit)
>  - Multiple json files such as interpreter.json (edited through the
>    interface)
>
>The result is kind of a mish-mash, and it creates user support issues when
>people enter conflicting configurations or configurations in the wrong place.
>
>It's also a developer issue because we haven't defined what takes precedence
>over what.
>
>I suggest we introduce a part of the architecture which acts as an arbitrator
>for all configuration issues -- when any class needs to access or change
>configuration, it can go through one place. Then we can figure out how we
>want to present configuration to the users.
>
>6. Disable most interpreters other than Spark-related (and MD) by default.
>At this point, we've proliferated so many interpreters that it complicates
>the build cycle and, well, just isn't necessary.
>
>On Monday, February 29, 2016 8:04:09 AM EDT Prasad Wagle wrote:
>> This is a great list.
>>
>> In the enterprise ready section, what do you think about adding "High
>> Availability and Disaster Recovery"?
>> We can start with updating the
>> documentation with best practices and scripts for a cold standby solution
>> and work towards an active-active
>> <https://www.ibm.com/developerworks/community/blogs/RohitShetty/entry/high_availability_cold_warm_hot?lang=en>
>> solution.
>>
>> Another suggestion is to store meta-data for notes like creator, last
>> updated (time and user) and number of views. We can show this information
>> in the top level page in a table format with the ability to sort by any column.
>>
>> On Mon, Feb 29, 2016 at 7:15 AM, Benjamin Kim <bbuil...@gmail.com> wrote:
>> > I concur with this suggestion. In the enterprise, management would like to
>> > see scheduled runs tracked, monitored, and given SLA constraints for
>> > the mission critical. Alerts and notifications are crucial for DevOps to
>> > respond with error clarification within it. If the Zeppelin notebooks can
>> > be executed by a third party scheduling application, such as Oozie, then
>> > this requirement can be satisfied if there are no immediate plans for a
>> > built-in one.
>> >
>> > On Feb 29, 2016, at 1:17 AM, Eran Witkon <eranwit...@gmail.com> wrote:
>> >
>> > @Vinayak Agrawal I would suggest adding the ability to connect Zeppelin
>> > to existing scheduling tools\workflow tools such as
>> > https://oozie.apache.org/. This requires better hooks and status
>> > reporting but doesn't make Zeppelin an ETL\scheduler tool by itself.
>> >
>> >
>> > On Mon, Feb 29, 2016 at 10:21 AM Vinayak Agrawal <vinayakagrawa...@gmail.com> wrote:
>> >> Moon,
>> >> The new roadmap looks very promising. I am very happy to see security in
>> >> the list.
>> >> I have some suggestions regarding Enterprise Ready features:
>> >>
>> >> 1. Job Scheduler - Can this be improved?
>> >> Currently the scheduler can be used with a Cron expression or a pre-set
>> >> time. But in an enterprise solution, a notebook might be one piece of the
>> >> workflow.
>> >> Can we look towards the functionality of scheduling notebooks
>> >> based on other notebooks finishing their job successfully?
>> >> This requirement would arise in any ETL workflow, where all the
>> >> downstream users wait for the ETL notebook to finish successfully. Only
>> >> after that can other business oriented notebooks be executed.
>> >>
>> >> 2. Importing a notebook - Is there a current requirement or future plan
>> >> to implement a feature that allows import-notebook-from-github? This
>> >> would allow users to share notebooks seamlessly.
>> >>
>> >> Thanks
>> >> Vinayak
>> >>
>> >> On Sun, Feb 28, 2016 at 11:22 PM, moon soo Lee <m...@apache.org> wrote:
>> >>> Zhong Wang,
>> >>> Right, Folder support would be quite useful. Thanks for the opinion.
>> >>> Hope I can finish the work pr-190
>> >>> <https://github.com/apache/incubator-zeppelin/pull/190>.
>> >>>
>> >>>
>> >>> Sourav,
>> >>> Regarding concurrent running, Zeppelin does not limit running
>> >>> paragraphs/queries concurrently. An interpreter can implement its own
>> >>> scheduling policy. For example, SparkSQL interpreter and
>> >>> ShellInterpreter can already run paragraphs/queries concurrently.
>> >>>
>> >>> SparkInterpreter is implemented with a FIFO scheduler considering the
>> >>> nature of the Scala compiler. That's why users cannot run multiple
>> >>> paragraphs concurrently when they work with SparkInterpreter.
>> >>> But as Zhong Wang mentioned, pr-703 gives each notebook a separate
>> >>> Scala compiler, so paragraphs run concurrently as long as they are in
>> >>> different notebooks.
>> >>> Thanks for the feedback!
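The scheduling behaviour described above can be sketched outside of Zeppelin. The following Python example is only an analogy, not Zeppelin's scheduler: a single-worker thread pool plays the role of a FIFO scheduler, and the hypothetical `PerNotebookFifoScheduler` class keeps one such queue per notebook, so paragraphs within a notebook keep their order while different notebooks do not wait on each other.

```python
from concurrent.futures import ThreadPoolExecutor


class PerNotebookFifoScheduler:
    """Hypothetical scheduler: one FIFO queue (single worker) per notebook."""

    def __init__(self):
        self._queues = {}

    def submit(self, notebook_id, paragraph, *args):
        # Create the notebook's single-worker pool on first use.
        if notebook_id not in self._queues:
            self._queues[notebook_id] = ThreadPoolExecutor(max_workers=1)
        return self._queues[notebook_id].submit(paragraph, *args)

    def shutdown(self):
        # Wait for every queued paragraph to finish.
        for queue in self._queues.values():
            queue.shutdown(wait=True)


def run_demo():
    """Run three paragraphs in each of two notebooks and return the run log."""
    log = []
    scheduler = PerNotebookFifoScheduler()
    for notebook_id in ("note-A", "note-B"):
        for index in range(3):
            # list.append stands in for running a paragraph.
            scheduler.submit(notebook_id, log.append, (notebook_id, index))
    scheduler.shutdown()
    return log
```

Within each notebook the log entries come out in submission order; how the entries of the two notebooks interleave is not fixed.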
>> >>>
>> >>> Best,
>> >>> moon
>> >>>
>> >>> On Sat, Feb 27, 2016 at 8:59 PM Zhong Wang <wangzhong....@gmail.com> wrote:
>> >>>> Sourav: I think this newly merged PR can help you
>> >>>> https://github.com/apache/incubator-zeppelin/pull/703#issuecomment-185582537
>> >>>>
>> >>>> On Sat, Feb 27, 2016 at 1:46 PM, Sourav Mazumder <sourav.mazumde...@gmail.com> wrote:
>> >>>>> Hi Moon,
>> >>>>>
>> >>>>> This looks great.
>> >>>>>
>> >>>>> My only suggestion would be to include a PR/feature - Support for
>> >>>>> Running Concurrent paragraphs/queries in Zeppelin.
>> >>>>>
>> >>>>> Right now if more than one user tries to run paragraphs in multiple
>> >>>>> notebooks concurrently through a single Zeppelin instance (and single
>> >>>>> interpreter instance) the performance is very slow. It is obvious that
>> >>>>> the queue gets built up within the Zeppelin process and interpreter
>> >>>>> process in that scenario, as the time taken to move the status from
>> >>>>> start to pending and pending to running is very high compared to the
>> >>>>> actual running time of a paragraph.
>> >>>>>
>> >>>>> Without this the multi tenancy support would be meaningless, as no one
>> >>>>> can practically use it in a situation where multiple users are trying
>> >>>>> to connect to the same instance of Zeppelin (and the related
>> >>>>> interpreter). A possible solution would be to spawn a separate
>> >>>>> instance of the same interpreter at every notebook/user level.
>> >>>>>
>> >>>>> Regards,
>> >>>>> Sourav
>> >>>>>
>> >>>>> On Sat, Feb 27, 2016 at 12:48 PM, moon soo Lee <m...@apache.org> wrote:
>> >>>>>>
>> >>>>>> Hi Zeppelin users and developers,
>> >>>>>>
>> >>>>>> The roadmap we have published at
>> >>>>>> https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Roadmap
>> >>>>>> is almost 9 months old, and it doesn't reflect where the community
>> >>>>>> goes anymore. It's time to update.
>> >>>>>>
>> >>>>>> Based on mailing list, JIRA issues, pull requests, feedback from
>> >>>>>> users, conferences and meetings, I could summarize the major interests
>> >>>>>> of users and developers in 7 categories: Enterprise ready, Usability
>> >>>>>> improvement, Pluggability, Documentation, Backend integration,
>> >>>>>> Notebook storage, and Visualization.
>> >>>>>>
>> >>>>>> And I could list related subjects under each category.
>> >>>>>>
>> >>>>>> - Enterprise ready
>> >>>>>>   - Authentication
>> >>>>>>     - Shiro authentication ZEPPELIN-548
>> >>>>>>       <https://issues.apache.org/jira/browse/ZEPPELIN-548>
>> >>>>>>   - Authorization
>> >>>>>>     - Notebook authorization PR-681
>> >>>>>>       <https://github.com/apache/incubator-zeppelin/pull/681>
>> >>>>>>   - Security
>> >>>>>>   - Multi-tenancy
>> >>>>>>   - Stability
>> >>>>>>
>> >>>>>> - Usability Improvement
>> >>>>>>   - UX improvement
>> >>>>>>   - Better Table data support
>> >>>>>>     - Download data as csv, etc PR-725
>> >>>>>>       <https://github.com/apache/incubator-zeppelin/pull/725>,
>> >>>>>>       PR-714 <https://github.com/apache/incubator-zeppelin/pull/714>,
>> >>>>>>       PR-6 <https://github.com/apache/incubator-zeppelin/pull/6>,
>> >>>>>>       PR-89 <https://github.com/apache/incubator-zeppelin/pull/89>
>> >>>>>>     - Featureful table data display (pagination, etc)
>> >>>>>>
>> >>>>>> - Pluggability ZEPPELIN-533
>> >>>>>>   <https://issues.apache.org/jira/browse/ZEPPELIN-533>
>> >>>>>>   - Pluggable visualization
>> >>>>>>   - Dynamic Interpreter, notebook, visualization loading
>> >>>>>>   - Repository and registry for pluggable components
>> >>>>>>
>> >>>>>> - Improve documentation
>> >>>>>>   - Improve contents and readability
>> >>>>>>   - more tutorials, examples
>> >>>>>>
>> >>>>>> - Interpreter
>> >>>>>>   - Generic JDBC Interpreter
>> >>>>>>   - (spark)R Interpreter
>> >>>>>>   - Cluster manager for interpreter (Proposal
>> >>>>>>     <https://cwiki.apache.org/confluence/display/ZEPPELIN/Cluster+Manager+Proposal>)
>> >>>>>>   - more interpreters
>> >>>>>>
>> >>>>>> - Notebook storage
>> >>>>>>   - Versioning ZEPPELIN-540
>> >>>>>>     <http://issues.apache.org/jira/browse/ZEPPELIN-540>
>> >>>>>>   - more notebook storages
>> >>>>>>
>> >>>>>> - Visualization
>> >>>>>>   - More visualizations PR-152
>> >>>>>>     <https://github.com/apache/incubator-zeppelin/pull/152>, PR-728
>> >>>>>>     <https://github.com/apache/incubator-zeppelin/pull/728>, PR-336
>> >>>>>>     <https://github.com/apache/incubator-zeppelin/pull/336>, PR-321
>> >>>>>>     <https://github.com/apache/incubator-zeppelin/pull/321>
>> >>>>>>   - Customize graph (show/hide label, color, etc)
>> >>>>>>
>> >>>>>> It will help anyone quickly grasp the overall interests and direction
>> >>>>>> of the project. And based on this roadmap, we can discuss and
>> >>>>>> re-define the scope of the next release, 0.6.0, and its schedule.
>> >>>>>>
>> >>>>>> What do you think? Any feedback would be appreciated.
>> >>>>>>
>> >>>>>> Thanks,
>> >>>>>> moon
>> >>
>> >> --
>> >> Vinayak Agrawal
>> >>
>> >> "To Strive, To Seek, To Find and Not to Yield!"
>> >> ~Lord Alfred Tennyson