Re: [DISCUSS] Update Roadmap

Benjamin Kim Tue, 01 Mar 2016 09:57:02 -0800

I see in the Enterprise section that multi-tenancy will be included. Will this 
include user impersonation too? That way, the user executing a job would also 
be the user owning the process.


> On Mar 1, 2016, at 12:51 AM, Shabeel Syed <shabeels...@gmail.com> wrote:
> 
> +1
> 
> Hi Tamas,
>    Pluggable external visualization is really a GREAT feature to have. I'm 
> looking forward to this :)
> 
> Regards
> Shabeel
> 
> On Tue, Mar 1, 2016 at 2:16 PM, Tamas Szuromi <tamas.szur...@odigeo.com> wrote:
> Hey,
> 
> Really promising roadmap.
> 
> I'd only push for more visualization options. I agree that built-in 
> visualization with limited charting options is needed, but I think we also 
> need some way to 'inject' external JS visualizations.
> 
> 
> For scheduling Zeppelin notebooks we use https://github.com/airbnb/airflow 
> through the job REST API. It's an enterprise-ready and very robust solution 
> right now.
> 
> Tamas
> 
> 
> On 1 March 2016 at 09:12, Eran Witkon <eranwit...@gmail.com> wrote:
> One point to clarify: I don't want to suggest Oozie specifically. I want us 
> to think about which features we develop ourselves and which ones we 
> integrate from external, preferably Apache, technology. We don't think about 
> building our own storage services, so why build our own scheduler?
> Eran 
> On Tue, 1 Mar 2016 at 09:49 moon soo Lee <m...@apache.org> wrote:
> @Vinayak, @Eran, @Benjamin, @Guilherme, @Sourav, @Rick
> I can now see a lot of demand for enterprise-level job scheduling. Whether 
> external or built-in, I completely agree with having enterprise-level job 
> scheduling support on the roadmap.
> ZEPPELIN-137 <https://issues.apache.org/jira/browse/ZEPPELIN-137> and 
> ZEPPELIN-531 <https://issues.apache.org/jira/browse/ZEPPELIN-531> are the 
> related issues I can find in our JIRA.
> 
> @Vinayak
> Regarding importing notebooks from GitHub, Zeppelin has a pluggable notebook 
> storage layer (see the related package 
> <https://github.com/apache/incubator-zeppelin/tree/master/zeppelin-zengine/src/main/java/org/apache/zeppelin/notebook/repo>).
> So, GitHub notebook sync can be implemented easily.
> 
> @Shabeel
> Right, we need better memory management to prevent such OOMs.
> And I think a table is one of the most frequently used ways of displaying 
> data, so we'll definitely need more features like filter, sort, etc.
> After this roadmap discussion, a discussion of the next release will follow. 
> Then we'll get an idea of when those features will be available.
> 
> @Prasad
> Thanks for mentioning HA and DR. They're really important subjects for 
> enterprise use. Zeppelin will definitely need to address them.
> And displaying notebook meta information on the top-level page is a good idea.
> 
> It's really great to hear so many opinions and ideas.
> And thanks, @Rick, for sharing a valuable view of the Zeppelin project.
> 
> Thanks,
> moon
> 
> 
> On Mon, Feb 29, 2016 at 11:14 PM Rick Moritz <rah...@gmail.com> wrote:
> Hi,
> 
> For one, I know that there is rudimentary scheduling built into Zeppelin 
> already (at least I fixed a bug in the test for a scheduling feature a few 
> months ago).
> But another point is that Zeppelin should also focus on quality, 
> reproducibility and portability.
> Although this doesn't offer exciting new features, it would make development 
> much easier.
> 
> Cross-platform testability, tests that pass when run sequentially, 
> compatibility with Firefox, and the many other open issues that make it so 
> much harder to enhance Zeppelin and add features should be addressed soon, 
> preferably before more features are added. Zeppelin is already suffering, in 
> my opinion, from quite a lot of feature creep, and we should avoid putting 
> in the kitchen sink at the cost of quality and maintainability. Instead, 
> modularity (ZEPPELIN-533 in particular) should be targeted.
> 
> Oozie, in my opinion, is a dead end - it may de-facto still be in use on many 
> clusters, but it's not getting the love it needs, and I wouldn't bet on it, 
> when it comes to integrating scheduling. Instead, any external tool should be 
> able to use the REST-API to trigger executions, if you want external 
> scheduling.
> 
> So, in conclusion, if we take Moon's list as a list of descending priorities, 
> I fully agree, on the condition that code quality is included as a subset 
> of enterprise-readiness. Auth* is paramount (Kerberos SPNEGO SSO support is 
> what we really want) with user and group rights assignment at the notebook 
> level. We probably also need Knox integration (ODP members looking at 
> integrating Zeppelin should consider contributing this), and integration of 
> something like Spree (https://github.com/hammerlab/spree) to be able to 
> profile jobs.
> 
> I'm hopeful that soon I can resume contributing some quality-oriented code, 
> to drive this "necessary evil" forward ;)
> 
> On Mon, Feb 29, 2016 at 8:27 PM, Sourav Mazumder <sourav.mazumde...@gmail.com> wrote:
> I do agree with Vinayak. It need not be coupled with Oozie.
> 
> Rather, one should be able to call it from any scheduler typically used at 
> the enterprise level, maybe with support for BPML.
> 
> I believe the existing ability to call/execute a Zeppelin Notebook or a 
> specific paragraph within a notebook using REST API should take care of this 
> requirement to some extent.
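
A minimal sketch of how an external scheduler might trigger such a run over REST, in Python. The `/api/notebook/job/...` paths are an assumption based on the Zeppelin notebook REST API and should be checked against the deployed version; the server address and the note and paragraph IDs are hypothetical placeholders.

```python
import json
import urllib.request


def job_url(base_url, note_id, paragraph_id=None):
    """Build the job URL for a whole note, or for one paragraph of it.

    The path layout is an assumption; verify it for your Zeppelin version.
    """
    url = f"{base_url.rstrip('/')}/api/notebook/job/{note_id}"
    if paragraph_id is not None:
        url += f"/{paragraph_id}"
    return url


def build_run_request(base_url, note_id, paragraph_id=None):
    """Prepare (but do not send) a POST request that asks for a run."""
    return urllib.request.Request(
        job_url(base_url, note_id, paragraph_id),
        data=b"",
        method="POST",
    )


def trigger_run(base_url, note_id, paragraph_id=None, timeout=30):
    """Send the request and return the decoded JSON response body."""
    request = build_run_request(base_url, note_id, paragraph_id)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Any scheduler that can issue an HTTP POST (cron, Airflow, Oozie) could call something like `trigger_run("http://localhost:8080", "2A94M5J1Z")`; nothing is sent until that function is called.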
> 
> Regards,
> Sourav
> 
> On Mon, Feb 29, 2016 at 11:23 AM, Vinayak Agrawal <vinayakagrawa...@gmail.com> wrote:
> @Eran Witkon, 
> Thanks for the suggestion, Eran. I concur with your thought. 
> If Zeppelin can be integrated with Oozie, that would be wonderful. Users 
> would also be able to leverage their Oozie skills. 
> This would be promising for now. 
> However, in the future Hadoop might not necessarily be installed on a Spark 
> cluster, and Oozie (since it is installed with the Hadoop distribution) might 
> not be available.
> So perhaps we should give some thought to this feature for the future. Should 
> it depend on Oozie, or should Zeppelin have its own scheduling?
> 
> As Benjamin has pointed out, Databricks notebooks have this as a core 
> notebook feature. 
> 
> 
> Also, does anybody have suggestions regarding a "sync with GitHub" feature?
> - Exporting notebooks to GitHub
> - Importing notebooks from GitHub
> 
> Thanks 
> Vinayak  
>  
> 
> On Mon, Feb 29, 2016 at 4:17 AM, Eran Witkon <eranwit...@gmail.com> wrote:
> @Vinayak Agrawal I would suggest adding the ability to connect Zeppelin to 
> existing scheduling/workflow tools such as https://oozie.apache.org/. 
> This requires better hooks and status reporting, but it doesn't turn 
> Zeppelin into an ETL/scheduler tool by itself.
> 
> 
> On Mon, Feb 29, 2016 at 10:21 AM Vinayak Agrawal <vinayakagrawa...@gmail.com> wrote:
> Moon,
> The new roadmap looks very promising. I am very happy to see security in the 
> list.
> I have some suggestions regarding Enterprise Ready features:
> 
> 1. Job Scheduler - Can this be improved? 
> Currently the scheduler can be used with a cron expression or a preset time. 
> But in an enterprise solution, a notebook might be one piece of a workflow. 
> Can we look towards scheduling notebooks based on other notebooks finishing 
> their jobs successfully?
> This requirement would arise in any ETL workflow, where all the downstream 
> users wait for the ETL notebook to finish successfully. Only after that 
> can other business-oriented notebooks be executed.  
> 
> 2. Importing a notebook - Is there a current requirement or future plan to 
> implement a feature that allows import-notebook-from-github? This would allow 
> users to share notebooks seamlessly. 
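
The dependency-based scheduling asked for in item 1 can be sketched as a topological walk over a notebook dependency graph. This is a hypothetical illustration in Python, not an existing Zeppelin feature: `run_workflow`, the status strings, and the `run_note` callback are all names invented here, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter


def run_workflow(dependencies, run_note):
    """Run notebooks in dependency order.

    dependencies maps each note to the set of notes it must wait for.
    run_note(note) is a caller-supplied function returning True on success.
    A note runs only if every upstream note succeeded; otherwise it is
    skipped. Returns a dict of note -> "SUCCESS", "FAILED" or "SKIPPED".
    """
    status = {}
    # static_order() yields every note after all of its upstream notes.
    for note in TopologicalSorter(dependencies).static_order():
        upstream = dependencies.get(note, ())
        if all(status[u] == "SUCCESS" for u in upstream):
            status[note] = "SUCCESS" if run_note(note) else "FAILED"
        else:
            status[note] = "SKIPPED"
    return status
```

With `{"report": {"etl"}, "summary": {"report"}}`, the `etl` note runs first, and `summary` is skipped if `report` fails. In practice `run_note` could wrap a REST call that triggers the notebook and polls for completion.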
> 
> Thanks 
> Vinayak
> 
> On Sun, Feb 28, 2016 at 11:22 PM, moon soo Lee <m...@apache.org> wrote:
> Zhong Wang, 
> Right, folder support would be quite useful. Thanks for the opinion. 
> I hope I can finish the work in pr-190 
> <https://github.com/apache/incubator-zeppelin/pull/190>.
> 
> Sourav,
> Regarding concurrent running, Zeppelin itself doesn't prevent running 
> paragraphs/queries concurrently. Each interpreter can implement its own 
> scheduling policy. For example, the SparkSQL interpreter and ShellInterpreter 
> can already run paragraphs/queries concurrently.
> 
> SparkInterpreter is implemented with a FIFO scheduler because of the nature 
> of the Scala compiler. That's why users cannot run multiple paragraphs 
> concurrently when they work with SparkInterpreter.
> But as Zhong Wang mentioned, pr-703 gives each notebook a separate Scala 
> compiler, so paragraphs can run concurrently as long as they're in different 
> notebooks.
> Thanks for the feedback!
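
The policy described above (FIFO within a notebook, concurrency across notebooks) can be illustrated with a small Python sketch. It is not Zeppelin's scheduler code: `PerNoteScheduler` is a hypothetical name, and each notebook simply gets its own single-worker thread pool.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class PerNoteScheduler:
    """One single-worker executor per notebook.

    Paragraphs of the same note run one at a time in submission order;
    paragraphs of different notes can run at the same time.
    """

    def __init__(self):
        self._executors = {}
        self._lock = threading.Lock()

    def submit(self, note_id, fn, *args):
        # Create the note's executor lazily, guarded against races.
        with self._lock:
            executor = self._executors.get(note_id)
            if executor is None:
                executor = ThreadPoolExecutor(max_workers=1)
                self._executors[note_id] = executor
        return executor.submit(fn, *args)

    def shutdown(self):
        # Wait for all queued paragraphs to finish, then release threads.
        with self._lock:
            executors = list(self._executors.values())
        for executor in executors:
            executor.shutdown(wait=True)
```

A single shared one-worker executor would correspond to the plain FIFO behaviour, where a long paragraph in one notebook delays every other notebook.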
> 
> Best,
> moon
> On Sat, Feb 27, 2016 at 8:59 PM Zhong Wang <wangzhong....@gmail.com> wrote:
> Sourav: I think this newly merged PR can help you: 
> https://github.com/apache/incubator-zeppelin/pull/703#issuecomment-185582537
> 
> On Sat, Feb 27, 2016 at 1:46 PM, Sourav Mazumder <sourav.mazumde...@gmail.com> wrote:
> Hi Moon,
> 
> This looks great.
> 
> My only suggestion would be to include a PR/feature: support for running 
> concurrent paragraphs/queries in Zeppelin. 
> 
> Right now, if more than one user tries to run paragraphs in multiple notebooks 
> concurrently through a single Zeppelin instance (and a single interpreter 
> instance), performance is very poor. Evidently a queue builds up within the 
> Zeppelin process and the interpreter process in that scenario, because the 
> time taken to move the status from start to pending and from pending to 
> running is very high compared to the actual running time of a paragraph.
> 
> Without this, multi-tenancy support would be meaningless, as no one can 
> practically use it in a situation where multiple users are trying to connect 
> to the same instance of Zeppelin (and the related interpreter). A possible 
> solution would be to spawn a separate instance of the same interpreter for 
> every notebook/user.
> 
> Regards,
> Sourav
> On Sat, Feb 27, 2016 at 12:48 PM, moon soo Lee <m...@apache.org> wrote:
> Hi Zeppelin users and developers,
> 
> The roadmap we have published at
> https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Roadmap
> is almost 9 months old, and it no longer reflects where the community is 
> going. It's time to update it.
> 
> Based on the mailing list, JIRA issues, pull requests, feedback from users, 
> conferences and meetings, I could summarize the major interests of users and 
> developers in 7 categories: Enterprise ready, Usability improvement, 
> Pluggability, Documentation, Backend integration, Notebook storage, and 
> Visualization.
> 
> And I could list related subjects under each category.
> 
> - Enterprise ready
>   - Authentication
>     - Shiro authentication ZEPPELIN-548 
>       <https://issues.apache.org/jira/browse/ZEPPELIN-548>
>   - Authorization
>     - Notebook authorization PR-681 
>       <https://github.com/apache/incubator-zeppelin/pull/681>
>   - Security
>   - Multi-tenancy
>   - Stability
> - Usability Improvement
>   - UX improvement
>   - Better Table data support
>     - Download data as csv, etc PR-725 
>       <https://github.com/apache/incubator-zeppelin/pull/725>, PR-714 
>       <https://github.com/apache/incubator-zeppelin/pull/714>, PR-6 
>       <https://github.com/apache/incubator-zeppelin/pull/6>, PR-89 
>       <https://github.com/apache/incubator-zeppelin/pull/89>
>     - Featureful table data display (pagination, etc)
> - Pluggability ZEPPELIN-533 <https://issues.apache.org/jira/browse/ZEPPELIN-533>
>   - Pluggable visualization
>   - Dynamic Interpreter, notebook, visualization loading
>   - Repository and registry for pluggable components
> - Improve documentation
>   - Improve contents and readability
>   - more tutorials, examples
> - Interpreter
>   - Generic JDBC Interpreter
>   - (spark)R Interpreter
>   - Cluster manager for interpreter (Proposal 
>     <https://cwiki.apache.org/confluence/display/ZEPPELIN/Cluster+Manager+Proposal>)
>   - more interpreters
> - Notebook storage
>   - Versioning ZEPPELIN-540 <http://issues.apache.org/jira/browse/ZEPPELIN-540>
>   - more notebook storages
> - Visualization
>   - More visualizations PR-152 
>     <https://github.com/apache/incubator-zeppelin/pull/152>, PR-728 
>     <https://github.com/apache/incubator-zeppelin/pull/728>, PR-336 
>     <https://github.com/apache/incubator-zeppelin/pull/336>, PR-321 
>     <https://github.com/apache/incubator-zeppelin/pull/321>
>   - Customize graph (show/hide label, color, etc)
> 
> This will help anyone quickly grasp the overall interests and direction of 
> the project. And based on this roadmap, we can discuss and redefine the scope 
> of the next release, 0.6.0, and its schedule.
> 
> What do you think? Any feedback would be appreciated.
> 
> Thanks,
> moon
> 
> 
> 
> 
> -- 
> Vinayak Agrawal
> Big Data Analytics
> IBM
> 
> "To Strive, To Seek, To Find and Not to Yield!" 
> ~Lord Alfred Tennyson
> 
> 
> 
>
