I see in the Enterprise section that multi-tenancy will be included. Will this include user impersonation too? That way, the user executing would also be the user owning the process.
> On Mar 1, 2016, at 12:51 AM, Shabeel Syed <shabeels...@gmail.com> wrote:
>
> +1
>
> Hi Tamas,
> Pluggable external visualization is really a GREAT feature to have. I'm
> looking forward to this :)
>
> Regards
> Shabeel
>
> On Tue, Mar 1, 2016 at 2:16 PM, Tamas Szuromi <tamas.szur...@odigeo.com> wrote:
> Hey,
>
> Really promising roadmap.
>
> I'd only push for more visualization options. I agree that built-in
> visualization with limited charting options is needed, but I think we also
> need some way to 'inject' external JS visualizations.
>
> For scheduling Zeppelin notebooks we use https://github.com/airbnb/airflow
> through the job REST API. It's an enterprise-ready and very robust solution
> right now.
>
> Tamas
>
> On 1 March 2016 at 09:12, Eran Witkon <eranwit...@gmail.com> wrote:
> One point to clarify: I don't want to suggest Oozie specifically. I want us
> to think about which features we develop ourselves and which ones we
> integrate from external, preferably Apache, technology. We don't think about
> building our own storage services, so why build our own scheduler?
> Eran
>
> On Tue, 1 Mar 2016 at 09:49 moon soo Lee <m...@apache.org> wrote:
> @Vinayak, @Eran, @Benjamin, @Guilherme, @Sourav, @Rick
> Now I can see a lot of demand around enterprise-level job scheduling. Whether
> external or built-in, I completely agree with having enterprise-level job
> scheduling support on the roadmap.
> ZEPPELIN-137 <https://issues.apache.org/jira/browse/ZEPPELIN-137> and
> ZEPPELIN-531 <https://issues.apache.org/jira/browse/ZEPPELIN-531> are the
> related issues I can find in our JIRA.
>
> @Vinayak
> Regarding importing notebooks from GitHub, Zeppelin has a pluggable notebook
> storage layer (see related package
> <https://github.com/apache/incubator-zeppelin/tree/master/zeppelin-zengine/src/main/java/org/apache/zeppelin/notebook/repo>).
> So, GitHub notebook sync can be implemented easily.
>
> @Shabeel
> Right, we need better memory management to prevent such OOM.
> And I think a table is one of the most frequently used ways of displaying
> data. So we'll definitely need more features like filter, sort, etc.
> After this roadmap discussion, discussion of the next release will follow.
> Then we'll get an idea of when those features will be available.
>
> @Prasad
> Thanks for mentioning HA and DR. They're really important subjects for
> enterprise use. Zeppelin will definitely need to address them.
> And displaying meta information about notebooks on the top-level page is a
> good idea.
>
> It's really great to hear many opinions and ideas.
> And thanks @Rick for sharing a valuable view of the Zeppelin project.
>
> Thanks,
> moon
>
> On Mon, Feb 29, 2016 at 11:14 PM Rick Moritz <rah...@gmail.com> wrote:
> Hi,
>
> For one, I know that there is rudimentary scheduling built into Zeppelin
> already (at least I fixed a bug in the test for a scheduling feature a few
> months ago).
> But another point is that Zeppelin should also focus on quality,
> reproducibility and portability.
> Although this doesn't offer exciting new features, it would make development
> much easier.
>
> Cross-platform testability, tests that pass when run sequentially,
> compatibility with Firefox, and many more open issues that make it so much
> harder to enhance Zeppelin and add features should be addressed soon,
> preferably before more features are added. Zeppelin is already suffering - in
> my opinion - from quite a lot of feature creep, and we should avoid putting
> in the kitchen sink at the cost of quality and maintainability. Instead,
> modularity (ZEPPELIN-533 in particular) should be targeted.
>
> Oozie, in my opinion, is a dead end - it may de facto still be in use on many
> clusters, but it's not getting the love it needs, and I wouldn't bet on it
> when it comes to integrating scheduling.
> Instead, any external tool should be able to use the REST API to trigger
> executions, if you want external scheduling.
>
> So, in conclusion, if we take Moon's list as a list of descending priorities,
> I fully agree, under the condition that code quality is included as a subset
> of enterprise-readiness. Auth* is paramount (Kerberos SPNEGO SSO support is
> what we really want), with user and group rights assignment on the notebook
> level. We probably also need Knox integration (ODP-Members looking at
> integrating Zeppelin should consider contributing this), and integration of
> something like Spree (https://github.com/hammerlab/spree) to be able to
> profile jobs.
>
> I'm hopeful that soon I can resume contributing some quality-oriented code,
> to drive this "necessary evil" forward ;)
>
> On Mon, Feb 29, 2016 at 8:27 PM, Sourav Mazumder <sourav.mazumde...@gmail.com> wrote:
> I do agree with Vinayak. It need not be coupled with Oozie.
>
> Rather, one should be able to call it from any scheduler typically used at
> the enterprise level. Maybe support for BPML.
>
> I believe the existing ability to call/execute a Zeppelin notebook, or a
> specific paragraph within a notebook, using the REST API should take care of
> this requirement to some extent.
>
> Regards,
> Sourav
>
> On Mon, Feb 29, 2016 at 11:23 AM, Vinayak Agrawal <vinayakagrawa...@gmail.com> wrote:
> @Eran Witkon,
> Thanks for the suggestion, Eran. I concur with your thought.
> If Zeppelin can be integrated with Oozie, that would be wonderful. Users will
> also be able to leverage their Oozie skills.
> This would be promising for now.
> However, in the future Hadoop might not necessarily be installed in a Spark
> cluster, and Oozie (since it installs with the Hadoop distribution) might not
> be available.
> So perhaps we should give a thought about this feature for the future.
> Should it depend on Oozie or should Zeppelin have its own scheduling?
>
> As Benjamin has reiterated, the Databricks notebook has this as a core
> notebook feature.
>
> Also, would anybody give any suggestions regarding a "sync with GitHub"
> feature?
> - Exporting notebook to GitHub
> - Importing notebook from GitHub
>
> Thanks
> Vinayak
>
> On Mon, Feb 29, 2016 at 4:17 AM, Eran Witkon <eranwit...@gmail.com> wrote:
> @Vinayak Agrawal I would suggest adding the ability to connect Zeppelin to
> existing scheduling/workflow tools such as https://oozie.apache.org/.
> This requires better hooks and status reporting, but doesn't make Zeppelin an
> ETL/scheduler tool by itself.
>
> On Mon, Feb 29, 2016 at 10:21 AM Vinayak Agrawal <vinayakagrawa...@gmail.com> wrote:
> Moon,
> The new roadmap looks very promising. I am very happy to see security in the
> list.
> I have some suggestions regarding Enterprise Ready features:
>
> 1. Job Scheduler - Can this be improved?
> Currently the scheduler can be used with a cron expression or a pre-set time.
> But in an enterprise solution, a notebook might be one piece of a workflow.
> Can we look towards the functionality of scheduling notebooks based on other
> notebooks finishing their jobs successfully?
> This requirement would arise in any ETL workflow, where all the downstream
> users wait for the ETL notebook to finish successfully. Only after that can
> other business-oriented notebooks be executed.
>
> 2. Importing a notebook - Is there a current requirement or future plan to
> implement a feature that allows import-notebook-from-github? This would allow
> users to share notebooks seamlessly.
>
> Thanks
> Vinayak
>
> On Sun, Feb 28, 2016 at 11:22 PM, moon soo Lee <m...@apache.org> wrote:
> Zhong Wang,
> Right, folder support would be quite useful. Thanks for the opinion.
> Hope I can finish the work in pr-190
> <https://github.com/apache/incubator-zeppelin/pull/190>.
>
> Sourav,
> Regarding concurrent running, Zeppelin itself places no limit on running
> paragraphs/queries concurrently. Each interpreter can implement its own
> scheduling policy. For example, the SparkSQL interpreter and ShellInterpreter
> can already run paragraphs/queries concurrently.
>
> SparkInterpreter is implemented with a FIFO scheduler, considering the nature
> of the Scala compiler. That's why users cannot run multiple paragraphs
> concurrently when they work with SparkInterpreter.
> But as Zhong Wang mentioned, pr-703 gives each notebook a separate Scala
> compiler, so paragraphs can run concurrently as long as they're in different
> notebooks.
> Thanks for the feedback!
>
> Best,
> moon
>
> On Sat, Feb 27, 2016 at 8:59 PM Zhong Wang <wangzhong....@gmail.com> wrote:
> Sourav: I think this newly merged PR can help you
> https://github.com/apache/incubator-zeppelin/pull/703#issuecomment-185582537
>
> On Sat, Feb 27, 2016 at 1:46 PM, Sourav Mazumder <sourav.mazumde...@gmail.com> wrote:
> Hi Moon,
>
> This looks great.
>
> My only suggestion would be to include a PR/feature - support for running
> concurrent paragraphs/queries in Zeppelin.
>
> Right now, if more than one user tries to run paragraphs in multiple notebooks
> concurrently through a single Zeppelin instance (and a single interpreter
> instance), the performance is very slow. It is obvious that a queue builds up
> within the Zeppelin process and the interpreter process in that scenario, as
> the time taken to move the status from start to pending and from pending to
> running is very high compared to the actual running time of a paragraph.
>
> Without this, the multi-tenancy support would be meaningless, as no one can
> practically use it in a situation where multiple users are trying to connect
> to the same instance of Zeppelin (and the related interpreter). A possible
> solution would be to spawn a separate instance of the same interpreter at
> every notebook/user level.
>
> Regards,
> Sourav
>
> On Sat, Feb 27, 2016 at 12:48 PM, moon soo Lee <m...@apache.org> wrote:
> Hi Zeppelin users and developers,
>
> The roadmap we have published at
> https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Roadmap
> is almost 9 months old, and it no longer reflects where the community is
> going. It's time to update.
>
> Based on the mailing list, JIRA issues, pull requests, feedback from users,
> conferences and meetings, I could summarize the major interests of users and
> developers in 7 categories: Enterprise ready, Usability improvement,
> Pluggability, Documentation, Backend integration, Notebook storage, and
> Visualization.
>
> And I could list related subjects under each category.
>
> Enterprise ready
>   Authentication
>     Shiro authentication ZEPPELIN-548
>     <https://issues.apache.org/jira/browse/ZEPPELIN-548>
>   Authorization
>     Notebook authorization PR-681
>     <https://github.com/apache/incubator-zeppelin/pull/681>
>   Security
>   Multi-tenancy
>   Stability
> Usability Improvement
>   UX improvement
>   Better Table data support
>     Download data as csv, etc PR-725
>     <https://github.com/apache/incubator-zeppelin/pull/725>, PR-714
>     <https://github.com/apache/incubator-zeppelin/pull/714>, PR-6
>     <https://github.com/apache/incubator-zeppelin/pull/6>, PR-89
>     <https://github.com/apache/incubator-zeppelin/pull/89>
>     Featureful table data display (pagination, etc)
> Pluggability ZEPPELIN-533 <https://issues.apache.org/jira/browse/ZEPPELIN-533>
>   Pluggable visualization
>   Dynamic Interpreter, notebook, visualization loading
>   Repository and registry for pluggable components
> Improve documentation
>   Improve contents and readability
>   more tutorials, examples
> Interpreter
>   Generic JDBC Interpreter
>   (spark)R Interpreter
>   Cluster manager for interpreter (Proposal
>   <https://cwiki.apache.org/confluence/display/ZEPPELIN/Cluster+Manager+Proposal>)
>   more interpreters
> Notebook storage
>   Versioning ZEPPELIN-540 <http://issues.apache.org/jira/browse/ZEPPELIN-540>
>   more notebook storages
> Visualization
>   More visualizations PR-152
>   <https://github.com/apache/incubator-zeppelin/pull/152>, PR-728
>   <https://github.com/apache/incubator-zeppelin/pull/728>, PR-336
>   <https://github.com/apache/incubator-zeppelin/pull/336>, PR-321
>   <https://github.com/apache/incubator-zeppelin/pull/321>
>   Customize graph (show/hide label, color, etc)
>
> It will help anyone quickly grasp the overall interests of the project and
> its direction. And based on this roadmap, we can discuss and re-define the
> scope of the next release, 0.6.0, and its schedule.
>
> What do you think? Any feedback would be appreciated.
>
> Thanks,
> moon
>
> --
> Vinayak Agrawal
> Big Data Analytics
> IBM
>
> "To Strive, To Seek, To Find and Not to Yield!"
> ~Lord Alfred Tennyson
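
Several replies in the thread above (Tamas, Rick, Sourav) suggest driving Zeppelin from an external scheduler through its REST API. A minimal sketch of what such a trigger could look like is below. It assumes the job endpoint has the form /api/notebook/job/<noteId>[/<paragraphId>], as described in Zeppelin's notebook REST API documentation, so please verify the path against your Zeppelin version. The base URL and note ID are placeholders, the helper names are made up for illustration, and authentication is not handled.

```python
import json
import urllib.request
from typing import Optional


def job_url(base_url: str, note_id: str, paragraph_id: Optional[str] = None) -> str:
    """Build the job URL for a whole note, or for one paragraph of it.

    The /api/notebook/job/... path is an assumption taken from the Zeppelin
    REST API docs; it is not confirmed anywhere in this thread.
    """
    url = f"{base_url.rstrip('/')}/api/notebook/job/{note_id}"
    if paragraph_id is not None:
        url += f"/{paragraph_id}"
    return url


def build_run_request(base_url: str, note_id: str,
                      paragraph_id: Optional[str] = None) -> urllib.request.Request:
    """Prepare (but do not send) the POST request that starts the job."""
    return urllib.request.Request(
        job_url(base_url, note_id, paragraph_id), data=b"", method="POST"
    )


def run_note(base_url: str, note_id: str,
             paragraph_id: Optional[str] = None, timeout: float = 30.0) -> dict:
    """Send the request and return the decoded JSON response body."""
    request = build_run_request(base_url, note_id, paragraph_id)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

An external scheduler (an Airflow task, a cron job, and so on) could then call something like run_note("http://localhost:8080", "<noteId>") and inspect the returned JSON; chaining notebooks on success, as Vinayak asks for, would be the scheduler's job rather than Zeppelin's.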