Re: [DISCUSS] Release package size
The decision is taking more time than I expected, and I don't think it should be a blocker for 0.7.0. We can take more time to decide which interpreters should be included or excluded. Until then, I am going to go with our current packages: zeppelin-bin-all and zeppelin-bin-netinst.

Moon's suggestion looks good too. Here is a summary of the interpreters that would be included under each option:

a. Min package includes interpreters with a binary size of less than 10MB.
   > angular, bigquery, hdfs, kylin, livy, md, postgresql, python, sh
b. Min package includes interpreters with 5 or more JIRA issues created per month.
   > Needs tracking. This could be an extra burden on the release process.
c. Min package includes/excludes interpreters that the community decides on via a formal vote.
   > md, jdbc, spark (based on this mailing thread)

On Fri, Jan 20, 2017 at 5:57 PM moon soo Lee wrote:
> Hi,
>
> I think we need a policy for deciding which interpreters go into the zeppelin-bin-min package, and we should make applying that policy part of the release process. I cannot see any consistent rule other than "it seems" or "I guess", and I have no idea how I would explain it if somebody asks 'why is python not in the min package?' or 'why is xxx not in the min package?'.
>
> If we really want a min package, we must have a policy that gives everyone the same expectation of what goes into the min package and what does not. Once we agree on a policy, we can make it part of the release process.
>
> So, why don't we try to define the policy together? Here are some ideas I can throw out.
>
> a. Min package includes interpreters with a binary size of less than 10MB.
> b. Min package includes interpreters with 5 or more JIRA issues created per month.
> c. Min package includes/excludes interpreters that the community decides on via a formal vote.
>
> "10MB" and "5 or more" are numbers I just made up. We can change them to more reasonable numbers. Also, a, b, and c are just possible examples. We can refine them, use only one, use all three, or add more.
>
> My point is, we need to give everyone the same expectation of what goes into the min package and what does not. What do you think?
>
> Thanks,
> moon
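Option (a) lends itself to a mechanical check. The following is a minimal sketch, not part of any release tooling: it assumes an unpacked binary distribution with one subdirectory per interpreter under a single root directory, and the function names are hypothetical. The 10MB limit is the placeholder number from the thread.

```python
import os

SIZE_LIMIT_BYTES = 10 * 1024 * 1024  # the "10MB" placeholder from option (a)


def directory_size(path):
    """Return the total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if os.path.isfile(file_path):
                total += os.path.getsize(file_path)
    return total


def interpreters_under_limit(interpreter_root, limit=SIZE_LIMIT_BYTES):
    """List interpreter subdirectories whose total size is below limit.

    Assumption: each interpreter lives in its own subdirectory of
    interpreter_root (hypothetical layout for this sketch).
    """
    selected = []
    for entry in sorted(os.listdir(interpreter_root)):
        full = os.path.join(interpreter_root, entry)
        if os.path.isdir(full) and directory_size(full) < limit:
            selected.append(entry)
    return selected
```

Running interpreters_under_limit() against the interpreter directory of a built distribution would give the candidate list for option (a), and the limit can be changed once a more reasonable number is agreed on.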
Re: InvalidClassException using Zeppelin (master) and spark-2.1 on a standalone spark cluster
Hi Antoine,

This issue was tracked in https://issues.apache.org/jira/browse/ZEPPELIN-1977, and it was resolved yesterday (about 18 hours ago, it looks like). Maybe you need to pull from master again and rebuild?

~ Jonathan

On Fri, Jan 20, 2017 at 1:19 PM Antoine wrote:
> Hi,
>
> I'm trying to use Zeppelin from the master git branch with spark-2.1 and I get an invalid class exception when I use a standalone spark cluster.
>
> java.io.InvalidClassException: org.apache.commons.lang3.time.FastDateParser; local class incompatible: stream classdesc serialVersionUID = 2, local class serialVersionUID = 3
>
> To reproduce the error, I configure the spark interpreter to connect to a standalone cluster (it works with a local spark) and load a file, for example spark.read.json("/data/file.json").
>
> I'm building and running Zeppelin with docker, based on the dylanmei/docker-zeppelin docker image, but with a few changes to build the master branch (npm must be installed and bower needs to be configured to run with the root user), and using spark 2.1 instead of spark 2.0.
>
> Can you reproduce the error? Is there something I'm missing when I build Zeppelin?
>
> Thanks
InvalidClassException using Zeppelin (master) and spark-2.1 on a standalone spark cluster
Hi,

I'm trying to use Zeppelin from the master git branch with spark-2.1, and I get an InvalidClassException when I use a standalone Spark cluster:

java.io.InvalidClassException: org.apache.commons.lang3.time.FastDateParser; local class incompatible: stream classdesc serialVersionUID = 2, local class serialVersionUID = 3

To reproduce the error, I configure the Spark interpreter to connect to a standalone cluster (it works with a local Spark) and load a file, for example spark.read.json("/data/file.json").

I'm building and running Zeppelin with Docker, based on the dylanmei/docker-zeppelin image, with a few changes to build the master branch (npm must be installed, and bower needs to be configured to run as the root user) and with Spark 2.1 instead of Spark 2.0.

Can you reproduce the error? Is there something I'm missing when I build Zeppelin?

Thanks
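A serialVersionUID mismatch like this one usually means that the two JVMs involved loaded different builds of the same class, here most likely different commons-lang3 jars on the Zeppelin side and on the Spark cluster side. As a rough diagnostic sketch (the function names are hypothetical, and the directories to scan are an assumption about the local installation, not anything defined by Zeppelin or Spark), the jar versions present under each installation can be compared:

```python
import os
import re

# Matches file names such as commons-lang3-3.5.jar and captures "3.5".
JAR_PATTERN = re.compile(r"^commons-lang3-(\d+(?:\.\d+)*)\.jar$")


def find_commons_lang3_versions(roots):
    """Map each commons-lang3 version found to the jar paths carrying it."""
    versions = {}
    for root in roots:
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                match = JAR_PATTERN.match(name)
                if match:
                    versions.setdefault(match.group(1), []).append(
                        os.path.join(dirpath, name))
    return versions


def has_version_conflict(versions):
    """True when more than one distinct commons-lang3 version was found."""
    return len(versions) > 1
```

Passing the Zeppelin home directory and the Spark jars directory of the cluster as roots shows whether more than one version is in play. More than one key in the result is only a hint, since the classpath order at runtime decides which jar is actually loaded.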
Re: [Discuss] Move some interpreters out of zeppelin project
Thanks Jeff for starting the thread. Here are my thoughts.

1. Do we need to do this

Yes.

2. If the answer is yes, which interpreters should be moved out

If the Zeppelin community has no problem maintaining a given interpreter, there is no reason to remove that contribution from the community. However, if the community cannot maintain it well (e.g. it does not keep up with version updates of the target system, or bug reports are not taken care of), then we can consider moving the unmaintainable code out of the community.

3. How do we integrate these interpreters into zeppelin

The Helium package description [1] already reserves the package type 'INTERPRETER' for this. I hope 'helium' becomes the place for finding, installing, uninstalling, and upgrading all pluggable modules in Zeppelin. I can quickly make a pull request to support INTERPRETER installation through the helium GUI menu.

4. How does zeppelin work with these third party interpreters

From the point of view of encouraging 3rd party interpreters: after 3) is done, the Zeppelin-netinst package will display community-managed interpreters and 3rd party interpreters together in the helium menu, and their installation procedure will be exactly the same (click the 'enable' button and click 'ok' on the confirm dialog). So the user will not see any difference between using a community-managed interpreter and using a 3rd party interpreter. I think this encourages developing more 3rd party interpreters than community-managed ones.

Thanks,
moon

[1] https://github.com/apache/zeppelin/blob/master/zeppelin-interpreter/src/main/java/org/apache/zeppelin/helium/HeliumPackage.java#L40

On Fri, Jan 20, 2017 at 6:39 AM Jongyoul Lee wrote:
> Hi Jeff,
>
> Thanks for starting this issue.
>
> It increases flexibility of improving interpreters itself but it can also decrease stability of interpreters. I'm worried about this side-effect. As you mentioned, it's hard for me to review new interpreter that I didn't use but it couldn't be a reason why we divide some code from Zeppelin. We have to make more ppl as committers to review various interpreters. Thus I don't want some interpreters out of Zeppelin.
>
> But I, totally, agree about #3, #4. If we deploy minimum package of Zeppelin, we have to provide GUI for install/uninstall. If it's done, bin-all-pkg is meaningless and bin-min-pkg is enough.
>
> --
> 이종열, Jongyoul Lee, 李宗烈
> http://madeng.net
Re: [Discuss] Move some interpreters out of zeppelin project
Hi Jeff,

Thanks for starting this discussion.

It increases the flexibility to improve the interpreters themselves, but it can also decrease their stability. I'm worried about this side effect. As you mentioned, it's hard for me to review a new interpreter that I haven't used, but that shouldn't be a reason to split some code out of Zeppelin. We have to make more people committers so that they can review the various interpreters. Thus I don't want some interpreters moved out of Zeppelin.

But I totally agree on #3 and #4. If we deploy a minimum package of Zeppelin, we have to provide a GUI for install/uninstall. Once that's done, bin-all-pkg is meaningless and bin-min-pkg is enough.

On Fri, Jan 20, 2017 at 7:14 PM, Jeff Zhang wrote:
> As we talk in another thread [1] about moving some interpreters out of zeppelin project. I open this thread to discuss it in more details. I'd like to raise 4 questions for this.
>
> 1. Do we need to do this
> 2. If the answer is yes, which interpreters should be moved out
> 3. How do we integrate these interpreters into zeppelin
> 4. How does zeppelin work with these third party interpreters
>
> I will first give my inputs on this.
>
> *1. Do we need to do this ?*
> Personally, I strongly +1 on this. Several reasons:
>
> - Keep the zeppelin project much smaller
> - Each interpreter's improvements won't be blocked by the release of zeppelin. Interpreters can have their own release cycle as long as zeppelin-interpreter doesn't break the compatibility.
> - Zeppelin developers don't have the knowledge of all interpreters. Sometimes it is very difficult for zeppelin committers to review a new interpreter that they don't know.
>
> 2. Which interpreters should be moved out ?
> We can discuss it in another thread about the min package.
>
> 3. How do we integrate these interpreters into zeppelin
> Currently, user can install third party interpreter by running script (http://zeppelin.apache.org/docs/0.7.0-SNAPSHOT/manual/interpreterinstallation.html#3rd-party-interpreters), but this is not convenient, and it is hard to let every user be aware of this feature. So I think we should do that in zeppelin UI. We should allow user to install/uninstall/upgrade/downgrade third party interpreters in the interpreter page.
>
> 4. How does zeppelin work with these third party interpreters
> Besides the interface zeppelin exposes to the third party interpreter to be install/uninstall/upgrade/downgrade, it is third party interpreter's own responsibility to develop and make new release.
>
> Please help comment on these 4 questions and feel free to add any things that I miss.
>
> [1] https://lists.apache.org/thread.html/69f606409790d7ba11422e8c6df941a75c5dfae0aca63eccf2f840bf@%3Cusers.zeppelin.apache.org%3E

--
이종열, Jongyoul Lee, 李宗烈
http://madeng.net
[Discuss] Move some interpreters out of zeppelin project
As we discussed in another thread [1] about moving some interpreters out of the zeppelin project, I am opening this thread to discuss it in more detail. I'd like to raise 4 questions:

1. Do we need to do this
2. If the answer is yes, which interpreters should be moved out
3. How do we integrate these interpreters into zeppelin
4. How does zeppelin work with these third party interpreters

I will first give my input on these.

*1. Do we need to do this ?*
Personally, I am a strong +1 on this, for several reasons:

- Keep the zeppelin project much smaller.
- Each interpreter's improvements won't be blocked by the release of zeppelin. Interpreters can have their own release cycle as long as zeppelin-interpreter doesn't break compatibility.
- Zeppelin developers don't have knowledge of all interpreters. Sometimes it is very difficult for a zeppelin committer to review a new interpreter that they don't know.

2. Which interpreters should be moved out ?
We can discuss it in another thread about the min package.

3. How do we integrate these interpreters into zeppelin
Currently, users can install a third party interpreter by running a script (http://zeppelin.apache.org/docs/0.7.0-SNAPSHOT/manual/interpreterinstallation.html#3rd-party-interpreters), but this is not convenient, and it is hard to make every user aware of this feature. So I think we should do that in the zeppelin UI. We should allow users to install/uninstall/upgrade/downgrade third party interpreters in the interpreter page.

4. How does zeppelin work with these third party interpreters
Besides the interface zeppelin exposes to third party interpreters for install/uninstall/upgrade/downgrade, it is the third party interpreter's own responsibility to develop and make new releases.

Please help comment on these 4 questions, and feel free to add anything that I missed.

[1] https://lists.apache.org/thread.html/69f606409790d7ba11422e8c6df941a75c5dfae0aca63eccf2f840bf@%3Cusers.zeppelin.apache.org%3E
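The script-based installation mentioned in point 3 can be wrapped for automation. This is a minimal sketch, assuming the bin/install-interpreter.sh script and its --name and --artifact options as described on the documentation page linked in point 3 (worth checking against that page for the Zeppelin version in use). The helper names, the Zeppelin home path, and any Maven coordinate passed in are hypothetical placeholders.

```python
import os
import subprocess


def build_install_command(name, artifact, zeppelin_home="."):
    """Build the argument list for the interpreter install script.

    name is the interpreter name and artifact a Maven coordinate in
    groupId:artifactId:version form; both are supplied by the caller.
    """
    script = os.path.join(zeppelin_home, "bin", "install-interpreter.sh")
    return [script, "--name", name, "--artifact", artifact]


def install_interpreter(name, artifact, zeppelin_home="."):
    """Run the install script; raises CalledProcessError if it fails."""
    subprocess.run(build_install_command(name, artifact, zeppelin_home),
                   check=True)
```

Keeping the command construction separate from the call that runs it makes the wrapper easy to test without a Zeppelin installation. A UI-driven installer, as proposed in point 3, would need the same two inputs.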
Re: [DISCUSS] Release package size
Hi,

I think we need a policy for deciding which interpreters go into the zeppelin-bin-min package, and we should make applying that policy part of the release process. I cannot see any consistent rule other than "it seems" or "I guess", and I have no idea how I would explain it if somebody asks 'why is python not in the min package?' or 'why is xxx not in the min package?'.

If we really want a min package, we must have a policy that gives everyone the same expectation of what goes into the min package and what does not. Once we agree on a policy, we can make it part of the release process.

So, why don't we try to define the policy together? Here are some ideas I can throw out.

a. Min package includes interpreters with a binary size of less than 10MB.
b. Min package includes interpreters with 5 or more JIRA issues created per month.
c. Min package includes/excludes interpreters that the community decides on via a formal vote.

"10MB" and "5 or more" are numbers I just made up. We can change them to more reasonable numbers. Also, a, b, and c are just possible examples. We can refine them, use only one, use all three, or add more.

My point is, we need to give everyone the same expectation of what goes into the min package and what does not. What do you think?

Thanks,
moon

On Thu, Jan 19, 2017 at 12:47 AM Mina Lee wrote:
> Thank you for sharing your opinion guys.
>
> I like Eric's approach. We are planning to provide official docker managed by community. There is ongoing work [1] around it, I can focus on this after 0.7.0 release.
>
> It seems that majority prefers binary package with top used interpreters such as spark, md, jdbc. I think we can gradually move to providing only netinst package once docker is ready. For upcoming 0.7.0 release, I'd like to distribute two binary packages:
> - zeppelin-bin-min(spark, jdbc, md)
> - zeppelin-bin-netinst(spark only)
>
> [1] https://github.com/apache/zeppelin/pull/1761
>
> Thanks,
> Mina
>
> On Thu, Jan 19, 2017 at 1:57 AM Jongyoul Lee wrote:
>
> I like to deploy netinst only. And it's good idea that Apache Zeppelin supports official docker image with all possible interpreters.
>
> On Wed, Jan 18, 2017 at 7:42 PM, Eric Pugh <ep...@opensourceconnections.com> wrote:
>
> Can I throw out an alternate approach? I feel like the key value of the “-all” option is to simplify the life of someone who is new to Zeppelin. If you’re a sophisticated Zeppelin user, then picking and choosing interpreters is easy, and you grok why you want to do that...
>
> However, for myself, when I want to demo Zeppelin, I go straight to one of the Docker images, specifically https://github.com/dylanmei/docker-zeppelin because it bundles in everything.
>
> Would providing a similar Docker image on the “Get Zeppelin” page that bundles in all the dependencies and interpreters solve the “how do I try Zeppelin in 5 minutes” challenge? The “Get Zeppelin” page is rather daunting page!
>
> Eric
>
> On Jan 18, 2017, at 12:00 AM, Mohit Jaggi wrote:
>
> Including ALL interpreters is not feasible, not due to download size as that is easily increased but because we wouldn't want to couple the release cycles as pointed out by Jeff. IMHO a few of the most popular ones should be included. Yes it is just one extra step but if a computer can do it why make a human suffer? :-)
> Re: spark-packages, Spark does include important and mature functionality in its assembly e.g. Csv parser was merged into core spark when it matured. I believe Z should do the same.
>
> Sent from my iPhone
>
> On Jan 17, 2017, at 8:05 PM, Jeff Zhang wrote:
>
> Another thing I'd like to talk is that should we move most of interpreters out of zeppelin project to somewhere else just like spark do for spark-packages, 2 benefits:
>
> 1. Keep the zeppelin project much smaller
> 2. Each interpreter's improvements won't be blocked by the release of zeppelin. Interpreters can have their own release cycle as long as zeppelin-interpreter doesn't break the compatibility.
>
> If it make sense, I can open another thread to discuss it.
>
> On Wed, Jan 18, 2017 at 11:55 AM, Jun Kim wrote:
>
> +1 for Jeff's idea! I also use the three interpreters mainly :)
>
> On Wed, Jan 18, 2017 at 12:52 PM, Jeff Zhang wrote:
>
> How about also include markdown and jdbc interpreter if this won't cause binary distribution much bigger ? I guess spark, markdown, and jdbc interpreters are the top 3 interpreters in zeppelin.
>
> On Wed, Jan 18, 2017 at 11:33 AM, Ahyoung Ryu wrote:
>
> Thanks Mina always!
> +1 for releasing only netinst package.
>
> On Wed, Jan 18, 2017 at 12:29 PM, Prabhjyot Singh <prabhjyotsi...@apache.org> wrote:
>
> +1
>
> I don't think it's a problem now, but if it keeps increasing then in the subsequent releases we can ship Zeppelin with few interpreters, and mark others as plugins that can be downloaded later with