Re: [DISCUSS] Release package size

2017-01-20 Thread Mina Lee
Decision making is taking more time than I expected, and
I think this shouldn't be a blocker for 0.7.0.

We can take more time to decide which interpreters should be included or
excluded.
Until then, I am just going to go with our current packages: zeppelin-bin-all
and zeppelin-bin-netinst.

Moon's suggestion looks good too.
Here I have summarized which interpreters would be included under each option:
 a. Min package includes interpreters whose binary size is less than 10MB
  > angular, bigquery, hdfs, kylin, livy, md, postgresql, python, sh
 b. Min package includes interpreters with 5 or more JIRA issues created per
month.
  > Need to track. This can be an overload for the release process.
 c. Min package includes/excludes interpreters that the community decides on
via formal vote.
  > md, jdbc, spark (based on this mailing thread)
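
To make the three options easier to compare, here is a minimal sketch of how
such a policy could be applied mechanically during a release. Everything in it
is an assumption for illustration: the sizes, issue counts and vote results are
made-up placeholders (not measured values), and Interpreter and
select_min_package are hypothetical names that do not exist in Zeppelin.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interpreter:
    name: str
    size_mb: float         # binary size of the interpreter (placeholder value)
    jira_per_month: float  # JIRA issues created per month (placeholder value)
    voted_in: bool         # result of a formal community vote (placeholder)


def select_min_package(interpreters, rules=("a",), max_size_mb=10, min_jira=5):
    """Return the sorted names of interpreters that pass every rule in `rules`."""
    checks = {
        "a": lambda i: i.size_mb < max_size_mb,      # option a: size limit
        "b": lambda i: i.jira_per_month >= min_jira,  # option b: issue activity
        "c": lambda i: i.voted_in,                    # option c: formal vote
    }
    return sorted(
        i.name for i in interpreters if all(checks[r](i) for r in rules)
    )


# Hypothetical input data, only to show how the rules combine.
candidates = [
    Interpreter("md", size_mb=1.0, jira_per_month=2, voted_in=True),
    Interpreter("jdbc", size_mb=30.0, jira_per_month=8, voted_in=True),
    Interpreter("python", size_mb=0.5, jira_per_month=6, voted_in=False),
]

print(select_min_package(candidates, rules=("a",)))
print(select_min_package(candidates, rules=("b", "c")))
```

The thresholds are parameters on purpose, since "10MB" and "5 or more" are
still open to change, and the rules tuple lets us try one option, or several
combined, against the same list.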



On Fri, Jan 20, 2017 at 5:57 PM moon soo Lee  wrote:

> Hi,
>
> I think we need to have some policy to decide which interpreters go into
> the zeppelin-bin-min package, and make applying that policy a part of the
> release process.
> Right now I cannot see any consistent rule except for "it seems" or "I
> guess". And I have no idea how I can explain if somebody asks 'why is python
> not in the min package?' or 'why is xxx not in the min package?'.
>
> If we really want a min package, we must have a policy that gives
> everyone the same expectation of what goes into the min package and what
> does not. Once we agree on a policy, we can make it part of the release
> process.
>
> So, why don't we try to define a policy together? Here are some ideas I can
> throw out.
>
>  a. Min package includes interpreters whose binary size is less than 10MB
>  b. Min package includes interpreters with 5 or more JIRA issues created per
> month.
>  c. Min package includes/excludes interpreters that the community decides on
> via formal vote.
>
> "10MB" and "5 or more" are numbers I just made up. We can change them to
> more reasonable numbers.
> Also, a, b, c are possible examples. We can refine them, we can use only one,
> we can use all three, we can add more.
>
> My point is, we need to give everyone the same expectation of what goes into
> the min package and what does not.
> What do you think?
>
> Thanks,
> moon
>
> On Thu, Jan 19, 2017 at 12:47 AM Mina Lee  wrote:
>
> Thank you for sharing your opinions, guys.
>
> I like Eric's approach.
> We are planning to provide an official Docker image managed by the community.
> There is ongoing work [1] around it, and I can focus on this after the 0.7.0
> release.
>
> It seems that the majority prefers a binary package with the most used
> interpreters, such as spark, md, jdbc.
> I think we can gradually move to providing only the netinst package once
> docker is ready.
> For the upcoming 0.7.0 release, I'd like to distribute two binary packages:
>   - zeppelin-bin-min(spark, jdbc, md)
>   - zeppelin-bin-netinst(spark only)
>
> [1] https://github.com/apache/zeppelin/pull/1761
>
> Thanks,
> Mina
>
> On Thu, Jan 19, 2017 at 1:57 AM Jongyoul Lee  wrote:
>
> I'd like to deploy netinst only. And it's a good idea for Apache Zeppelin
> to support an official docker image with all possible interpreters.
>
> On Wed, Jan 18, 2017 at 7:42 PM, Eric Pugh <
> ep...@opensourceconnections.com> wrote:
>
> Can I throw out an alternate approach?  I feel like the key value of the
> “-all” option is to simplify the life of someone who is new to Zeppelin.
> If you’re a sophisticated Zeppelin user, then picking and choosing
> interpreters is easy, and you grok why you want to do that….
>
> However, for myself, when I want to demo Zeppelin, I go straight to one of
> the Docker images, specifically
> https://github.com/dylanmei/docker-zeppelin because it bundles in
> everything.
>
> Would providing a similar Docker image on the “Get Zeppelin” page that
> bundles in all the dependencies and interpreters solve the “how do I try
> Zeppelin in 5 minutes” challenge?  The “Get Zeppelin” page is a rather
> daunting page!
>
> Eric
>
>
> On Jan 18, 2017, at 12:00 AM, Mohit Jaggi  wrote:
>
> Including ALL interpreters is not feasible, not due to download size, as
> that is easily increased, but because we wouldn't want to couple the release
> cycles, as pointed out by Jeff. IMHO a few of the most popular ones should
> be included. Yes, it is just one extra step, but if a computer can do it, why
> make a human suffer? :-)
> Re: spark-packages, Spark does include important and mature functionality
> in its assembly, e.g. the CSV parser was merged into core Spark when it
> matured. I believe Z should do the same.
>
> Sent from my iPhone
>
> On Jan 17, 2017, at 8:05 PM, Jeff Zhang  wrote:
>
>
> Another thing I'd like to discuss is whether we should move most of the
> interpreters out of the zeppelin project to somewhere else, just like spark
> does with spark-packages. 2 benefits:
>
> 1. Keep the zeppelin project much smaller
> 2. Each interpreter's improvements won't be blocked by the release of
> zeppelin. Interpreters can have their own release cycle as long as
> zeppelin-interpreter 

Re: InvalidClassException using Zeppelin (master) and spark-2.1 on a standalone spark cluster

2017-01-20 Thread Jonathan Kelly
Hi, Antoine, this issue was being tracked in
https://issues.apache.org/jira/browse/ZEPPELIN-1977, but it is now resolved
as of yesterday (looks like about 18 hours ago). Maybe you need to pull
from master again and rebuild?
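
If the error still shows up after a rebuild, it may help to check which
commons-lang3 jars are on each side. A serialVersionUID mismatch means the
class that wrote the stream and the class that read it are not the same
version, and since the exception names
org.apache.commons.lang3.time.FastDateParser, differing commons-lang3 jars
between Zeppelin and the Spark cluster are a plausible cause. That is an
assumption here, not a confirmed diagnosis; the JIRA issue has the details.
Below is a minimal sketch that lists the versions found under a set of
directories. The function name and the directories you pass in are
hypothetical; adjust them to your own layout.

```python
import re
import sys
from pathlib import Path

# Matches e.g. "commons-lang3-<version>.jar" but not "-sources" or "-tests" jars.
JAR_PATTERN = re.compile(r"^commons-lang3-(\d+(?:\.\d+)*)\.jar$")


def find_commons_lang3_versions(*roots):
    """Map each commons-lang3 version found under `roots` to its jar paths."""
    found = {}
    for root in roots:
        for jar in Path(root).rglob("commons-lang3-*.jar"):
            match = JAR_PATTERN.match(jar.name)
            if match:
                found.setdefault(match.group(1), []).append(str(jar))
    return found


if __name__ == "__main__":
    # Pass the directories to scan as arguments, for example the Zeppelin
    # install directory and the Spark jars directory (paths are yours to fill in).
    versions = find_commons_lang3_versions(*sys.argv[1:])
    for version, jars in sorted(versions.items()):
        print(version)
        for jar in jars:
            print("  " + jar)
    if len(versions) > 1:
        print("More than one commons-lang3 version found.")
```

If more than one version is reported, that is at least consistent with the
exception you are seeing.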

~ Jonathan

On Fri, Jan 20, 2017 at 1:19 PM Antoine  wrote:

> Hei,
>
> I'm trying to use Zeppelin from the master git branch with spark-2.1 and I
> get an invalid class exception when I use a standalone spark cluster.
>
> java.io.InvalidClassException:
> org.apache.commons.lang3.time.FastDateParser; local class incompatible:
> stream classdesc serialVersionUID = 2, local class serialVersionUID = 3
>
> To reproduce the error, I configure the spark interpreter to connect to a
> standalone cluster (it works with a local spark) and then load a file, for
> example spark.read.json("/data/file.json").
>
> I'm building and running Zeppelin with docker, based on the
> dylanmei/docker-zeppelin docker image, but with a few changes to build the
> master branch (npm must be installed and bower needs to be configured to
> run with the root user), and using spark 2.1 instead of spark 2.0.
>
> Can you reproduce the error? Is there something I'm missing when I build
> Zeppelin?
>
> Thanks
>


InvalidClassException using Zeppelin (master) and spark-2.1 on a standalone spark cluster

2017-01-20 Thread Antoine
Hei,

I'm trying to use Zeppelin from the master git branch with spark-2.1 and I
get an invalid class exception when I use a standalone spark cluster.

java.io.InvalidClassException: org.apache.commons.lang3.time.FastDateParser;
local class incompatible: stream classdesc serialVersionUID = 2, local
class serialVersionUID = 3

To reproduce the error, I configure the spark interpreter to connect to a
standalone cluster (it works with a local spark) and then load a file, for
example spark.read.json("/data/file.json").

I'm building and running Zeppelin with docker, based on the
dylanmei/docker-zeppelin docker image, but with a few changes to build the
master branch (npm must be installed and bower needs to be configured to
run with the root user), and using spark 2.1 instead of spark 2.0.

Can you reproduce the error? Is there something I'm missing when I build
Zeppelin?

Thanks


Re: [Discuss] Move some interpreters out of zeppelin project

2017-01-20 Thread moon soo Lee
Thanks Jeff for starting the thread.
Here are my thoughts.

1. Do we need to do this
yes.

2. If the answer is yes, which interpreters should be moved out
If the Zeppelin community has no problem maintaining a certain interpreter,
then there is no reason to remove that contribution from the community.
However, if the Zeppelin community cannot maintain it well (e.g. not catching
up with target system version updates, bug reports not being taken care of,
etc.), then we can consider moving the non-maintainable code out of the
community.

3. How do we integrate these interpreters into zeppelin
The Helium package description [1] already reserves the package type
'INTERPRETER' for it. And I hope 'helium' becomes the place for
finding/installing/uninstalling/upgrading all pluggable modules in
Zeppelin. I can quickly make a pull request to support INTERPRETER
installation through the helium GUI menu.

4. How does zeppelin work with these third party interpreters
From the point of view of encouraging 3rd party interpreters,
after 3) is done, the Zeppelin-netinst package will display community managed
interpreters and 3rd party interpreters together in the helium menu.
And their installation procedure will be exactly the same (click the 'enable'
button and click 'ok' on the confirm dialog).

So, users will not see any difference between using a community managed
interpreter and using a 3rd party interpreter.
And I think this encourages developing more 3rd party interpreters than
community managed interpreters.

Thanks,
moon

[1]
https://github.com/apache/zeppelin/blob/master/zeppelin-interpreter/src/main/java/org/apache/zeppelin/helium/HeliumPackage.java#L40


On Fri, Jan 20, 2017 at 6:39 AM Jongyoul Lee  wrote:

> Hi Jeff,
>
> Thanks for starting this issue.
>
> It increases the flexibility of improving the interpreters themselves, but it
> can also decrease the stability of interpreters. I'm worried about this side
> effect. As you mentioned, it's hard for me to review a new interpreter that I
> haven't used, but that shouldn't be a reason to split some code out of
> Zeppelin. We have to make more people committers to review the various
> interpreters. Thus I don't want to move some interpreters out of Zeppelin.
>
> But I totally agree about #3 and #4. If we deploy a minimum package of
> Zeppelin, we have to provide a GUI for install/uninstall. Once that's done,
> bin-all-pkg is meaningless and bin-min-pkg is enough.
>
> On Fri, Jan 20, 2017 at 7:14 PM, Jeff Zhang  wrote:
>
> > As we discussed in another thread [1] about moving some interpreters out
> > of the zeppelin project, I am opening this thread to discuss it in more
> > detail. I'd like to raise 4 questions for this.
> >
> > 1. Do we need to do this
> > 2. If the answer is yes, which interpreters should be moved out
> > 3. How do we integrate these interpreters into zeppelin
> > 4. How does zeppelin work with these third party interpreters
> >
> > I will first give my inputs on this.
> >
> > *1. Do we need to do this ?*
> > Personally, I strongly +1 on this. Several reasons:
> >
> >    - Keep the zeppelin project much smaller
> >    - Each interpreter's improvements won't be blocked by the release of
> >    zeppelin. Interpreters can have their own release cycle as long as
> >    zeppelin-interpreter doesn't break compatibility.
> >    - Zeppelin developers don't have knowledge of all interpreters.
> >    Sometimes it is very difficult for zeppelin committers to review a new
> >    interpreter that they don't know.
> >
> >
> > 2. Which interpreters should be moved out ?
> > We can discuss it  in another thread about the min package.
> >
> > 3. How do we integrate these interpreters into zeppelin
> > Currently, users can install third party interpreters by running a script (
> > http://zeppelin.apache.org/docs/0.7.0-SNAPSHOT/manual/interpreterinstallation.html#3rd-party-interpreters),
> > but this is not convenient, and it is hard to make every user aware of
> > this feature. So I think we should do that in the zeppelin UI. We should
> > allow users to install/uninstall/upgrade/downgrade third party interpreters
> > in the interpreter page.
> >
> > 4. How does zeppelin work with these third party interpreters
> > Besides the interface zeppelin exposes for third party interpreters to be
> > installed/uninstalled/upgraded/downgraded, it is the third party
> > interpreter's own responsibility to develop and make new releases.
> >
> > Please help comment on these 4 questions and feel free to add anything
> > that I missed.
> >
> >
> > [1] https://lists.apache.org/thread.html/69f606409790d7ba11422e8c6df941a75c5dfae0aca63eccf2f840bf@%3Cusers.zeppelin.apache.org%3E
> >
>
>
>
> --
> 이종열, Jongyoul Lee, 李宗烈
> http://madeng.net
>


Re: [Discuss] Move some interpreters out of zeppelin project

2017-01-20 Thread Jongyoul Lee
Hi Jeff,

Thanks for starting this issue.

It increases the flexibility of improving the interpreters themselves, but it
can also decrease the stability of interpreters. I'm worried about this side
effect. As you mentioned, it's hard for me to review a new interpreter that I
haven't used, but that shouldn't be a reason to split some code out of
Zeppelin. We have to make more people committers to review the various
interpreters. Thus I don't want to move some interpreters out of Zeppelin.

But I totally agree about #3 and #4. If we deploy a minimum package of
Zeppelin, we have to provide a GUI for install/uninstall. Once that's done,
bin-all-pkg is meaningless and bin-min-pkg is enough.

On Fri, Jan 20, 2017 at 7:14 PM, Jeff Zhang  wrote:

> As we discussed in another thread [1] about moving some interpreters out of
> the zeppelin project, I am opening this thread to discuss it in more detail.
> I'd like to raise 4 questions for this.
>
> 1. Do we need to do this
> 2. If the answer is yes, which interpreters should be moved out
> 3. How do we integrate these interpreters into zeppelin
> 4. How does zeppelin work with these third party interpreters
>
> I will first give my inputs on this.
>
> *1. Do we need to do this ?*
> Personally, I strongly +1 on this. Several reasons:
>
>    - Keep the zeppelin project much smaller
>    - Each interpreter's improvements won't be blocked by the release of
>    zeppelin. Interpreters can have their own release cycle as long as
>    zeppelin-interpreter doesn't break compatibility.
>    - Zeppelin developers don't have knowledge of all interpreters.
>    Sometimes it is very difficult for zeppelin committers to review a new
>    interpreter that they don't know.
>
>
> 2. Which interpreters should be moved out ?
> We can discuss it  in another thread about the min package.
>
> 3. How do we integrate these interpreters into zeppelin
> Currently, users can install third party interpreters by running a script (
> http://zeppelin.apache.org/docs/0.7.0-SNAPSHOT/manual/interpreterinstallation.html#3rd-party-interpreters),
> but this is not convenient, and it is hard to make every user aware of
> this feature. So I think we should do that in the zeppelin UI. We should
> allow users to install/uninstall/upgrade/downgrade third party interpreters
> in the interpreter page.
>
> 4. How does zeppelin work with these third party interpreters
> Besides the interface zeppelin exposes for third party interpreters to be
> installed/uninstalled/upgraded/downgraded, it is the third party
> interpreter's own responsibility to develop and make new releases.
>
> Please help comment on these 4 questions and feel free to add anything
> that I missed.
>
>
> [1] https://lists.apache.org/thread.html/69f606409790d7ba11422e8c6df941a75c5dfae0aca63eccf2f840bf@%3Cusers.zeppelin.apache.org%3E
>



-- 
이종열, Jongyoul Lee, 李宗烈
http://madeng.net


[Discuss] Move some interpreters out of zeppelin project

2017-01-20 Thread Jeff Zhang
As we discussed in another thread [1] about moving some interpreters out of
the zeppelin project, I am opening this thread to discuss it in more detail.
I'd like to raise 4 questions for this.

1. Do we need to do this
2. If the answer is yes, which interpreters should be moved out
3. How do we integrate these interpreters into zeppelin
4. How does zeppelin work with these third party interpreters

I will first give my inputs on this.

*1. Do we need to do this ?*
Personally, I strongly +1 on this. Several reasons:

   - Keep the zeppelin project much smaller
   - Each interpreter's improvements won't be blocked by the release of
   zeppelin. Interpreters can have their own release cycle as long as
   zeppelin-interpreter doesn't break compatibility.
   - Zeppelin developers don't have knowledge of all interpreters.
   Sometimes it is very difficult for zeppelin committers to review a new
   interpreter that they don't know.


2. Which interpreters should be moved out ?
We can discuss it  in another thread about the min package.

3. How do we integrate these interpreters into zeppelin
Currently, users can install third party interpreters by running a script (
http://zeppelin.apache.org/docs/0.7.0-SNAPSHOT/manual/interpreterinstallation.html#3rd-party-interpreters),
but this is not convenient, and it is hard to make every user aware of
this feature. So I think we should do that in the zeppelin UI. We should
allow users to install/uninstall/upgrade/downgrade third party interpreters
in the interpreter page.

4. How does zeppelin work with these third party interpreters
Besides the interface zeppelin exposes for third party interpreters to be
installed/uninstalled/upgraded/downgraded, it is the third party
interpreter's own responsibility to develop and make new releases.

Please help comment on these 4 questions and feel free to add anything
that I missed.


[1]
https://lists.apache.org/thread.html/69f606409790d7ba11422e8c6df941a75c5dfae0aca63eccf2f840bf@%3Cusers.zeppelin.apache.org%3E


Re: [DISCUSS] Release package size

2017-01-20 Thread moon soo Lee
Hi,

I think we need to have some policy to decide which interpreters go into
the zeppelin-bin-min package, and make applying that policy a part of the
release process.
Right now I cannot see any consistent rule except for "it seems" or "I
guess". And I have no idea how I can explain if somebody asks 'why is python
not in the min package?' or 'why is xxx not in the min package?'.

If we really want a min package, we must have a policy that gives everyone
the same expectation of what goes into the min package and what does not.
Once we agree on a policy, we can make it part of the release process.

So, why don't we try to define a policy together? Here are some ideas I can
throw out.

 a. Min package includes interpreters whose binary size is less than 10MB
 b. Min package includes interpreters with 5 or more JIRA issues created per
month.
 c. Min package includes/excludes interpreters that the community decides on
via formal vote.

"10MB" and "5 or more" are numbers I just made up. We can change them to
more reasonable numbers.
Also, a, b, c are possible examples. We can refine them, we can use only one,
we can use all three, we can add more.

My point is, we need to give everyone the same expectation of what goes into
the min package and what does not.
What do you think?

Thanks,
moon

On Thu, Jan 19, 2017 at 12:47 AM Mina Lee  wrote:

> Thank you for sharing your opinions, guys.
>
> I like Eric's approach.
> We are planning to provide an official Docker image managed by the community.
> There is ongoing work [1] around it, and I can focus on this after the 0.7.0
> release.
>
> It seems that the majority prefers a binary package with the most used
> interpreters, such as spark, md, jdbc.
> I think we can gradually move to providing only the netinst package once
> docker is ready.
> For the upcoming 0.7.0 release, I'd like to distribute two binary packages:
>   - zeppelin-bin-min(spark, jdbc, md)
>   - zeppelin-bin-netinst(spark only)
>
> [1] https://github.com/apache/zeppelin/pull/1761
>
> Thanks,
> Mina
>
> On Thu, Jan 19, 2017 at 1:57 AM Jongyoul Lee  wrote:
>
> I'd like to deploy netinst only. And it's a good idea for Apache Zeppelin
> to support an official docker image with all possible interpreters.
>
> On Wed, Jan 18, 2017 at 7:42 PM, Eric Pugh <
> ep...@opensourceconnections.com> wrote:
>
> Can I throw out an alternate approach?  I feel like the key value of the
> “-all” option is to simplify the life of someone who is new to Zeppelin.
> If you’re a sophisticated Zeppelin user, then picking and choosing
> interpreters is easy, and you grok why you want to do that….
>
> However, for myself, when I want to demo Zeppelin, I go straight to one of
> the Docker images, specifically
> https://github.com/dylanmei/docker-zeppelin because it bundles in
> everything.
>
> Would providing a similar Docker image on the “Get Zeppelin” page that
> bundles in all the dependencies and interpreters solve the “how do I try
> Zeppelin in 5 minutes” challenge?  The “Get Zeppelin” page is a rather
> daunting page!
>
> Eric
>
>
> On Jan 18, 2017, at 12:00 AM, Mohit Jaggi  wrote:
>
> Including ALL interpreters is not feasible, not due to download size, as
> that is easily increased, but because we wouldn't want to couple the release
> cycles, as pointed out by Jeff. IMHO a few of the most popular ones should
> be included. Yes, it is just one extra step, but if a computer can do it, why
> make a human suffer? :-)
> Re: spark-packages, Spark does include important and mature functionality
> in its assembly, e.g. the CSV parser was merged into core Spark when it
> matured. I believe Z should do the same.
>
> Sent from my iPhone
>
> On Jan 17, 2017, at 8:05 PM, Jeff Zhang  wrote:
>
>
> Another thing I'd like to discuss is whether we should move most of the
> interpreters out of the zeppelin project to somewhere else, just like spark
> does with spark-packages. 2 benefits:
>
> 1. Keep the zeppelin project much smaller
> 2. Each interpreter's improvements won't be blocked by the release of
> zeppelin. Interpreters can have their own release cycle as long as
> zeppelin-interpreter doesn't break compatibility.
>
> If it makes sense, I can open another thread to discuss it.
>
>
>
>
> On Wed, Jan 18, 2017 at 11:55 AM, Jun Kim wrote:
>
> +1 for Jeff's idea! I also use the three interpreters mainly :)
>
> On Wed, Jan 18, 2017 at 12:52 PM, Jeff Zhang wrote:
>
>
> How about also including the markdown and jdbc interpreters, if this won't
> make the binary distribution much bigger? I guess spark, markdown, and jdbc
> interpreters are the top 3 interpreters in zeppelin.
>
>
>
> On Wed, Jan 18, 2017 at 11:33 AM, Ahyoung Ryu wrote:
>
> Thanks Mina always!
> +1 for releasing only netinst package.
>
> On Wed, Jan 18, 2017 at 12:29 PM, Prabhjyot Singh <
> prabhjyotsi...@apache.org> wrote:
>
> +1
>
> I don't think it's a problem now, but if it keeps increasing then in the
> subsequent releases we can ship Zeppelin with a few interpreters, and mark
> others as plugins that can be downloaded later with