Re: [DISCUSS] Release package size

2017-01-17 Thread moon soo Lee
Hi,

+1 for releasing netinst package only.

Regarding making the binary package include only some interpreters, like
spark, markdown, and jdbc: we have discussed having a minimal package in [1].
I still think it's very difficult to decide which interpreters need to be
included and which do not. For example, I would prefer to have 'sh' and
'python' included too, and other people might have different opinions. It's
hard to justify why some interpreters are included in the binary release
while others are not, unless we have a policy that everyone agrees on.

Regarding 3rd party interpreters:
Nothing stops anyone from building an interpreter in a separate project.
Zeppelin's interpreter installation script [2] supports 3rd party
interpreters, and Zeppelin is already capable of loading 3rd party
interpreter binaries. However, I haven't seen many people using this
feature. I also have some ideas about how we can encourage building 3rd
party interpreters. Let's open a separate thread and discuss there.

Thanks,
moon

[1]
https://lists.apache.org/thread.html/4b54c034cf8d691655156e0cb647243180c57a6829d97aa3c085b63c@%3Cusers.zeppelin.apache.org%3E
[2]
http://zeppelin.apache.org/docs/0.7.0-SNAPSHOT/manual/interpreterinstallation.html#3rd-party-interpreters


On Tue, Jan 17, 2017 at 8:05 PM Jeff Zhang  wrote:

>
> Another thing I'd like to discuss: should we move most of the interpreters
> out of the Zeppelin project to somewhere else, just as Spark does with
> spark-packages? Two benefits:
>
> 1. It keeps the Zeppelin project much smaller.
> 2. Each interpreter's improvements won't be blocked by Zeppelin's release
> cycle. Interpreters can have their own release cycles as long as
> zeppelin-interpreter doesn't break compatibility.
>
> If this makes sense, I can open another thread to discuss it.
>
>
>
>
> On Wed, Jan 18, 2017 at 11:55 AM, Jun Kim wrote:
>
> +1 for Jeff's idea! I also use the three interpreters mainly :)
>
> On Wed, Jan 18, 2017 at 12:52 PM, Jeff Zhang wrote:
>
>
> How about also including the markdown and jdbc interpreters, if that
> doesn't make the binary distribution much bigger? I guess spark, markdown,
> and jdbc are the top 3 interpreters in Zeppelin.
>
>
>
> On Wed, Jan 18, 2017 at 11:33 AM, Ahyoung Ryu wrote:
>
> Thanks Mina always!
> +1 for releasing only netinst package.
>
> On Wed, Jan 18, 2017 at 12:29 PM, Prabhjyot Singh <
> prabhjyotsi...@apache.org> wrote:
>
> +1
>
> I don't think it's a problem now, but if it keeps increasing, then in
> subsequent releases we can ship Zeppelin with a few interpreters and mark
> the others as plugins that can be downloaded later, with instructions on
> how to configure them.
>
> On Jan 18, 2017 8:54 AM, "Jun Kim"  wrote:
>
> +1
>
> I think it won't be a problem if we announce it clearly.
> Maybe we can do that next to the download button here (
> http://zeppelin.apache.org/download.html).
> A message might be: "NOTE: only the spark interpreter is included since
> 0.7.0. If you want other interpreters, please see the interpreter
> installation guide."
>
> On Wed, Jan 18, 2017 at 12:14 PM, Jeff Zhang wrote:
>
>
> +1, we should also mention it in the release notes and in the 0.7 docs.
>
>
>
> On Wed, Jan 18, 2017 at 11:12 AM, Mina Lee wrote:
>
> Hi all,
>
> Zeppelin is about to start the 0.7.0 release process, and I would like to
> discuss binary package distribution.
>
> Every time we distribute a new binary package, the size of the
> zeppelin-0.x.x-bin-all.tgz package gets bigger:
>- zeppelin-0.6.0-bin-all.tgz: 506M
>- zeppelin-0.6.1-bin-all.tgz: 517M
>- zeppelin-0.6.2-bin-all.tgz: 547M
>- zeppelin-0.7.0-bin-all.tgz: 720M (expected)
>
> This is mostly because the number of interpreters supported by Zeppelin
> keeps growing, and there is a high chance that we will support more
> interpreters in the near future.
> So instead of asking the Apache infra team to increase the limit,
> I would like to suggest having only zeppelin-0.7.0-bin-netinst.tgz, which
> includes only the spark interpreter, from the 0.7.0 release.
> One concern is that users will need one more step to install the
> interpreters they use, but I believe it can be done easily with a single
> command [1].
>
> FYI, attaching the link to a similar discussion [2] we had last June on
> the mailing list.
>
> Regards,
> Mina
>
> [1]
> http://zeppelin.apache.org/docs/0.6.2/manual/interpreterinstallation.html#install-specific-interpreters
> 
> [2]
> https://lists.apache.org/thread.html/4b54c034cf8d691655156e0cb647243180c57a6829d97aa3c085b63c@%3Cusers.zeppelin.apache.org%3E
>
> --
> Taejun Kim
>
> Data Mining Lab.
> School of Electrical and Computer Engineering
> University of Seoul
>


Re: [DISCUSS] Release package size

2017-01-17 Thread Jeff Zhang
Another thing I'd like to discuss: should we move most of the interpreters
out of the Zeppelin project to somewhere else, just as Spark does with
spark-packages? Two benefits:

1. It keeps the Zeppelin project much smaller.
2. Each interpreter's improvements won't be blocked by Zeppelin's release
cycle. Interpreters can have their own release cycles as long as
zeppelin-interpreter doesn't break compatibility.

If this makes sense, I can open another thread to discuss it.




On Wed, Jan 18, 2017 at 11:55 AM, Jun Kim wrote:

> +1 for Jeff's idea! I also use the three interpreters mainly :)
>


Re: [DISCUSS] Release package size

2017-01-17 Thread Jeff Zhang
How about also including the markdown and jdbc interpreters, if that doesn't
make the binary distribution much bigger? I guess spark, markdown, and jdbc
are the top 3 interpreters in Zeppelin.



On Wed, Jan 18, 2017 at 11:33 AM, Ahyoung Ryu wrote:

> Thanks Mina always!
> +1 for releasing only netinst package.
>


[DISCUSS] Release package size

2017-01-17 Thread Mina Lee
Hi all,

Zeppelin is about to start the 0.7.0 release process, and I would like to
discuss binary package distribution.

Every time we distribute a new binary package, the size of the
zeppelin-0.x.x-bin-all.tgz package gets bigger:
   - zeppelin-0.6.0-bin-all.tgz: 506M
   - zeppelin-0.6.1-bin-all.tgz: 517M
   - zeppelin-0.6.2-bin-all.tgz: 547M
   - zeppelin-0.7.0-bin-all.tgz: 720M (Expected)

This is mostly because the number of interpreters supported by Zeppelin
keeps growing, and there is a high chance that we will support more
interpreters in the near future.
So instead of asking the Apache infra team to increase the limit,
I would like to suggest having only zeppelin-0.7.0-bin-netinst.tgz, which
includes only the spark interpreter, from the 0.7.0 release.
One concern is that users will need one more step to install the
interpreters they use, but I believe it can be done easily with a single
command [1].

FYI, attaching the link to a similar discussion [2] we had last June on the
mailing list.

Regards,
Mina

[1]
http://zeppelin.apache.org/docs/0.6.2/manual/interpreterinstallation.html#install-specific-interpreters

[2]
https://lists.apache.org/thread.html/4b54c034cf8d691655156e0cb647243180c57a6829d97aa3c085b63c@%3Cusers.zeppelin.apache.org%3E
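For reference, the single-command installation mentioned in [1] looks roughly like this. This is a sketch based on the linked interpreter installation docs; the exact interpreter names and script options may differ between the 0.6.2 docs and the 0.7.0 release:

```shell
# Run from the Zeppelin installation directory of a netinst package.
# Install the markdown (md) and jdbc interpreters in one command:
./bin/install-interpreter.sh --name md,jdbc

# Or install all community-managed interpreters at once:
./bin/install-interpreter.sh --all
```

After installation, restarting Zeppelin and creating the interpreter setting in the Interpreter menu is the usual remaining step.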


Re: 'File size limit Exceeded' when importing notes - even for small files

2017-01-17 Thread Alexander Bezzubov
Hi,

this definitely looks like a regression/bug.
Ruslan, would you mind creating a JIRA issue?

Paul, thanks for sharing notebook size reduction pro-tip!

--
Alex

On Wed, Jan 18, 2017, 10:04 Paul Brenner  wrote:

> Just a tip that when I ran into this problem I found that using the “clear
> output” button and then exporting my notebook made it easy to get below the
> size limit. Not very helpful if you need ALL the output, but maybe you can
> selectively clear output from some paragraphs?
>
> On Tue, Jan 17, 2017 at 4:55 PM Ruslan Dautkhanov wrote:
> From the screenshot "JSON file size cannot exceed MB".
> Notice there is no number between "exceed" and "MB".
> Not sure if we're missing a setting or an environment variable to define
> the limit?
> It now prevents us from importing any notebooks.
>
>
>
> --
> Ruslan Dautkhanov
>
> On Tue, Jan 17, 2017 at 11:54 AM, Ruslan Dautkhanov 
> wrote:
>
> 'File size limit Exceeded' when importing notes - even for small files
>
> This happens even for tiny files - a few Kb.
>
> Is this a known issue?
>
> Running Zeppelin 0.7.0 from a few weeks old snapshot.
>
> See attached screenshot.
>
>
> --
> Ruslan Dautkhanov
>
>
>
>


Re: spark 2.1 and commons.lang3

2017-01-17 Thread Jeff Zhang
What issue do you see? Can you paste the log and explain how to reproduce it?



Sherif Akoush 于2017年1月18日周三 上午3:03写道:

> Hi,
>
> Spark 2.1 uses commons-lang3 version 3.5, while Zeppelin master still uses
> version 3.4. I suspect this mismatch causes executors to fail. Is there a
> requirement for Zeppelin to stay on version 3.4?
>
> Regards,
> Sherif
>


Re: 'File size limit Exceeded' when importing notes - even for small files

2017-01-17 Thread Paul Brenner
Just a tip that when I ran into this problem I found that using the “clear 
output” button and then exporting my notebook made it easy to get below the 
size limit. Not very helpful if you need ALL the output, but maybe you can 
selectively clear output from some paragraphs?


On Tue, Jan 17, 2017 at 4:55 PM Ruslan Dautkhanov wrote:


From the screenshot "JSON file size cannot exceed MB".

Notice there is no number between "exceed" and "MB".

Not sure if we're missing a setting or an environment variable to define the 
limit?

It now prevents us from importing any notebooks.

--

Ruslan Dautkhanov

On Tue, Jan 17, 2017 at 11:54 AM, Ruslan Dautkhanov wrote:

'File size limit Exceeded' when importing notes - even for small files

This happens even for tiny files - a few Kb.

Is this a known issue?

Running Zeppelin 0.7.0 from a few weeks old snapshot.

See attached screenshot.

--

Ruslan Dautkhanov

Re: 'File size limit Exceeded' when importing notes - even for small files

2017-01-17 Thread Ruslan Dautkhanov
From the screenshot "JSON file size cannot exceed MB".
Notice there is no number between "exceed" and "MB".
Not sure if we're missing a setting or an environment variable to define
the limit?
It now prevents us from importing any notebooks.



-- 
Ruslan Dautkhanov

On Tue, Jan 17, 2017 at 11:54 AM, Ruslan Dautkhanov 
wrote:

> 'File size limit Exceeded' when importing notes - even for small files
>
> This happens even for tiny files - a few Kb.
>
> Is this a known issue?
>
> Running Zeppelin 0.7.0 from a few weeks old snapshot.
>
> See attached screenshot.
>
>
> --
> Ruslan Dautkhanov
>
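One setting worth checking regarding the missing number: in Zeppelin builds of this era, the note-import size limit is tied to the websocket max text message size. Whether this variable exists in your snapshot, and whether it explains the empty "MB" value in the dialog, is an assumption:

```shell
# Assumption: ZEPPELIN_WEBSOCKET_MAX_TEXT_MESSAGE_SIZE is honored by your
# build; it sets the maximum websocket text message size (in characters),
# which bounds how large an imported note's JSON can be.
# Add to conf/zeppelin-env.sh, then restart Zeppelin:
export ZEPPELIN_WEBSOCKET_MAX_TEXT_MESSAGE_SIZE=10240000
```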


spark 2.1 and commons.lang3

2017-01-17 Thread Sherif Akoush
Hi,

Spark 2.1 uses commons-lang3 version 3.5, while Zeppelin master still uses
version 3.4. I suspect this mismatch causes executors to fail. Is there a
requirement for Zeppelin to stay on version 3.4?

Regards,
Sherif
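A generic Maven diagnostic (not a Zeppelin-specific recipe) for seeing which commons-lang3 version wins dependency resolution in each module:

```shell
# From the Zeppelin source root, show every path that pulls in
# commons-lang3 and which version Maven's nearest-wins resolution keeps:
mvn dependency:tree -Dincludes=org.apache.commons:commons-lang3
```

If the Spark interpreter's classpath ends up with 3.4 while Spark 2.1 executors expect 3.5, aligning the version in Zeppelin's pom (or shading it) would be the usual fix.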


Latest keyboard shortcuts

2017-01-17 Thread Stephen Boesch
There was an old JIRA for keyboard shortcuts, but there did not appear to
be an associated document:

https://issues.apache.org/jira/browse/ZEPPELIN-391

Is there a comprehensive cheat sheet for the shortcuts? Especially one
comparable to Jupyter's excellent keyboard shortcuts, e.g. dd to delete a
cell.

Thanks!

stephenb


Accessing Zeppelin context in Pyspark interpreter

2017-01-17 Thread Deenar Toraskar
Hi

Is it possible to access the Zeppelin context via the Pyspark interpreter?
Not all the methods available via the Spark Scala interpreter seem to be
available in the Pyspark one (unless I am doing something wrong). I would
like to do something like this from the Pyspark interpreter:

z.show(df, 100)

or

z.run(z.listParagraphs.indexOf(z.getInterpreterContext().getParagraphId())+1)