Re: install helium packages

2019-03-21 Thread Jhon Anderson Cardenas Diaz
Is there something helium-related in the zeppelin logs once you start it?
If there is some problem, the logs should say so.

Btw, once you open a notebook that has associated helium plugins, zeppelin
tries to download and install node and yarn at runtime, which makes me
wonder whether it will work without an internet connection.

On Thu, Mar 21, 2019 at 16:24, Lian Jiang ()
wrote:

> Any clue is highly appreciated!
>
> On Wed, Mar 20, 2019 at 9:27 PM Lian Jiang  wrote:
>
>> Hi,
>>
>> I am using Hortonworks HDP 3.0, which has zeppelin 0.8.0. I followed
>> https://zeppelin.apache.org/docs/0.8.0/development/helium/writing_visualization_basic.html
>> to install helium viz packages. Since my environment does not have internet
>> access, I have to install packages into local registry. Here is what I did:
>>
>> step 1. added zeppelin.helium.localregistry.default in custom
>> zeppelin-site settings, pointing to a local helium folder on the zeppelin
>> master host. Restart zeppelin in ambari.
>> step 2. download https://s3.amazonaws.com/helium-package/helium.json to
>> the helium folder.
>> step 3. create an npm package tarball, upload it to the zeppelin master
>> host, and unzip it into the helium folder. I also updated the artifact to
>> point to the local npm folder (e.g. from sogou-map-vis@1.0.0 to /u01/helium/sogou-map-vis)
>>
>> However, zeppelin's helium page does not list any viz packages. I
>> searched for zeppelin.helium.localregistry.default in the zeppelin source
>> code and found it exists only in the document writing_visualization_basic.md.
>>
>> What am I missing? Thanks a lot.
>>
>>
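For anyone retrying this offline local-registry setup, the steps above could be scripted roughly like this (all paths and the package name are placeholders taken from the message; /tmp is used so the sketch is self-contained):

```shell
# Sketch of the local-registry steps (paths and package name are examples).
HELIUM_DIR=/tmp/u01/helium   # folder pointed at by zeppelin.helium.localregistry.default
mkdir -p "$HELIUM_DIR"

# Step 2: place the registry index next to the packages
# (offline: copy a pre-downloaded helium.json instead of fetching
# https://s3.amazonaws.com/helium-package/helium.json).
echo '{}' > "$HELIUM_DIR/helium.json"

# Step 3: unpack the npm package tarball into the registry folder, so the
# package artifact can point at a local path such as /u01/helium/sogou-map-vis.
mkdir -p "$HELIUM_DIR/sogou-map-vis"
ls "$HELIUM_DIR"
```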


Change interpreter execution root path

2018-10-20 Thread Jhon Anderson Cardenas Diaz
Hi

Does anyone know how I can change the folder path where the interpreters
are executed?

The reason why I want to change that default location (which is
$ZEPPELIN_HOME) is that we are getting very large core dump files in that
location when an interpreter process dies.

As we are in a k8s ecosystem, if we changed the runtime location to a
subfolder we could mount a volume in that location, to avoid zeppelin
dying because of full disk space.

We already tried to change the core dump file location with
/proc/sys/kernel/core_pattern, but this does not work because zeppelin is
running in a docker container inside a k8s ecosystem.

Thanks!
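Not an answer to the relocation question, but a possible mitigation we are considering: if the core dumps themselves are not needed, they can be disabled via ulimit in conf/zeppelin-env.sh, assuming the interpreter processes inherit the daemon's limits (a sketch):

```shell
# Sketch for conf/zeppelin-env.sh: disable core dumps for the zeppelin
# daemon and the interpreter processes it spawns (which inherit the limit).
ulimit -c 0

# Verify the effective limit in the current shell (should print 0):
ulimit -c
```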


Re: Can not connect to a remote spark master

2018-10-20 Thread Jhon Anderson Cardenas Diaz
Hi. You can specify it in zeppelin-env.sh, or in the Dockerfile.

Zeppelin looks for that variable first in the interpreter settings, and if
it does not find it there, it looks for it in zeppelin's environment
variables; so you can specify it in either place, but as it does not change
frequently it is better kept as a zeppelin environment variable.
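For example, the relevant zeppelin-env.sh lines might look like this (the install path and master URL are placeholders; written to /tmp here so the sketch is self-contained):

```shell
# Sketch of the relevant lines for conf/zeppelin-env.sh.
cat > /tmp/zeppelin-env.sh <<'EOF'
export SPARK_HOME=/opt/spark                       # placeholder install path
export MASTER=spark://spark-cluster-master:7077    # optional default master
EOF
cat /tmp/zeppelin-env.sh
```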

On Sat, Oct 20, 2018 at 0:25, Alex Dzhagriev ()
wrote:

> Thanks for the quick reply. Should I specify it to the Zeppelin process or
> the Spark interpreter?
>
> Thanks, Alex.
>
> On Fri, Oct 19, 2018 at 4:53 PM Jeff Zhang  wrote:
>
>> You need to specify SPARK_HOME which is where spark installed.
>>
>>
>> Alex Dzhagriev wrote on Sat, Oct 20, 2018 at 3:12 AM:
>>
>>> Hello,
>>>
>>> I have a remote Spark cluster and I'm trying to use it by setting the
>>> spark interpreter property:
>>>
>>> master spark://spark-cluster-master:7077, however I'm getting the
>>> following error:
>>>
>>> java.lang.RuntimeException: SPARK_HOME is not specified in
>>> interpreter-setting for non-local mode, if you specify it in
>>> zeppelin-env.sh, please move that into interpreter setting
>>>
>>> version: Docker Image 0.8.0
>>>
>>> Thanks, Alex.
>>>
>>


Re: zeppelin asking to login

2018-08-15 Thread Jhon Anderson Cardenas Diaz
If you are using shiro, you can also check that your config is set up this way:

...
/** = anon
#/** = authc

On Wed, Aug 15, 2018 at 21:44, Jhon Anderson Cardenas Diaz (<
jhonderson2...@gmail.com>) wrote:

> Hi,
>
> Check if you have the file conf/zeppelin-site.xml and then validate that
> the value of the property zeppelin.anonymous.allowed is 'true' (the default).
>
> Regards.
>
> On Wed, Aug 15, 2018 at 16:32, Mohit Jaggi ()
> wrote:
>
>> I downloaded Z 0.7.2 and started it on my mac. It is asking me to login.
>> I had another directory which is several months old with the same release
>> and it used to work with anonymous access. Even that one now asks for a
>> username/password.
>>
>> What am I doing wrong?
>>
>


Re: zeppelin asking to login

2018-08-15 Thread Jhon Anderson Cardenas Diaz
Hi,

Check if you have the file conf/zeppelin-site.xml and then validate that
the value of the property zeppelin.anonymous.allowed is 'true' (the default).

Regards.
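As a sketch, the relevant fragment of conf/zeppelin-site.xml can be checked or restored like this (written to /tmp here for illustration):

```shell
# Sketch: ensure anonymous access is allowed in conf/zeppelin-site.xml.
CONF=/tmp/zeppelin-site.xml   # placeholder; normally $ZEPPELIN_HOME/conf/zeppelin-site.xml
cat > "$CONF" <<'EOF'
<property>
  <name>zeppelin.anonymous.allowed</name>
  <value>true</value>
</property>
EOF
grep -A1 'zeppelin.anonymous.allowed' "$CONF"
```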

On Wed, Aug 15, 2018 at 16:32, Mohit Jaggi ()
wrote:

> I downloaded Z 0.7.2 and started it on my mac. It is asking me to login. I
> had another directory which is several months old with the same release and
> it used to work with anonymous access. Even that one now asks for a
> username/password.
>
> What am I doing wrong?
>


Re: Paragraphs outputs from other notebooks/paragraphs

2018-08-02 Thread Jhon Anderson Cardenas Diaz
Mostly spark interpreter.

It is very difficult to reproduce, and the logs do not help much. I think
it happens when multiple users use zeppelin at the same time.

Do you know which component inside zeppelin manages the paragraph output?
The interpreter implementations, maybe?

On Thu, Aug 2, 2018 at 20:07, Jeff Zhang () wrote:

>
> This is the first time I have seen a user report this issue. What
> interpreter do you use? Is it easy to reproduce?
>
>
> Jhon Anderson Cardenas Diaz wrote on Fri, Aug 3, 2018
> at 12:34 AM:
>
>> Hi!
>>
>> Has anyone else experienced this problem?
>>
>> Sometimes *when a paragraph is executed it shows random output from
>> another notebook* (from other users as well).
>>
>> We are using zeppelin 0.7.3, and Spark and all other interpreters are
>> configured in "Per User - Scoped" mode.
>>
>> Regards.
>>
>


Paragraphs outputs from other notebooks/paragraphs

2018-08-02 Thread Jhon Anderson Cardenas Diaz
Hi!

Has anyone else experienced this problem?

Sometimes *when a paragraph is executed it shows random output from another
notebook* (from other users as well).

We are using zeppelin 0.7.3, and Spark and all other interpreters are
configured in "Per User - Scoped" mode.

Regards.


Zeppelin starting time - tied to notebook loading

2018-07-04 Thread Jhon Anderson Cardenas Diaz
Hi!

Right now the Zeppelin starting time depends directly on the time it takes
to load the notebooks from the repository. If a user has a lot of notebooks
(e.g. more than 1000), the starting time becomes too long.

Is there some plan to re-implement this notebook loading so that it is done
asynchronously? Or is this not a problem for zeppelin users?

Thanks.


Re: Zeppelin code can access FileSystem

2018-05-10 Thread Jhon Anderson Cardenas Diaz
Yes, I did the sudoers configuration and I am using the zeppelin user (not
root) to execute that command. The problem is that the command is executed
using sudo (*sudo* -E -H -u  bash -c "...") so it will be executed as the
root user anyway, as I showed you in the ps aux results.
Regards.

2018-05-10 14:48 GMT-05:00 Sam Nicholson :

> Well, I don't recommend running as root.
> That's why I went to the trouble to set up zeppelin as a sudoer.
>
> If you don't make this adjustment, yes, you have to run as root,
> or you have to do the ssh key method.
>
> It's always the case that something has to run with elevated
> privilege to allow userID changes at runtime.
>
> With JEE, the best that can be done, today, is to isolate executable
> userIDs
> from the main process user ID.
>
> In general, exposing shells to the web is problematic vis-a-vis security.
>
> Cheers!
> -sam
>


Re: Zeppelin code can access FileSystem

2018-05-10 Thread Jhon Anderson Cardenas Diaz
Thank you again Sam. After following your instructions it seems to be
working, but there is still a security concern: the main process that
starts the interpreters would be running as the root user, right? For
example, with the python interpreter the processes would be these ($ ps auxwww):

*root*   203  0.0  0.2  87768  4468 ?S15:32   0:00 *sudo -E
-H -u interpreteruser bash -c*  /usr/java/jdk1.8.0_131/bin/java
-Dfile.encoding=UTF-8
-Dlog4j.configuration=file:///usr/zeppelin/conf/log4j.properties
-Dzeppelin.log.file=/usr/zeppelin/logs/zeppelin-interpreter-python-interpreteruser-python--XX.log
-cp :/usr/zeppelin/interpreter/python/*:/usr/zeppelin/lib/interpreter/*:
*org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer* XX.X.X.X


*interpr+*   204  1.0  4.2 4579156 87904 ?   Sl   15:32   0:02
/usr/java/jdk1.8.0_131/bin/java -Dfile.encoding=UTF-8
-Dlog4j.configuration=file:///usr/zeppelin/conf/log4j.properties
-Dzeppelin.log.file=/usr/zeppelin/logs/zeppelin-interpreter-python-interpreteruser-python--XX.log
-cp :/usr/zeppelin/interpreter/python/*:/usr/zeppelin/lib/interpreter/*:
*org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer* XX.X.X.X


I would think that is another security issue with this approach. What do
you think about it?
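For reference, the sudoers rule in use is roughly this (usernames are placeholders; written to a scratch file here, while on a real system it belongs in /etc/sudoers.d/ via visudo). Note that sudo itself is a setuid-root binary, which is why the parent process shows up as root in ps:

```shell
# Sketch of the sudoers rule that lets the zeppelin user launch
# interpreters as a dedicated interpreter user without a password.
cat > /tmp/zeppelin-sudoers <<'EOF'
zeppelin ALL=(interpreteruser) NOPASSWD: ALL
EOF
cat /tmp/zeppelin-sudoers
```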

2018-05-09 12:53 GMT-05:00 Jhon Anderson Cardenas Diaz <
jhonderson2...@gmail.com>:

>
> -- Forwarded message -
> From: Sam Nicholson <sam...@ogt11.com>
> Date: Wed, May 9, 2018 12:04
> Subject: Re: Zeppelin code can access FileSystem
> To: <users@zeppelin.apache.org>
>
>
> Yes, I believe that jira report was about keeping users isolated from each
> other.
> And with user impersonation, and the method I outlined just now, this
> works well.
>
> AND this keeps the shell you fire up from accessing the zeppelin files.
>
> BUT, this is not a zeppelin problem.  This is a JEE problem.  Java has no
> native mechanism
> to set/change userID.  So, while you can sudo / su -c the web application
> upon startup, it cannot
> change itself later.  So, if it needs the filesystem for ANY reason, it'll
> have to start with a userid
> that has filesystem permissions.
>
> This is, IMO, the real problem behind the Spring breakage at Equifax.  If
> the app server default
> userID is leaked, not only can you login, but you can MODIFY the
> application filesystem if you
> can get a shell.
>
> So, I think the Zeppelin team has done an excellent job of mitigating the
> problem as best as
> can be done within the JEE system.  (This is true of tomcat, jetty,
> whathaveyou servlet container.)
>
> Because, by default, Zeppelin gives you shells.  R, Python, sh, all have
> full UNIX abilities, as
> do many other shells.
>
> I'm going to write up a Jira request to have the default interpreter
> settings in a config file.  If
> one is truly paranoid, then just having the server running while one sets
> the interpreter settings
> seems risky.
>
> In short:
>
> Enable user impersonation
> Put zeppelin users in a zeppelin group
> Allow zeppelin sudo to only zeppelin group members
> Ensure zeppelin group members cannot sudo without password and cannot ssh
> without password
> Set shell context as per-user in isolated process
> Set shell.working.directory.user.home to true
>
> And do this for all compatible interpreters.
>
> Cheers!
> -sam
>
> On Wed, May 9, 2018 at 10:17 AM, Jhon Anderson Cardenas Diaz <
> jhonderson2...@gmail.com> wrote:
>
>> Thank you Sam. Reviewing the jira issues, I found that this issue was
>> previously identified in the jira ticket ZEPPELIN-1320
>> <https://issues.apache.org/jira/browse/ZEPPELIN-1320>, but (maybe it is
>> just my impression) it seems like they focused more on the fact that the
>> processes could not access the directories of other users than on the
>> problem that a process could access the zeppelin file system. Am I right?
>>
>> 2018-05-08 17:46 GMT-05:00 Sam Nicholson <sam...@ogt11.com>:
>>
>>> And warning!
>>>
>>> Trying to answer the above, I've disconnected my websocket.
>>> I'll figure it out and report back
>>>
>>> On Tue, May 8, 2018 at 6:28 PM, Sam Nicholson <sam...@ogt11.com> wrote:
>>>
>>>> So,
>>>>
>>>> I run the zeppelin process as the web user on my system.  There is no
>>>> other web process, so why not.
>>>>
>>>> Then, UNIX permissions keep it from running, accessing, deleting
>>>> anything else.  EXCEPT items that are world writeable.
>>>>
>>>> There shouldn't be any of those, other than /tmp, but still /tmp is a
>>>> hotbed of nefarious activity on hacked machines. :)

Re: Zeppelin code can access FileSystem

2018-05-09 Thread Jhon Anderson Cardenas Diaz
Thank you Sam. Reviewing the jira issues, I found that this issue was
previously identified in the jira ticket ZEPPELIN-1320
<https://issues.apache.org/jira/browse/ZEPPELIN-1320>, but (maybe it is
just my impression) it seems like they focused more on the fact that the
processes could not access the directories of other users than on the
problem that a process could access the zeppelin file system. Am I right?

2018-05-08 17:46 GMT-05:00 Sam Nicholson <sam...@ogt11.com>:

> And warning!
>
> Trying to answer the above, I've disconnected my websocket.
> I'll figure it out and report back
>
> On Tue, May 8, 2018 at 6:28 PM, Sam Nicholson <sam...@ogt11.com> wrote:
>
>> So,
>>
>> I run the zeppelin process as the web user on my system.  There is no
>> other web process, so why not.
>>
>> Then, UNIX permissions keep it from running, accessing, deleting anything
>> else.  EXCEPT items that are world writeable.
>>
>> There shouldn't be any of those, other than /tmp, but still /tmp is a
>> hotbed of nefarious activity on hacked machines.  :)
>>
>> For example:
>>
>> %sh
>>
>> pwd
>> ls
>> touch bazzot
>> ls -l bazzot
>> rm bazzot
>>
>> Gives:
>>
>> /var/www/zeppelin
>> derby.log
>> figure
>> metastore_db
>> Rgraphics
>> Rgraphics.zip
>> -rw-r--r-- 1 www-data www-data 0 May 8 18:04 bazzot
>> ls: cannot access 'bazzot': No such file or directory
>> ExitValue: 2
>>
>> For another example:
>>
>> %sh
>> id
>> cd /home/samcn2
>> touch bazzot
>> ls -l bazzot
>> rm bazzot
>>
>> Gives:
>>
>> uid=33(www-data) gid=33(www-data) groups=33(www-data)
>> touch: cannot touch 'bazzot': Permission denied
>> ls: cannot access 'bazzot': No such file or directory
>> rm: cannot remove 'bazzot': No such file or directory
>> ExitValue: 1
>>
>>
>> So, you can't access other users' files.
>>
>> But you CAN access the web user's files.  That may be a bug.  I'm going
>> to try changing the user zeppelin runs as.  Wait one...
>>
>> OK.  So you can run zeppelin as some other user; the logs and the run
>> directory must be owned by that user.  I do this with symlinks.  But the
>> websocket is failing, so no joy there...
>>
>> So, for now, you can set things up so that zeppelin can't access any
>> other files from other users on the system,
>> but zeppelin web can access the zeppelin executable.  So, don't put this
>> up for untrusted users!!!
>>
>> Here is my zeppelin start script:
>> #!/bin/sh
>>
>> cd /var/www/zeppelin/home
>>
>> sudo -u zeppelin /opt/apache/zeppelin/zeppelin-0.7.3-bin-all/bin/zeppelin-daemon.sh $*
>>
>>
>> If /var/www/zeppelin/home is owned by zeppelin, as is
>> /opt/apache/zeppelin/*, then this works with the caveat above.
>>
>> Cheers!
>> -sam
>>
>>
>> On Tue, May 8, 2018 at 5:48 PM, Jhon Anderson Cardenas Diaz <
>> jhonderson2...@gmail.com> wrote:
>>
>>> Dear Zeppelin Community,
>>>
>>> Currently when a Zeppelin paragraph is executed, the code in it can read
>>> sensitive config files and change them, including web app pages, etc.
>>> Like in this example:
>>>
>>> %python
>>> f = open("/usr/zeppelin/conf/credentials.json", "r")
>>> f.read()
>>>
>>> Do you know if there is a way to configure the user used to start the
>>> interpreters or run the paragraphs' code, so that that user cannot access
>>> the file system where zeppelin is running, or has more restricted access?
>>>
>>> Thank you.
>>>
>>
>>
>


Zeppelin code can access FileSystem

2018-05-08 Thread Jhon Anderson Cardenas Diaz
Dear Zeppelin Community,

Currently when a Zeppelin paragraph is executed, the code in it can read
sensitive config files and change them, including web app pages, etc. Like
in this example:

%python
f = open("/usr/zeppelin/conf/credentials.json", "r")
f.read()

Do you know if there is a way to configure the user used to start the
interpreters or run the paragraphs' code, so that that user cannot access
the file system where zeppelin is running, or has more restricted access?

Thank you.


Filter for Zeppelin Notebook Server (Websocket)

2018-04-25 Thread Jhon Anderson Cardenas Diaz
Hi!

I am trying to implement a filter inside zeppelin in order to intercept
requests and collect metrics about zeppelin performance. I registered the
javax servlet filter in zeppelin-web/src/WEB-INF/web.xml, and the filter
works well for REST requests; but it does not intercept the websocket
communication (which is the way the zeppelin ui interacts with the
zeppelin server for notebook-related operations).

Do you know a way to configure a javax servlet filter to intercept
websocket communication without modifying the NotebookServer class?

Thank you!


Re: [Kubernetes] How to manage Zeppelin in K8s

2018-04-09 Thread Jhon Anderson Cardenas Diaz
Hi,

The permission settings are stored in:
$ZEPPELIN_HOME/conf/notebook-authorization.json

The interpreter settings are stored in:
$ZEPPELIN_HOME/conf/interpreter.json

I think that since zeppelin 0.8.0 there is a mechanism to persist the
interpreter configuration. If you work with an earlier version, you would
need to implement a mechanism to store this configuration in your s3 bucket.

Regards.
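As a sketch of that backup idea (paths and the bucket are placeholders, and the aws upload is left commented so the script has no external dependencies):

```shell
# Sketch: archive the permission and interpreter settings for backup.
ZEPPELIN_HOME=/tmp/zeppelin   # placeholder install path
mkdir -p "$ZEPPELIN_HOME/conf"
# Placeholder files; on a real deployment zeppelin writes these itself.
echo '{}' > "$ZEPPELIN_HOME/conf/notebook-authorization.json"
echo '{}' > "$ZEPPELIN_HOME/conf/interpreter.json"

tar czf /tmp/zeppelin-conf-backup.tar.gz -C "$ZEPPELIN_HOME/conf" \
    notebook-authorization.json interpreter.json
# Then e.g.: aws s3 cp /tmp/zeppelin-conf-backup.tar.gz s3://your-bucket/zeppelin/
ls -l /tmp/zeppelin-conf-backup.tar.gz
```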


2018-04-09 9:12 GMT-05:00 Josh Goldsborough:

> We have a cluster using Kubernetes to host Zeppelin.  We have the
> notebooks backed up in S3, but right now we weren't using a stateful set,
> and it seems like whenever we recreate the cluster it comes back & loads
> the notebooks fine.  But each notebook loses all its permissions (leaving
> them exposed to everyone).
>
> We can convert to using stateful sets in K8s so the Zeppelin file system
> comes back to its existing state (instead of a clean build), but exactly
> where/how is Zeppelin managing the interpreter & permission settings?
> They clearly aren't tied to the notebook files themselves, and we'll need
> to back them up in case of disaster recovery.
>
> Thanks!
> -Josh
>


Re: Zeppelin - Spark Driver location

2018-03-13 Thread Jhon Anderson Cardenas Diaz
Does this new feature work only for yarn-cluster, or for spark standalone
too?

On Tue, Mar 13, 2018 at 18:34, Ruslan Dautkhanov <dautkha...@gmail.com>
wrote:

> > Zeppelin version: 0.8.0 (merged at September 2017 version)
>
> https://issues.apache.org/jira/browse/ZEPPELIN-2898 was merged end of
> September so not sure if you have that.
>
> Check out
> https://medium.com/@zjffdu/zeppelin-0-8-0-new-features-ea53e8810235 how
> to set this up.
>
>
>
> --
> Ruslan Dautkhanov
>
> On Tue, Mar 13, 2018 at 5:24 PM, Jhon Anderson Cardenas Diaz <
> jhonderson2...@gmail.com> wrote:
>
>> Hi zeppelin users !
>>
>> I am working with zeppelin pointing to a standalone spark cluster. I am
>> trying to figure out a way to make zeppelin run the spark driver outside
>> of the client process that submits the application.
>>
>> According to the documentation (
>> http://spark.apache.org/docs/2.1.1/spark-standalone.html):
>>
>> *For standalone clusters, Spark currently supports two deploy modes.
>> In client mode, the driver is launched in the same process as the client
>> that submits the application. In cluster mode, however, the driver is
>> launched from one of the Worker processes inside the cluster, and the
>> client process exits as soon as it fulfills its responsibility of
>> submitting the application without waiting for the application to finish.*
>>
>> The problem is that, even when I set the properties for a
>> spark-standalone cluster and deploy mode cluster, the driver still runs
>> inside the zeppelin machine (according to the spark UI/executors page).
>> These are the properties that I am setting for the spark interpreter:
>>
>> master: spark://:7077
>> spark.submit.deployMode: cluster
>> spark.executor.memory: 16g
>>
>> Any ideas would be appreciated.
>>
>> Thank you
>>
>> Details:
>> Spark version: 2.1.1
>> Zeppelin version: 0.8.0 (merged at September 2017 version)
>>
>
>


Zeppelin - Spark Driver location

2018-03-13 Thread Jhon Anderson Cardenas Diaz
Hi zeppelin users !

I am working with zeppelin pointing to a standalone spark cluster. I am
trying to figure out a way to make zeppelin run the spark driver outside of
the client process that submits the application.

According to the documentation (
http://spark.apache.org/docs/2.1.1/spark-standalone.html):

*For standalone clusters, Spark currently supports two deploy modes.
In client mode, the driver is launched in the same process as the client
that submits the application. In cluster mode, however, the driver is
launched from one of the Worker processes inside the cluster, and the
client process exits as soon as it fulfills its responsibility of
submitting the application without waiting for the application to finish.*

The problem is that, even when I set the properties for a spark-standalone
cluster and deploy mode cluster, the driver still runs inside the zeppelin
machine (according to the spark UI/executors page). These are the
properties that I am setting for the spark interpreter:

master: spark://:7077
spark.submit.deployMode: cluster
spark.executor.memory: 16g

Any ideas would be appreciated.

Thank you

Details:
Spark version: 2.1.1
Zeppelin version: 0.8.0 (merged at September 2017 version)


Unmodifiable interpreter properties

2018-03-02 Thread Jhon Anderson Cardenas Diaz
Hi fellow Zeppelin users.

I would like to know if there is a way in zeppelin to set interpreter
properties that cannot be changed by the user from the graphical
interface.

An example use case where this can be useful: if we want zeppelin users to
be unable to kill jobs from the spark ui, we must set the property
"spark.ui.killEnabled" to FALSE. The problem is that the user can change
this property from the interpreter screen and enable the feature again.

I am wondering if maybe there is some attribute for the properties
registered in interpreter-setting.json that makes a property read-only
(for the end user), or something like that.

Thank you in advance!


Re: Jar dependencies are not reloaded when Spark interpreter is restarted?

2018-02-22 Thread Jhon Anderson Cardenas Diaz
When you say you change the dependency, is it only the content, or the
content and the version? I think the dependency should be reloaded only if
its version changes.

I do not think it is optimal to re-download the dependencies every time
the interpreter restarts.

On Feb 22, 2018 at 05:22, "Partridge, Lucas (GE Aviation)" <
lucas.partri...@ge.com> wrote:

> I’m using Zeppelin 0.7.3 against a local standalone Spark ‘cluster’. I’ve
> added a Scala jar dependency to my Spark interpreter using Zeppelin’s UI. I
> thought if I changed my Scala code and updated the jar (using sbt outside
> of Zeppelin) then all I’d have to do is restart the interpreter for the new
> code to be picked up in Zeppelin in a regular scala paragraph.  However
> restarting the interpreter appears to have no effect – the new code is not
> detected. Is that expected behaviour or a bug?
>
>
>
> The workaround I’m using at the moment is to edit the spark interpreter,
> remove the jar, re-add it, save the changes and then restart the
> interpreter. Clumsy but that’s better than restarting Zeppelin altogether.
>
>
>
> Also, if anyone knows of a better way to reload code without restarting
> the interpreter then I’m open to suggestions:). Having to re-run lots of
> paragraphs after a restart is pretty tedious.
>
>
>
> Thanks, Lucas.
>
>
>


Re: Extending SparkInterpreter functionality

2018-02-02 Thread Jhon Anderson Cardenas Diaz
lly we want to add lot more flexibility.
>>>
>>> We are building a platform to cater to multiple clients. So, multiple
>>> Zeppelin instances, multiple spark clusters, multiple Spark UIs and on top
>>> of that maintaining the security and privacy in a shared multi-tenant env
>>> will need all the flexibility we can get!
>>>
>>> Thanks
>>> Ankit
>>>
>>> On Feb 1, 2018, at 7:51 PM, Jeff Zhang <zjf...@gmail.com> wrote:
>>>
>>>
>>> Hi Jhon,
>>>
>>> Do you mind to share what kind of custom function you want to add to
>>> spark interpreter ? One idea in my mind is that we could add extension
>>> point to the existing SparkInterpreter, and user can enhance
>>> SparkInterpreter via these extension point. That means we just open some
>>> interfaces and users can implement those interfaces, and just add their
>>> jars to spark interpreter folder.
>>>
>>>
>>>
>>> Jhon Anderson Cardenas Diaz <jhonderson2...@gmail.com> wrote on Fri,
>>> Feb 2, 2018 at 5:30 AM:
>>>
>>>> Hello!
>>>>
>>>> I'm a software developer, and as part of a project I need to extend
>>>> the functionality of SparkInterpreter without modifying it. I need
>>>> instead to create a new interpreter that extends it or wraps its
>>>> functionality.
>>>>
>>>> I also need the spark sub-interpreters to use my new custom
>>>> interpreter, but here comes the problem: the spark sub-interpreters
>>>> have a direct dependency on the spark interpreter, as they use the
>>>> class name of the spark interpreter to obtain its instance:
>>>>
>>>>
>>>> private SparkInterpreter getSparkInterpreter() {
>>>>
>>>> ...
>>>>
>>>> Interpreter p = 
>>>> getInterpreterInTheSameSessionByClassName(SparkInterpreter.class.getName());
>>>>
>>>> }
>>>>
>>>>
>>>> *Approach without modify apache zeppelin*
>>>>
>>>> My current approach to solving this is to create a
>>>> SparkCustomInterpreter that overrides the getClassName method as follows:
>>>>
>>>> public class SparkCustomInterpreter extends SparkInterpreter {
>>>> ...
>>>>
>>>> @Override
>>>> public String getClassName() {
>>>> return SparkInterpreter.class.getName();
>>>> }
>>>> }
>>>>
>>>>
>>>> and put the new class name in the interpreter-setting.json file of
>>>> spark:
>>>>
>>>> [
>>>>   {
>>>> "group": "spark",
>>>> "name": "spark",
>>>> "className": "org.apache.zeppelin.spark.SparkCustomInterpreter",
>>>> ...
>>>> "properties": {...}
>>>>   }, ...
>>>> ]
>>>>
>>>>
>>>> The problem with this approach is that when I run a paragraph it fails.
>>>> In general it fails because zeppelin uses both the class name of the
>>>> instance and the getClassName() method to access the instance, and
>>>> that causes many problems.
>>>>
>>>> *Approaches modifying apache zeppelin*
>>>>
>>>> There are two possible solutions related with the way in which the
>>>> sub-interpreters get the SparkInterpreter instance class, one is
>>>> getting the class name from a property:
>>>>
>>>>
>>>> private SparkInterpreter getSparkInterpreter() {
>>>>
>>>> ...
>>>>
>>>> Interpreter p = 
>>>> getInterpreterInTheSameSessionByClassName(*property.getProperty("zeppelin.spark.mainClass",
>>>>  SparkInterpreter.class.getName())* );
>>>>
>>>> }
>>>>
>>>> And the other possibility is to modify the method Interpreter.
>>>> getInterpreterInTheSameSessionByClassName(String) in order to return
>>>> the instance that whether has the same class name specified in the
>>>> parameter or which super class has the same class name specified in the
>>>> parameter:
>>>>
>>>>
>>>> @ZeppelinApi
>>>> public Interpreter getInterpreterInTheSameSessionByClassName(String 
>>>> className) {
>>>>   synchronized (interpreterGroup) {
>>>> for (List interpreters : interpreterGroup.values()) {
>>>>   
>>>>   for (Interpreter intp : interpreters) {
>>>> if (intp.getClassName().equals(className) *|| 
>>>> intp.getClass().getSuperclass().getName().equals(className)*) {
>>>>   interpreterFound = intp;
>>>> }
>>>>
>>>> ...
>>>>   }
>>>>
>>>>   ...
>>>> }
>>>>   }
>>>>   return null;
>>>> }
>>>>
>>>>
>>>> Either of the two solutions would involve modifying apache zeppelin
>>>> code; do you think the change could be contributed to the community? Or
>>>> do you see some other approach to change the way in which the spark
>>>> sub-interpreters get the instance of the spark interpreter?
>>>>
>>>> Any information would be appreciated.
>>>>
>>>> Greetings
>>>>
>>>>
>>>> Jhon
>>>>
>>>


Re: How to create security filter for Spark UI in Spark on YARN

2018-02-01 Thread Jhon Anderson Cardenas Diaz
I solved this by using the property hadoop.http.authentication.type to
specify a custom Java handler that contains the authentication logic. This
class only has to implement the interface
org.apache.hadoop.security.authentication.server.AuthenticationHandler.
See:

https://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-common/HttpAuthentication.html

Regards
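For reference, the wiring is just a Hadoop configuration property whose value is the handler class name (the class name below is a placeholder for your own implementation; the fragment is written to /tmp here for illustration):

```shell
# Sketch: point Hadoop's HTTP authentication at a custom handler
# (fragment of core-site.xml).
cat > /tmp/core-site-fragment.xml <<'EOF'
<property>
  <name>hadoop.http.authentication.type</name>
  <value>com.example.MyAuthenticationHandler</value>
</property>
EOF
cat /tmp/core-site-fragment.xml
```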


2018-01-10 19:38 GMT-05:00 Jeff Zhang <zjf...@gmail.com>:

>
> It seems to be by design in yarn mode. Have you ever made it work in
> spark-shell?
>
>
> Jhon Anderson Cardenas Diaz <jhonderson2...@gmail.com> wrote on Wed,
> Jan 10, 2018 at 9:17 PM:
>
>> *Environment*:
>> AWS EMR, yarn cluster.
>>
>> *Description*:
>>
>> I am trying to use a java filter to protect access to the spark ui,
>> using the property spark.ui.filters; the problem is that when spark is
>> running in yarn mode, that property is always overridden with the
>> filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter:
>>
>> *spark.ui.filters:
>> org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter*
>>
>> And these properties are automatically added:
>>
>>
>> *spark.org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.param.PROXY_HOSTS:
>> ip-x-x-x-226.eu-west-1.compute.internalspark.org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.param.PROXY_URI_BASES:
>> http://ip-x-x-x-226.eu-west-1.compute.internal:20888/proxy/application_x_
>> <http://ip-x-x-x-226.eu-west-1.compute.internal:20888/proxy/application_x_>*
>>
>> Any suggestion on how to add a java security filter so it does not get
>> overridden, or maybe how to configure the security from the hadoop side?
>>
>> Thanks.
>>
>


Extending SparkInterpreter functionality

2018-02-01 Thread Jhon Anderson Cardenas Diaz
Hello!

I'm a software developer, and as part of a project I need to extend the
functionality of SparkInterpreter without modifying it. I need instead to
create a new interpreter that extends it or wraps its functionality.

I also need the spark sub-interpreters to use my new custom interpreter,
but here comes the problem: the spark sub-interpreters have a direct
dependency on the spark interpreter, as they use the class name of the
spark interpreter to obtain its instance:


private SparkInterpreter getSparkInterpreter() {

...

Interpreter p =
getInterpreterInTheSameSessionByClassName(SparkInterpreter.class.getName());

}


*Approach without modify apache zeppelin*

My current approach to solving this is to create a SparkCustomInterpreter
that overrides the getClassName method as follows:

public class SparkCustomInterpreter extends SparkInterpreter {
...

@Override
public String getClassName() {
return SparkInterpreter.class.getName();
}
}


and put the new class name in the interpreter-setting.json file of spark:

[
  {
"group": "spark",
"name": "spark",
"className": "org.apache.zeppelin.spark.SparkCustomInterpreter",
...
"properties": {...}
  }, ...
]


The problem with this approach is that when I run a paragraph it fails. In
general it fails because zeppelin uses both the class name of the instance
and the getClassName() method to access the instance, and that causes many
problems.

*Approaches modifying apache zeppelin*

There are two possible solutions related with the way in which the
sub-interpreters get the SparkInterpreter instance class, one is getting
the class name from a property:


private SparkInterpreter getSparkInterpreter() {

...

Interpreter p =
getInterpreterInTheSameSessionByClassName(*property.getProperty("zeppelin.spark.mainClass",
SparkInterpreter.class.getName())* );

}

And the other possibility is to modify the method Interpreter.
getInterpreterInTheSameSessionByClassName(String) in order to return the
instance that whether has the same class name specified in the parameter or
which super class has the same class name specified in the parameter:


@ZeppelinApi
public Interpreter getInterpreterInTheSameSessionByClassName(String className) {
  synchronized (interpreterGroup) {
for (List interpreters : interpreterGroup.values()) {
  
  for (Interpreter intp : interpreters) {
if (intp.getClassName().equals(className) *||
intp.getClass().getSuperclass().getName().equals(className)*) {
  interpreterFound = intp;
}

...
  }

  ...
}
  }
  return null;
}


Either of the two solutions would involve modifying apache zeppelin code;
do you think the change could be contributed to the community? Or do you
see some other approach to change the way in which the spark
sub-interpreters get the instance of the spark interpreter?

Any information would be appreciated.

Greetings

Jhon


Implementation of NotebookRepo

2018-01-26 Thread Jhon Anderson Cardenas Diaz
Hi fellow Zeppelin users,

I would like to create another implementation of the
org.apache.zeppelin.notebook.repo.NotebookRepo interface in order to
persist zeppelin's notebooks in S3, but in a versioned way (like Git on
S3).

How do you recommend I add my jar file with the custom implementation to
my zeppelin docker deployment? Is there maybe a zeppelin folder where I can
put custom libraries, or do I have to extend the zeppelin classpath?

Thanks & Regards,
Jhon
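One approach, as a sketch (jar name, class name, and paths are placeholders): drop the jar on zeppelin's classpath and select the implementation via the zeppelin.notebook.storage property:

```shell
# Sketch: install a custom NotebookRepo implementation.
ZEPPELIN_HOME=/tmp/zeppelin   # placeholder install path
mkdir -p "$ZEPPELIN_HOME/lib"
# In a Dockerfile this would be a COPY of the real jar:
touch "$ZEPPELIN_HOME/lib/my-versioned-s3-repo.jar"   # placeholder jar

# Then select the implementation in conf/zeppelin-site.xml, roughly:
#   <name>zeppelin.notebook.storage</name>
#   <value>com.example.VersionedS3NotebookRepo</value>
ls "$ZEPPELIN_HOME/lib"
```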


How to create security filter for Spark UI in Spark on YARN

2018-01-10 Thread Jhon Anderson Cardenas Diaz
*Environment*:
AWS EMR, yarn cluster.

*Description*:

I am trying to use a java filter to protect access to the spark ui, using
the property spark.ui.filters; the problem is that when spark is running
in yarn mode, that property is always overridden with the filter
org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter:

*spark.ui.filters:
org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter*

And these properties are automatically added:


*spark.org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.param.PROXY_HOSTS:
ip-x-x-x-226.eu-west-1.compute.internalspark.org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.param.PROXY_URI_BASES:
http://ip-x-x-x-226.eu-west-1.compute.internal:20888/proxy/application_x_
*

Any suggestion on how to add a java security filter so it does not get
overridden, or maybe how to configure the security from the hadoop side?

Thanks.