[GitHub] zeppelin issue #2752: [ZEPPELIN-3195] Remove the limit on the number of run ...

2018-02-01 Thread mebelousov
Github user mebelousov commented on the issue:

https://github.com/apache/zeppelin/pull/2752
  
@felixcheung 
Where could I document ZEPPELIN_INTERPRETER_MAX_POOL_SIZE?
There is no mention of such cases in the documentation.

Also, I cannot imagine that a Zeppelin administrator will go to the documentation 
when users complain that running a note manually is fine but only the first 10 
paragraphs are executed under cron.
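For reference, the setting under discussion would go in conf/zeppelin-env.sh; the variable name comes from this PR, while the value below is purely illustrative:

```shell
# conf/zeppelin-env.sh -- raise the interpreter pool size so scheduled (cron)
# runs are not capped at the default of 10 concurrent paragraphs.
# The value 50 is illustrative; size it to your workload.
export ZEPPELIN_INTERPRETER_MAX_POOL_SIZE=50
```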


---


[GitHub] zeppelin pull request #2757: [ZEPPELIN-3198] UI should not show Version/GIT ...

2018-02-01 Thread prabhjyotsingh
GitHub user prabhjyotsingh opened a pull request:

https://github.com/apache/zeppelin/pull/2757

[ZEPPELIN-3198] UI should not show Version/GIT Control if the same is not 
supported

### What is this PR for?
Currently, the UI shows an option for version/GIT control even when it is not 
supported by the underlying notebook storage configuration.

Users only find out after trying to save a commit, when they get the error 
"Couldn't checkpoint note revision: possibly storage doesn't support versioning. 
Please check the logs for more details.".

So, if the configured notebook storage doesn't support git storage, the UI 
should not show those options.

### What type of PR is it?
[Improvement]

### What is the Jira issue?
* 
[ZEPPELIN-3198](https://issues.apache.org/jira/projects/ZEPPELIN/issues/ZEPPELIN-3198)

### How should this be tested?
When using "org.apache.zeppelin.notebook.repo.GitNotebookRepo" for 
`zeppelin.notebook.storage`, the user should see the revision/version control 
option; for the other repos, e.g. "FileSystemNotebookRepo", the user should not 
see that option.


### Questions:
* Does the licenses files need update? N/A
* Is there breaking changes for older versions? N/A
* Does this needs documentation? N/A


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/prabhjyotsingh/zeppelin ZEPPELIN-3198

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zeppelin/pull/2757.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2757


commit cd7fde105c5017ac5ef8fab956c79987529fc10d
Author: Prabhjyot Singh 
Date:   2018-02-01T09:38:50Z

ZEPPELIN-3198: add isRevisionSupported for NotebookRepo

Change-Id: I67af210fb003e007129db4b24e5f8f53fb034c6f




---


[GitHub] zeppelin pull request #2758: ZEPPELIN-3157. Fixed Checkstyle errors in hbase...

2018-02-01 Thread HorizonNet
GitHub user HorizonNet opened a pull request:

https://github.com/apache/zeppelin/pull/2758

ZEPPELIN-3157. Fixed Checkstyle errors in hbase module

### What is this PR for?

Fixed Checkstyle issues in the **hbase** module.

### What type of PR is it?
Improvement

### Todos
* [ ] - Task

### What is the Jira issue?
* https://issues.apache.org/jira/browse/ZEPPELIN-3157

### How should this be tested?
* CI pass

### Screenshots (if appropriate)

### Questions:
* Does the licenses files need update? no
* Is there breaking changes for older versions? no
* Does this needs documentation? no


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ultratendency/zeppelin ZEPPELIN-3157

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zeppelin/pull/2758.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2758


commit 4aa4eaecb6dd86cab60f252165be648292e29c94
Author: Jan Hentschel 
Date:   2018-02-01T10:22:47Z

ZEPPELIN-3157. Fixed Checkstyle errors in hbase module




---


[GitHub] zeppelin issue #2700: [ZEPPELIN-3092] GitHub Integration

2018-02-01 Thread mohamagdy
Github user mohamagdy commented on the issue:

https://github.com/apache/zeppelin/pull/2700
  
@zjffdu all good. I closed and reopened the pull request.

I would suggest making the process of rerunning Jenkins easier than closing 
and reopening the pull request. This also triggers Travis tests which take 
around 1.5 hours. Maybe adding a `build` button in Jenkins so that one can 
rerun the job? What do you think?


---


[GitHub] zeppelin issue #2700: [ZEPPELIN-3092] GitHub Integration

2018-02-01 Thread zjffdu
Github user zjffdu commented on the issue:

https://github.com/apache/zeppelin/pull/2700
  
It would be awesome if there were such a button, but I don't know how to make 
one. BTW, usually I force-push a dummy commit to trigger the build. And I 
believe we should also fix these flaky tests, which cause inconvenience for 
developers.


---


[GitHub] zeppelin issue #2758: ZEPPELIN-3157. Fixed Checkstyle errors in hbase module

2018-02-01 Thread zjffdu
Github user zjffdu commented on the issue:

https://github.com/apache/zeppelin/pull/2758
  
Thanks @HorizonNet  LGTM


---


[GitHub] zeppelin issue #2757: [ZEPPELIN-3198] UI should not show Version/GIT Control...

2018-02-01 Thread jhonderson
Github user jhonderson commented on the issue:

https://github.com/apache/zeppelin/pull/2757
  
The methods for the versioning of the notes are defined at the interface level 
in NotebookRepo.java:

 - Revision checkpoint(String noteId, String checkpointMsg, 
AuthenticationInfo)
 - Note get(String noteId, String revId, AuthenticationInfo)
 - List<Revision> revisionHistory(String noteId, AuthenticationInfo)
 - Note setNoteRevision(String noteId, String revId, AuthenticationInfo)

They could be implemented by a non-git repository, for example a repository 
that saves the notes in S3 and their versioning information in a database or 
something like that. So I guess a better solution would be a method in 
NotebookRepo.java that indicates whether the implementation supports versioning 
(maybe doesSupportVersioning()), and to use that method to show or hide the 
versioning feature.

What do you think?
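A minimal sketch of that idea (all class names here are illustrative stand-ins, not Zeppelin's actual NotebookRepo API):

```java
// Repos that do not version notes inherit the default; versioned repos opt in.
interface NotebookRepoSketch {
    default boolean isRevisionSupported() { return false; }
}

class GitRepoSketch implements NotebookRepoSketch {
    @Override
    public boolean isRevisionSupported() { return true; }  // git can checkpoint
}

class FileSystemRepoSketch implements NotebookRepoSketch { }  // no versioning

class VersioningCheck {
    // The UI layer would call this before rendering version-control widgets.
    static boolean showVersionControlMenu(NotebookRepoSketch repo) {
        return repo.isRevisionSupported();
    }
}
```

The default method keeps existing storage implementations source-compatible: only repos that actually support revisions need to override it.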


---


[GitHub] zeppelin issue #2757: [ZEPPELIN-3198] UI should not show Version/GIT Control...

2018-02-01 Thread prabhjyotsingh
Github user prabhjyotsingh commented on the issue:

https://github.com/apache/zeppelin/pull/2757
  
Yes, I believe that is exactly what I'm trying to do here. For example, in 
S3NotebookRepo, since none of checkpoint, get, revisionHistory, or 
setNoteRevision is implemented 
(https://github.com/apache/zeppelin/blob/master/zeppelin-zengine/src/main/java/org/apache/zeppelin/notebook/repo/S3NotebookRepo.java#L292),
 the method isRevisionSupported returns false for it. Let me know 
if I'm missing a case.


---


[jira] [Created] (ZEPPELIN-3201) Detached Zeppelin processes after server shutdown

2018-02-01 Thread Jasper Knulst (JIRA)
Jasper Knulst created ZEPPELIN-3201:
---

 Summary: Detached Zeppelin processes after server shutdown
 Key: ZEPPELIN-3201
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-3201
 Project: Zeppelin
  Issue Type: Bug
  Components: zeppelin-interpreter
Affects Versions: 0.7.2
Reporter: Jasper Knulst


If you check the number of Zeppelin processes running after the server has been 
up for a while (ps aux | grep zeppelin), there are a lot of very old OS-level 
processes around. They seem to be detached somehow (if you kill them, nobody 
would notice), and they can run into the hundreds.

Moreover, if you shut down the Zeppelin server and run "ps aux | grep zeppelin", 
there are also lots of detached processes that are no longer under the server's 
management. Some of those are isolated, impersonated shell processes belonging to 
the %sh interpreter.

I created a script to clean them up, but the server should be aware of its child 
processes and manage them, especially on shutdown.
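A minimal sketch of such a cleanup helper (the match pattern and the dry-run approach are assumptions, not the script from the report):

```shell
# find_detached PATTERN: print PIDs of processes matching PATTERN whose
# parent is PID 1, i.e. processes reparented to init after the server died.
# Dry run: it only prints; review the list before piping it to kill.
find_detached() {
    ps -eo pid=,ppid=,args= | awk -v pat="$1" \
        '$2 == 1 && $0 ~ pat { print $1 }'
}

# The bracket trick '[z]eppelin' keeps the awk process itself (whose command
# line contains the pattern) from matching its own regex.
find_detached '[z]eppelin'

# To actually clean up (illustrative): find_detached '[z]eppelin' | xargs -r kill
```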



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZEPPELIN-3202) Missing test dependencies in scio

2018-02-01 Thread Jan Hentschel (JIRA)
Jan Hentschel created ZEPPELIN-3202:
---

 Summary: Missing test dependencies in scio
 Key: ZEPPELIN-3202
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-3202
 Project: Zeppelin
  Issue Type: Bug
Affects Versions: 0.9.0
 Environment: Maven home: /usr/local/Cellar/maven/3.5.2/libexec
Java version: 1.8.0_51, vendor: Oracle Corporation
Java home: /Library/Java/JavaVirtualMachines/jdk1.8.0_51.jdk/Contents/Home/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "mac os x", version: "10.13.3", arch: "x86_64", family: "mac"
Reporter: Jan Hentschel
Assignee: Jan Hentschel


Currently tests are failing for me in the *scio* module when running {{mvn 
clean install}}. It seems that some test dependencies are missing from the POM 
definition. The initial error message is

{code}
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.028 sec <<< 
FAILURE! - in org.apache.zeppelin.scio.ScioInterpreterTest
initializationError(org.apache.zeppelin.scio.ScioInterpreterTest)  Time 
elapsed: 0.01 sec  <<< ERROR!
java.lang.NoClassDefFoundError: org/hamcrest/SelfDescribing
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:760)
at 
java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at 
org.junit.internal.builders.JUnit4Builder.runnerForClass(JUnit4Builder.java:10)
at 
org.junit.runners.model.RunnerBuilder.safeRunnerForClass(RunnerBuilder.java:59)
at 
org.junit.internal.builders.AllDefaultPossibilitiesBuilder.runnerForClass(AllDefaultPossibilitiesBuilder.java:26)
at 
org.junit.runners.model.RunnerBuilder.safeRunnerForClass(RunnerBuilder.java:59)
at 
org.junit.internal.requests.ClassRequest.getRunner(ClassRequest.java:33)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:262)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
{code}

After fixing this one, similar messages appear for *gson* and *commons-lang*.

The build runs without a problem when running {{mvn clean package}}. The 
suggestion is to add *hamcrest-all*, *gson* and *commons-lang* with scope *test*.





[GitHub] zeppelin pull request #2759: ZEPPELIN-3202. Added missing test dependencies ...

2018-02-01 Thread HorizonNet
GitHub user HorizonNet opened a pull request:

https://github.com/apache/zeppelin/pull/2759

ZEPPELIN-3202. Added missing test dependencies in the scio module

### What is this PR for?

Added missing test dependencies for the **scio** module to prevent test 
failures when running `mvn clean install`.

### What type of PR is it?
Bug Fix

### Todos
* [ ] - Task

### What is the Jira issue?
* https://issues.apache.org/jira/browse/ZEPPELIN-3202

### How should this be tested?
* CI pass

### Screenshots (if appropriate)

### Questions:
* Does the licenses files need update? no
* Is there breaking changes for older versions? no
* Does this needs documentation? no


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ultratendency/zeppelin ZEPPELIN-3202

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zeppelin/pull/2759.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2759


commit 2d9ffdf0759c431ad3e524188b0526b5557cd0e8
Author: Jan Hentschel 
Date:   2018-02-01T14:53:29Z

ZEPPELIN-3202. Added missing test dependencies in the scio module




---


Extending SparkInterpreter functionality

2018-02-01 Thread Jhon Anderson Cardenas Diaz
Hello!

I'm a software developer, and as part of a project I need to extend the
functionality of SparkInterpreter without modifying it. Instead, I need to
create a new interpreter that extends it or wraps its functionality.

I also need the Spark sub-interpreters to use my new custom interpreter,
but the problem comes here, because the Spark sub-interpreters have a direct
dependency on SparkInterpreter: they use its class name to obtain its
instance:


private SparkInterpreter getSparkInterpreter() {

...

Interpreter p =
getInterpreterInTheSameSessionByClassName(SparkInterpreter.class.getName());

}


*Approach without modifying Apache Zeppelin*

My current approach to solve this is to create a SparkCustomInterpreter that
overrides the getClassName method as follows:

public class SparkCustomInterpreter extends SparkInterpreter {
...

@Override
public String getClassName() {
return SparkInterpreter.class.getName();
}
}


and put the new class name in the interpreter-setting.json file of spark:

[
  {
"group": "spark",
"name": "spark",
"className": "org.apache.zeppelin.spark.SparkCustomInterpreter",
...
"properties": {...}
  }, ...
]


The problem with this approach is that when I run a paragraph it fails. In
general it fails because Zeppelin uses both the class name of the instance
and the getClassName() method to access the instance, and that causes many
problems.

*Approaches modifying Apache Zeppelin*

There are two possible solutions related to the way in which the
sub-interpreters get the SparkInterpreter instance. One is getting
the class name from a property:


private SparkInterpreter getSparkInterpreter() {

...

Interpreter p = getInterpreterInTheSameSessionByClassName(
    property.getProperty("zeppelin.spark.mainClass",
        SparkInterpreter.class.getName()));

}

And the other possibility is to modify the method
Interpreter.getInterpreterInTheSameSessionByClassName(String) so that it
returns the instance that either has the same class name as the one specified
in the parameter or whose superclass has that class name:


@ZeppelinApi
public Interpreter getInterpreterInTheSameSessionByClassName(String className) {
  synchronized (interpreterGroup) {
    for (List<Interpreter> interpreters : interpreterGroup.values()) {
      
      for (Interpreter intp : interpreters) {
        if (intp.getClassName().equals(className)
            || intp.getClass().getSuperclass().getName().equals(className)) {
          interpreterFound = intp;
        }

        ...
      }

      ...
    }
  }
  return null;
}


Either of the two solutions would involve modifying Apache Zeppelin code; do
you think the change could be contributed to the community? Or do you see
some other approach to change the way the Spark sub-interpreters get the
SparkInterpreter instance?

I'll appreciate any information about it.

Greetings

Jhon


Re: Extending SparkInterpreter functionality

2018-02-01 Thread Jeff Zhang
Hi Jhon,

Do you mind sharing what kind of custom functionality you want to add to the
Spark interpreter? One idea in my mind is that we could add extension points
to the existing SparkInterpreter, and users could enhance SparkInterpreter via
these extension points. That means we just open some interfaces, users
implement those interfaces, and then they just add their jars to the spark
interpreter folder.
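A rough sketch of what such an extension point could look like (every name here is hypothetical, not an existing Zeppelin interface):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hook interface: implementations would be discovered from jars
// dropped into the spark interpreter folder and called around each execution.
interface SparkInterpreterListener {
    void beforeInterpret(String code);                 // e.g. count requests, log
    void afterInterpret(String code, boolean success); // e.g. record metrics
}

// Stand-in for an interpreter that exposes the extension point; the real
// SparkInterpreter would invoke listeners around the actual Spark execution.
class ExtensiblePoint {
    private final List<SparkInterpreterListener> listeners = new ArrayList<>();

    void register(SparkInterpreterListener l) { listeners.add(l); }

    String interpret(String code) {
        for (SparkInterpreterListener l : listeners) l.beforeInterpret(code);
        String result = "ran: " + code;  // stand-in for running code on Spark
        for (SparkInterpreterListener l : listeners) l.afterInterpret(code, true);
        return result;
    }
}
```

With hooks like these, metrics and logging (Ankit's point 2) could be added without subclassing SparkInterpreter at all.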





[GitHub] zeppelin pull request #2760: [WIP] ZEPPELIN-3196. Plugin framework for Zeppe...

2018-02-01 Thread zjffdu
GitHub user zjffdu opened a pull request:

https://github.com/apache/zeppelin/pull/2760

[WIP] ZEPPELIN-3196. Plugin framework for Zeppelin Engine

### What is this PR for?
A few sentences describing the overall goals of the pull request's commits.
First time? Check out the contributing guide - 
https://zeppelin.apache.org/contribution/contributions.html


### What type of PR is it?
[Bug Fix | Improvement | Feature | Documentation | Hot Fix | Refactoring]

### Todos
* [ ] - Task

### What is the Jira issue?
* Open an issue on Jira https://issues.apache.org/jira/browse/ZEPPELIN/
* Put link here, and add [ZEPPELIN-*Jira number*] in PR title, eg. 
[ZEPPELIN-533]

### How should this be tested?
* First time? Setup Travis CI as described on 
https://zeppelin.apache.org/contribution/contributions.html#continuous-integration
* Strongly recommended: add automated unit tests for any new or changed 
behavior
* Outline any manual steps to test the PR here.

### Screenshots (if appropriate)

### Questions:
* Does the licenses files need update?
* Is there breaking changes for older versions?
* Does this needs documentation?


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zjffdu/zeppelin ZEPPELIN-3196

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zeppelin/pull/2760.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2760


commit ad3233ab4e6c670a9e1ddc275aed7b810caa22ac
Author: Jeff Zhang 
Date:   2018-01-31T11:46:44Z

ZEPPELIN-3196. Plugin framework for Zeppelin Engine




---


[GitHub] zeppelin pull request #2758: ZEPPELIN-3157. Fixed Checkstyle errors in hbase...

2018-02-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/zeppelin/pull/2758


---


[GitHub] zeppelin pull request #2709: ZEPPELIN-3111. Refactor SparkInterpreter

2018-02-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/zeppelin/pull/2709


---


[GitHub] zeppelin issue #2757: [ZEPPELIN-3198] UI should not show Version/GIT Control...

2018-02-01 Thread prabhjyotsingh
Github user prabhjyotsingh commented on the issue:

https://github.com/apache/zeppelin/pull/2757
  
@felixcheung @zjffdu  can you please help review this?


---


Re: Extending SparkInterpreter functionality

2018-02-01 Thread Jeff Zhang
1) Spark UI which works differently on EMR than standalone, so that logic
will be in an interpreter specific to EMR.
   Could you create a ticket for that, and please add details? I don't know
exactly what the difference between EMR and standalone is; we can expose an
API to allow customization if necessary.


2) We want to add more metrics & logs in the interpreter, say the number of
requests coming to the interpreter.
   Could you create a ticket for that as well? I think it is not difficult
to do.

3) Ideally we would like to connect to different Spark clusters in
spark-submit and not be tied to the one chosen at Zeppelin startup.
   This is already possible: you can create a separate Spark interpreter for
each Spark cluster, e.g. spark_16 for Spark 1.6 and spark_22 for Spark 2.2,
and all you need to do is set SPARK_HOME appropriately in each interpreter
setting.
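For illustration, the two interpreter settings described above might carry values like these (names, paths, and master URLs are made up):

```properties
# interpreter setting "spark_16" (illustrative values)
SPARK_HOME=/opt/spark-1.6.3
master=spark://cluster-a:7077

# interpreter setting "spark_22"
SPARK_HOME=/opt/spark-2.2.1
master=spark://cluster-b:7077
```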


Ankit Jain wrote on Friday, February 2, 2018 at 1:36 PM:

> This is exactly what we want Jeff! A hook to plug in our own interpreters.
> (I am on same team as Jhon btw)
>
> Right now there are too many concrete references and injecting stuff is
> not possible.
>
> Eg of customizations -
> 1) Spark UI which works differently on EMR than standalone, so that logic
> will be in an interpreter specific to emr.
> 2) We want to add more metrics & logs in the interpreter, say number of
> requests coming to the interpreter.
> 3) Ideally we will like to connect to different spark clusters in
> spark-submit and not tie to one which happens on Zeppelin startup right now.
>
> Basically we want to add lot more flexibility.
>
> We are building a platform to cater to multiple clients. So, multiple
> Zeppelin instances, multiple spark clusters, multiple Spark UIs and on top
> of that maintaining the security and privacy in a shared multi-tenant env
> will need all the flexibility we can get!
>
> Thanks
> Ankit
>

Re: Extending SparkInterpreter functionality

2018-02-01 Thread Jeff Zhang
>>> Same spark versions but multiple clusters. So based on logged in user,
we may want to route to different spark cluster or even let user choose the
spark he wants to connect to.
If you use standalone, you can set `master` in the interpreter setting for
each standalone cluster. If you use YARN, you can set `HADOOP_CONF_DIR` for
each YARN cluster.


>>> Are you okay with us working on those tickets?
Contributions are welcome; I would love to help if you need it.




Ankit Jain wrote on Friday, February 2, 2018 at 2:51 PM:

> Hi Jeff,
> #3 is not about different spark versions.
>
> Same spark versions but multiple clusters. So based on logged in user, we
> may want to route to different spark cluster or even let user choose the
> spark he wants to connect to.
>
> Will work with Jhon to create tickets on other #2.
> What is the turn-around time for such tasks usually?
> Are you okay with us working on those tickets?
>
> Maybe we can setup a meeting early Monday to discuss our proposals in
> detail?
>
> Thanks
> Ankit