[GitHub] zeppelin issue #3000: [ZEPPELIN-3467] two-step, atomic configuration file

2018-07-06 Thread sanjaydasgupta
Github user sanjaydasgupta commented on the issue:

https://github.com/apache/zeppelin/pull/3000
  
I've fixed the style issues pointed out.

One test still fails on Travis, despite a rebase and restart, but the 
failure seems unrelated to the changes made in this PR.


---


[GitHub] zeppelin issue #3056: [ZEPPELIN-3567] fix InterpreterContext convert(...) me...

2018-07-06 Thread zjffdu
Github user zjffdu commented on the issue:

https://github.com/apache/zeppelin/pull/3056
  
LGTM


---


[GitHub] zeppelin issue #2231: ZEPPELIN-2150. NoSuchMethodError: org.apache.spark.ui....

2018-07-06 Thread SivaKaviyappa
Github user SivaKaviyappa commented on the issue:

https://github.com/apache/zeppelin/pull/2231
  
@yywwd - A few things to check in your zeppelin-env.sh and Livy interpreter settings:
1. Have you set PYTHONPATH in zeppelin-env.sh?
2. Have you set zeppelin.livy.url=http://localhost:8998 in your Livy interpreter settings?
3. Have you installed the Python libraries on all the core nodes?
I am using EMR 5.11.0 and everything works fine.



---


[GitHub] zeppelin issue #3000: [ZEPPELIN-3467] two-step, atomic configuration file

2018-07-06 Thread sanjaydasgupta
Github user sanjaydasgupta commented on the issue:

https://github.com/apache/zeppelin/pull/3000
  
I have fixed the style issues, @felixcheung.

One of the tests is still failing, despite a rebase and restart, but it 
appears to be unrelated to the code change.


---


[GitHub] zeppelin issue #3047: [ZEPPELIN-3574] fix large number rendering issue

2018-07-06 Thread Tagar
Github user Tagar commented on the issue:

https://github.com/apache/zeppelin/pull/3047
  
Wow, that's a pretty serious issue. Thanks for fixing this. LGTM.


---


[GitHub] zeppelin pull request #3057: [Zeppelin 3582] Add type data to result of quer...

2018-07-06 Thread oxygen311
GitHub user oxygen311 opened a pull request:

https://github.com/apache/zeppelin/pull/3057

[Zeppelin 3582] Add type data to result of query from SQL

### What is this PR for?
JDBCInterpreter knows the type information for every SQL query. We could save 
this info to the pool and use it (a sketch of such a mapping follows the list).
There are three types of table column:
- Number;
- String;
- Date.
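
For illustration, a minimal sketch of how JDBC column types could be collapsed into those three categories. The helper name `columnCategory` is hypothetical, not the PR's actual code; only the standard `java.sql.Types` codes are assumed:

```scala
import java.sql.{ResultSetMetaData, Types}

// Hypothetical helper (illustration only): map the JDBC type code
// reported by the driver onto the three categories described above.
def columnCategory(meta: ResultSetMetaData, column: Int): String =
  meta.getColumnType(column) match {
    case Types.TINYINT | Types.SMALLINT | Types.INTEGER | Types.BIGINT |
         Types.REAL | Types.FLOAT | Types.DOUBLE |
         Types.NUMERIC | Types.DECIMAL => "Number"
    case Types.DATE | Types.TIME | Types.TIMESTAMP => "Date"
    case _ => "String"
  }
```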

### What type of PR is it?
Improvement

### What is the Jira issue?
[Zeppelin 
3582](https://issues.apache.org/jira/projects/ZEPPELIN/issues/ZEPPELIN-3582)

### Screenshots
![screenshot from 2018-07-06 
18-20-29](https://user-images.githubusercontent.com/16215034/42386866-5f5e993a-8149-11e8-996a-c62a2a204f97.png)

### Questions:
* Do the license files need updating? No
* Are there breaking changes for older versions? No
* Does this need documentation? No

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/TinkoffCreditSystems/zeppelin ZEPPELIN-3582

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zeppelin/pull/3057.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3057






---


[jira] [Created] (ZEPPELIN-3591) Some values of "args" property in interpreter settings for Spark ruin UDF execution

2018-07-06 Thread Denis Efarov (JIRA)
Denis Efarov created ZEPPELIN-3591:
--

 Summary: Some values of "args" property in interpreter settings 
for Spark ruin UDF execution
 Key: ZEPPELIN-3591
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-3591
 Project: Zeppelin
  Issue Type: Bug
  Components: zeppelin-interpreter
Affects Versions: 0.7.2
 Environment: CentOS Linux 7.3.1611

Java 1.8.0_60

Scala 2.11.8

Spark 2.1.1

Hadoop 2.6.0

Zeppelin 0.7.2

 

 
Reporter: Denis Efarov


In "args" interpreter configuration property, any value which starts with "-" 
(minus) sign prevents correct UDF execution in Spark running on YARN. Text 
after "-" doesn't matter, it fails anyway. All the other properties do not 
affect this.

Steps to reproduce:
 * On the interpreter settings page, find the Spark interpreter
 * For the "args" property, put any value starting with "-", for example "-test"
 * Make sure Spark starts on YARN (master=yarn-client)
 * Save the settings and restart the interpreter
 * In any notebook, write and execute the following code:

```
%spark
val udfDemo = (i: Int) => i + 10;
sqlContext.udf.register("demoUdf", (i: Int) => i);
sqlContext.sql("select demoUdf(1) val").show
```

Stacktrace:

```
java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD
    at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2083)
    at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1261)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1996)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:80)
    at org.apache.spark.scheduler.Task.run(Task.scala:99)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
```

Making the same UDF declaration in, for example, the %pyspark interpreter helps, 
even if one then executes it in %spark.





[GitHub] zeppelin pull request #3056: [ZEPPELIN-3567] fix InterpreterContext convert(...

2018-07-06 Thread Savalek
GitHub user Savalek opened a pull request:

https://github.com/apache/zeppelin/pull/3056

[ZEPPELIN-3567] fix InterpreterContext convert(...) method

### What is this PR for?
After commit 
[7af861...](https://github.com/apache/zeppelin/commit/7af86168254e0ad08234c57043e18179fca8d04c), 
the conversion of `config` is lost; because of that, the autocomplete state 
was lost after running a paragraph. This PR brings the conversion back.


![tab_complition_fix](https://user-images.githubusercontent.com/30798933/42382820-17e4ea92-813e-11e8-994c-4791ccbfe16f.png)

### What type of PR is it?
Bug Fix

JIRA: [ZEPPELIN-3567](https://issues.apache.org/jira/browse/ZEPPELIN-3567)

### Questions:
* Do the license files need updating? no
* Are there breaking changes for older versions? no
* Does this need documentation? no


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/TinkoffCreditSystems/zeppelin ZEPPELIN-3567

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zeppelin/pull/3056.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3056


commit 05f2659779ca0d606cd4d44408b15fa032d5e558
Author: Savalek 
Date:   2018-07-06T13:48:39Z

[ZEPPELIN-3567] fix InterpreterContext convert(...) method




---


[GitHub] zeppelin issue #3053: [ZEPPELIN-3583] Add function getNoteName() in Interpre...

2018-07-06 Thread egorklimov
Github user egorklimov commented on the issue:

https://github.com/apache/zeppelin/pull/3053
  
Please take another look:
* Thrift file updated
* CI is green: 
https://travis-ci.org/TinkoffCreditSystems/zeppelin/builds/400831400


---


[GitHub] zeppelin pull request #3035: [ZEPPELIN-3553] Fix URLs on "Multi-user Support...

2018-07-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/zeppelin/pull/3035


---


[GitHub] zeppelin issue #2848: [Zeppelin-3307] - Improved shared browsing/editing for...

2018-07-06 Thread jongyoul
Github user jongyoul commented on the issue:

https://github.com/apache/zeppelin/pull/2848
  
Got it, thanks. I thought it might not be useful, but I might be wrong. 
Thank you.


---


[GitHub] zeppelin issue #2848: [Zeppelin-3307] - Improved shared browsing/editing for...

2018-07-06 Thread mebelousov
Github user mebelousov commented on the issue:

https://github.com/apache/zeppelin/pull/2848
  
@jongyoul I see the following use case: 10 users open the note, put a client ID 
into a dynamic form, refresh the note, and get and process the resulting data. 
In this case getting the default note view is OK.
That is, in personal mode we may not need to save note updates.


---


[jira] [Created] (ZEPPELIN-3590) Add test for spark streaming

2018-07-06 Thread Jeff Zhang (JIRA)
Jeff Zhang created ZEPPELIN-3590:


 Summary: Add test for spark streaming
 Key: ZEPPELIN-3590
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-3590
 Project: Zeppelin
  Issue Type: Improvement
Reporter: Jeff Zhang








zeppelin 0.8.0-rc2 pyspark error

2018-07-06 Thread Panchappanavar, Naveenakumar Gurushantap (Nokia - IN/Bangalore)
Hi All,

I am running a streaming PySpark program in the pyspark interpreter using the 
zeppelin-0.8.0-rc2 code.

When the PySpark streaming program is submitted, the driver logs show the 
following error message:

```
ERROR [2018-07-06 06:35:14,026] ({JobScheduler} Logging.scala[logError]:91) - Error generating jobs for time 1530858914000 ms
org.apache.zeppelin.py4j.Py4JException: Command Part is unknown: yro464
```

The PySpark program is:

```
%spark.pyspark
import time
from pyspark.streaming import StreamingContext

ssc = StreamingContext(sc, 1)
rddQueue = []
for i in range(5):
    rddQueue += [ssc.sparkContext.parallelize([j for j in range(1, 1001)], 10)]
print rddQueue
# Create the QueueInputDStream and use it to do some processing
inputStream = ssc.queueStream(rddQueue)
mappedStream = inputStream.map(lambda x: (x % 10, 1))
reducedStream = mappedStream.reduceByKey(lambda a, b: a + b)
reducedStream.pprint()
ssc.start()
time.sleep(6)
ssc.stop(stopSparkContext=True, stopGraceFully=True)
```

Any idea what we can do about this?

Regards
Naveen



[GitHub] zeppelin issue #2848: [Zeppelin-3307] - Improved shared browsing/editing for...

2018-07-06 Thread jongyoul
Github user jongyoul commented on the issue:

https://github.com/apache/zeppelin/pull/2848
  
@mebelousov The main purpose of "personalized mode" is to keep the current 
user's view. But if the user refreshes the browser, it changes to the newest one. 
Do you think that's enough?


---


[GitHub] zeppelin issue #2848: [Zeppelin-3307] - Improved shared browsing/editing for...

2018-07-06 Thread mebelousov
Github user mebelousov commented on the issue:

https://github.com/apache/zeppelin/pull/2848
  
@jongyoul As I understand it, personal mode allows users to run paragraphs and 
have different views and different results due to user-chosen values in dynamic 
forms.
I'm against the removal of personal mode.


---


[GitHub] zeppelin issue #3055: ZEPPELIN-3587. Interpret paragarph text as whole code ...

2018-07-06 Thread zjffdu
Github user zjffdu commented on the issue:

https://github.com/apache/zeppelin/pull/3055
  
Oops, please ignore my last comment. Actually we could interpret the 
paragraph text as a whole.


---


Re: [DISCUSS] Is interpreter binding necessary ?

2018-07-06 Thread Jeff Zhang
We already allow setting the default interpreter when creating a note. Another
way to set the default interpreter is to reorder the interpreter bindings
on the note page.

But personally I don't recommend that users rely on short interpreter names via
the default interpreter, for two reasons:
1. It introduces inaccurate info. E.g. in our product we have two Spark
interpreters (`spark` for Spark 1.x and `spark2` for Spark 2.x). Users
often specify `%spark` for the Spark interpreter, but it could mean either
`%spark.spark` or `%spark2.spark`, so it is usually very hard to tell
what's wrong when a user expects Spark 2 to run but is actually still using
Spark 1.x. We would usually recommend specifying the fully qualified
interpreter name; typing a few more characters costs two seconds but makes
things clearer and more readable.
2. Another issue is that the interpreter binding is stored in interpreter.json,
which means that if the note is exported to another Zeppelin instance, the
default interpreter won't work.

So I don't think setting the default interpreter via interpreter binding is
valuable for users. If users really want to do that, I would suggest storing
it in note.json instead of interpreter.json.


Jongyoul Lee wrote on Fri, Jul 6, 2018 at 3:36 PM:

> There are two purposes of interpreter binding. One is what you mentioned,
> and the other is to manage a default interpreter. If we provide a new way
> to set the default interpreter, I think we can remove them :-) We could set
> permissions in other ways.
>
> Overall, +1
>
> On Fri, Jul 6, 2018 at 4:24 PM, Jeff Zhang wrote:
>
>> Hi Folks,
>>
>> I'm raising this thread to discuss whether we need interpreter binding.
>> Currently, when users create notes, they have to bind interpreters to their
>> notes on the note page; otherwise they will hit an "interpreter not found"
>> error. Besides that, on the Zeppelin server side we maintain the interpreter
>> binding info in memory as well as in interpreter.json.
>>
>> IMHO, interpreter binding is not necessary. It just adds the extra burden of
>> maintaining the binding info on the Zeppelin server side and doesn't
>> introduce any benefit. The only benefit is that we check whether the user
>> has permission to use the interpreter, but Zeppelin already checks
>> permissions when running a paragraph, so I don't think we need interpreter
>> binding just for a permission check that we will do later anyway.
>>
>> So overall, I would suggest removing the interpreter binding feature. What
>> do you think?
>>
>
>
>
> --
> 이종열, Jongyoul Lee, 李宗烈
> http://madeng.net
>


Re: [DISCUSS] Is interpreter binding necessary ?

2018-07-06 Thread Jongyoul Lee
There are two purposes of interpreter binding. One is what you mentioned,
and the other is to manage a default interpreter. If we provide a new way
to set the default interpreter, I think we can remove them :-) We could set
permissions in other ways.

Overall, +1

On Fri, Jul 6, 2018 at 4:24 PM, Jeff Zhang wrote:

> Hi Folks,
>
> I'm raising this thread to discuss whether we need interpreter binding.
> Currently, when users create notes, they have to bind interpreters to their
> notes on the note page; otherwise they will hit an "interpreter not found"
> error. Besides that, on the Zeppelin server side we maintain the interpreter
> binding info in memory as well as in interpreter.json.
>
> IMHO, interpreter binding is not necessary. It just adds the extra burden of
> maintaining the binding info on the Zeppelin server side and doesn't
> introduce any benefit. The only benefit is that we check whether the user
> has permission to use the interpreter, but Zeppelin already checks
> permissions when running a paragraph, so I don't think we need interpreter
> binding just for a permission check that we will do later anyway.
>
> So overall, I would suggest removing the interpreter binding feature. What
> do you think?
>



-- 
이종열, Jongyoul Lee, 李宗烈
http://madeng.net


[GitHub] zeppelin issue #2231: ZEPPELIN-2150. NoSuchMethodError: org.apache.spark.ui....

2018-07-06 Thread yywwd
Github user yywwd commented on the issue:

https://github.com/apache/zeppelin/pull/2231
  
@zjffdu NoSuchMethodError: org.apache.spark.ui.SparkUI.appUIAddress()
My AWS cluster is EMR-5.14.0, with Ganglia 3.7.2, Spark 2.3.0, 
Zeppelin 0.7.3, and Livy 0.4.0.


---


[GitHub] zeppelin issue #3054: [WIP] ZEPPELIN-3569. Improvement of FlinkInterpreter

2018-07-06 Thread zjffdu
Github user zjffdu commented on the issue:

https://github.com/apache/zeppelin/pull/3054
  
There's one critical issue in Flink 1.5.0: FLINK-9554.


---


[GitHub] zeppelin issue #2231: ZEPPELIN-2150. NoSuchMethodError: org.apache.spark.ui....

2018-07-06 Thread zjffdu
Github user zjffdu commented on the issue:

https://github.com/apache/zeppelin/pull/2231
  
What kind of error do you see in Zeppelin when using livy.spark.master yarn 
mode?


---


[GitHub] zeppelin issue #3055: ZEPPELIN-3587. Don't stop to interpret when the next l...

2018-07-06 Thread zjffdu
Github user zjffdu commented on the issue:

https://github.com/apache/zeppelin/pull/3055
  
@felixcheung I am afraid we have to break up the paragraph text. The Zeppelin 
interpreter is different from the Scala shell: the Scala shell executes code as 
soon as the user has typed one complete statement, while in Zeppelin a paragraph 
may contain multiple complete Scala statements.

E.g. if we type the following code into the scala-shell, it will execute 
`sc.version`, and then you can type `1+1`. In the Zeppelin Spark interpreter, by 
contrast, we submit the whole paragraph to the interpreter, which breaks it up 
and executes it via the Scala REPL API.

```
sc.version
1+1
```
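
For context, a minimal sketch of driving the Scala REPL API with a multi-statement paragraph. This is an illustration using the plain scala-compiler `IMain` API, not Zeppelin's actual implementation, and the naive line split stands in for real statement detection:

```scala
import scala.tools.nsc.Settings
import scala.tools.nsc.interpreter.IMain

// Illustration only (requires scala-compiler on the classpath):
// feed each statement of a "paragraph" to the REPL in order.
object ReplSketch extends App {
  val settings = new Settings
  settings.usejavacp.value = true // reuse the JVM classpath

  val repl = new IMain(settings)
  val paragraph = "val x = 1 + 1\nx * 21"

  // Naive split; real code must detect complete multi-line statements.
  paragraph.split("\n").foreach(stmt => repl.interpret(stmt))
}
```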



---


[DISCUSS] Is interpreter binding necessary ?

2018-07-06 Thread Jeff Zhang
Hi Folks,

I'm raising this thread to discuss whether we need interpreter binding.
Currently, when users create notes, they have to bind interpreters to their
notes on the note page; otherwise they will hit an "interpreter not found"
error. Besides that, on the Zeppelin server side we maintain the interpreter
binding info in memory as well as in interpreter.json.

IMHO, interpreter binding is not necessary. It just adds the extra burden of
maintaining the binding info on the Zeppelin server side and doesn't
introduce any benefit. The only benefit is that we check whether the user
has permission to use the interpreter, but Zeppelin already checks
permissions when running a paragraph, so I don't think we need interpreter
binding just for a permission check that we will do later anyway.

So overall, I would suggest removing the interpreter binding feature. What do
you think?


[GitHub] zeppelin issue #3055: ZEPPELIN-3587. Don't stop to interpret when the next l...

2018-07-06 Thread jongyoul
Github user jongyoul commented on the issue:

https://github.com/apache/zeppelin/pull/3055
  
I think the simplest solution is to copy from the old interpret method. 
We have discussed this several times and fixed it several times as well.


---


[GitHub] zeppelin issue #2231: ZEPPELIN-2150. NoSuchMethodError: org.apache.spark.ui....

2018-07-06 Thread yywwd
Github user yywwd commented on the issue:

https://github.com/apache/zeppelin/pull/2231
  
@zjffdu This issue is not related to the code run in Zeppelin. I mean, I 
cannot guarantee that "Using the Livy interpreter in Zeppelin" and "Using the 
Programmatic API" work well at the same time.

In Zeppelin, I just write code like this, to test whether it works:
```
%livy.pyspark
sc.version
```


Using the Programmatic API, I just use the official example: 
https://github.com/apache/incubator-livy/blob/master/examples/src/main/python/pi_app.py

**BUT!!!** 
In **livy.spark.master yarn-cluster** mode, Zeppelin works but the programmatic 
API does not.
In **livy.spark.master yarn** mode (the default), the programmatic API works 
but Zeppelin does not.


---


[GitHub] zeppelin issue #2231: ZEPPELIN-2150. NoSuchMethodError: org.apache.spark.ui....

2018-07-06 Thread zjffdu
Github user zjffdu commented on the issue:

https://github.com/apache/zeppelin/pull/2231
  
Sorry, I am still not sure what code you run in Zeppelin. If it is a 
Livy-related issue, you need to ask on the Livy mailing list.


---


[GitHub] zeppelin issue #2231: ZEPPELIN-2150. NoSuchMethodError: org.apache.spark.ui....

2018-07-06 Thread yywwd
Github user yywwd commented on the issue:

https://github.com/apache/zeppelin/pull/2231
  
@zjffdu I'm sorry for the slow response; I was trying to reproduce the bugs. 
I thought they might be caused by my code, so I tried the official PySpark 
example, and the bugs still occurred. This is the code I used: 
https://github.com/apache/incubator-livy/blob/master/examples/src/main/python/pi_app.py
**Note:** I commented out the last line, `# client.stop(True)`, because I don't 
want to close the session after submitting just one job. The details of this 
bug are as follows:

1. When I use the default yarn mode, i.e. "yarn", the official PySpark example 
and the programmatic API work well, but using the Livy interpreter in Zeppelin 
throws the exception `NoSuchMethodError: org.apache.spark.ui.SparkUI.appUIAddress()` 
for the Spark master.

2. When I change the yarn mode to "yarn-cluster" as SivaKaviyappa suggested, 
Zeppelin works well, but the logs of the statement show the warning 
`Warning: Master yarn-cluster is deprecated since 2.0. Please use master "yarn" 
with specified deploy mode instead.`
However, the programmatic API then has the following bug:

  2.1 I delete all Livy sessions and run pi_app.py. It throws this 
exception:
```
ReadTimeout: HTTPConnectionPool(host='172.31.5.251', port=8998): Read timed out. (read timeout=10)
Traceback (most recent call last):
  File "/home/ec2-user/wandongwu/livy_test_9/pi_app.py", line 35, in <module>
    pi = client.submit(pi_job).result()
  File "/usr/local/lib/python2.7/site-packages/concurrent/futures/_base.py", line 462, in result
    return self.__get_result()
  File "/usr/local/lib/python2.7/site-packages/concurrent/futures/_base.py", line 414, in __get_result
    raise exception_type, self._exception, self._traceback
TypeError: raise: arg 3 must be a traceback or None
```
But I found that it had started a new Livy session, so I edited the 
configuration parameter to `'http://:8998/sessions/0' 2`. It then throws 
another exception:
```
Traceback (most recent call last):
  File "/home/ec2-user/wandongwu/livy_test_9/pi_app.py", line 35, in <module>
    pi = client.submit(pi_job).result()
  File "/usr/local/lib/python2.7/site-packages/concurrent/futures/_base.py", line 462, in result
    return self.__get_result()
  File "/usr/local/lib/python2.7/site-packages/concurrent/futures/_base.py", line 414, in __get_result
    raise exception_type, self._exception, self._traceback
Exception: org.apache.livy.repl.PythonJobException: Client job error:
Traceback (most recent call last):
  File "/mnt/yarn/usercache/livy/appcache/application_1528945006613_0302/container_1528945006613_0302_01_01/tmp/4991895008696585180", line 159, in processBypassJob
    deserialized_job = pickle.loads(serialized_job)
  File "/usr/lib64/python2.7/pickle.py", line 1388, in loads
    return Unpickler(file).load()
  File "/usr/lib64/python2.7/pickle.py", line 864, in load
    dispatch[key](self)
  File "/usr/lib64/python2.7/pickle.py", line 1096, in load_global
    klass = self.find_class(module, name)
  File "/usr/lib64/python2.7/pickle.py", line 1130, in find_class
    __import__(module)
ImportError: No module named cloudpickle.cloudpickle
```

3. When I change the yarn mode back to the default, i.e. "yarn", the 
programmatic API works well, but Zeppelin still does not.


---