[jira] [Created] (ZEPPELIN-3886) Remove dependency on flatmap-stream 0.1.1

2018-11-28 Thread Ruslan Dautkhanov (JIRA)
Ruslan Dautkhanov created ZEPPELIN-3886:
---

 Summary: Remove dependency on flatmap-stream 0.1.1
 Key: ZEPPELIN-3886
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-3886
 Project: Zeppelin
  Issue Type: Bug
  Components: build, Core, Interpreters
Affects Versions: 0.8.0, 0.9.0, 0.8.1
Reporter: Ruslan Dautkhanov


Copy-pasting [~derektapley]'s report from ZEPPELIN-3881:

https://issues.apache.org/jira/browse/ZEPPELIN-3881?focusedCommentId=16702336&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16702336

 
{panel}
I see that the error is due to flatmap-stream 0.1.1 not being found, which is a 
dependency of the event-stream library. It turns out this might actually be 
due to it being a "poisoned" library, as some news articles recently indicate that 
event-stream was [backdoored to exploit a popular cryptocurrency 
wallet|https://www.zdnet.com/article/hacker-backdoors-popular-javascript-library-to-steal-bitcoin-funds/].
As such, npmjs.com has removed the dependency, and the event-stream version 
needs to be updated to the latest, 4.0.1.
{panel}
 

It seems that the Zeppelin master build is broken because of this.

Would it be possible to remove the dependency on either `flatmap-stream` or 
`event-stream`, or find a secure equivalent?

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZEPPELIN-3873) copyResourceToPythonWorkDir("python/mpl_config.py", "mpl_config.py") for IPythonInterpreter

2018-11-20 Thread Ruslan Dautkhanov (JIRA)
Ruslan Dautkhanov created ZEPPELIN-3873:
---

 Summary: copyResourceToPythonWorkDir("python/mpl_config.py", 
"mpl_config.py") for IPythonInterpreter
 Key: ZEPPELIN-3873
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-3873
 Project: Zeppelin
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Ruslan Dautkhanov


This matplotlib integration example is broken in the new IPythonInterpreter:

[https://zeppelin.apache.org/docs/latest/interpreter/spark.html#matplotlib-integration-pyspark]

It fails with "no module named 'mpl_config'".

mpl_config is part of core Zeppelin. The old PythonInterpreter copied it into 
the Python work directory manually here:

[https://github.com/apache/zeppelin/blob/0d746fa2e2787a661db70d74035120ae3516ace3/python/src/main/java/org/apache/zeppelin/python/PythonInterpreter.java#L179]

The new IPythonInterpreter doesn't do this.
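A fix could mirror what the old PythonInterpreter does and copy the resource into the interpreter's Python work directory before the kernel starts. The sketch below is a minimal, hypothetical version of that copy step (the class and method names are illustrative, not Zeppelin's actual API):

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class ResourceCopy {
    // Copy a classpath resource stream (e.g. "python/mpl_config.py") into the
    // Python working directory so that `import mpl_config` can succeed in the kernel.
    public static Path copyResourceToWorkDir(InputStream resource, Path workDir,
                                             String targetName) throws IOException {
        Files.createDirectories(workDir);
        Path target = workDir.resolve(targetName);
        Files.copy(resource, target, StandardCopyOption.REPLACE_EXISTING);
        return target;
    }
}
```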

 





[jira] [Created] (ZEPPELIN-3719) LdapGroupRealm allows to login with empty password

2018-08-15 Thread Ruslan Dautkhanov (JIRA)
Ruslan Dautkhanov created ZEPPELIN-3719:
---

 Summary: LdapGroupRealm allows to login with empty password
 Key: ZEPPELIN-3719
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-3719
 Project: Zeppelin
  Issue Type: Bug
  Components: security
Affects Versions: 0.8.0
Reporter: Ruslan Dautkhanov


We use LdapGroupRealm for authentication.

Not sure how we didn't notice before, but just entering an *empty* password 
allows login (!)

Hopefully it's just a misconfiguration on our side, but if it's not, it looks 
like a big security hole.

Looking at the code, an exception should be thrown here:

[https://github.com/apache/zeppelin/blob/master/zeppelin-server/src/main/java/org/apache/zeppelin/rest/LoginRestApi.java#L165]

but it doesn't happen. 

I changed log4j logging to DEBUG but still don't see any trace of why this happens. 

Can somebody else please try to see if they can reproduce this?
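One likely culprit (an assumption on my part, not confirmed from the Zeppelin code): many LDAP servers treat a bind with an empty password as an *anonymous* bind and report success, so a realm that simply binds with the supplied credentials will "authenticate" empty passwords. A defensive check before ever contacting LDAP would look roughly like this (class, method, and exception choice are illustrative):

```java
public class CredentialsGuard {
    // Reject null/blank credentials up front: an empty password sent to an
    // LDAP server is typically treated as an anonymous bind, which succeeds.
    public static void requireNonEmpty(String username, String password) {
        if (username == null || username.trim().isEmpty()
                || password == null || password.isEmpty()) {
            throw new IllegalArgumentException("username and password must be non-empty");
        }
    }
}
```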

 





[jira] [Created] (ZEPPELIN-3511) remove old button "Download Data as CSV/TSV"

2018-05-29 Thread Ruslan Dautkhanov (JIRA)
Ruslan Dautkhanov created ZEPPELIN-3511:
---

 Summary: remove old button "Download Data as CSV/TSV"
 Key: ZEPPELIN-3511
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-3511
 Project: Zeppelin
  Issue Type: Improvement
  Components: front-end
Affects Versions: 0.8.0, 0.9.0, 0.8.1
Reporter: Ruslan Dautkhanov


As discussed on 
[PR-2601|https://github.com/apache/zeppelin/pull/2601#issuecomment-356609580] 
and 
[PR-2971|https://github.com/apache/zeppelin/pull/2971#issuecomment-391219166], 
having two separate and incompatible ways to export csv from a datagrid is 
confusing to the users.

Moreover, the old way of exporting has some issues (it doesn't conform to 
RFC 4180), as described in 
[ZEPPELIN-1803|https://issues.apache.org/jira/browse/ZEPPELIN-1803] and 
[ZEPPELIN-2956|https://issues.apache.org/jira/browse/ZEPPELIN-2956].

This jira is to *remove* the old way of exporting data as CSV/TSV. 

The new way of exporting through angular-ui-grid doesn't have those issues and 
also has an option of exporting data as xlsx and exporting only visible data.
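For context on the RFC 4180 issues: a compliant CSV writer must wrap any field containing a comma, double quote, or line break in double quotes, and double any embedded quotes. A minimal sketch of that quoting rule (illustrative only, not Zeppelin's exporter code):

```java
public class Rfc4180 {
    // Quote a single CSV field per RFC 4180: wrap in double quotes when the
    // value contains a comma, a quote, or a line break; double embedded quotes.
    public static String encodeField(String value) {
        if (value.contains(",") || value.contains("\"")
                || value.contains("\n") || value.contains("\r")) {
            return "\"" + value.replace("\"", "\"\"") + "\"";
        }
        return value;
    }
}
```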





[jira] [Created] (ZEPPELIN-3505) IPython interpreter: ERROR:tornado.general:Uncaught exception in ZMQStream callback

2018-05-26 Thread Ruslan Dautkhanov (JIRA)
Ruslan Dautkhanov created ZEPPELIN-3505:
---

 Summary: IPython interpreter: ERROR:tornado.general:Uncaught 
exception in ZMQStream callback
 Key: ZEPPELIN-3505
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-3505
 Project: Zeppelin
  Issue Type: Bug
  Components: Interpreters, pySpark, python-interpreter, 
zeppelin-interpreter
Affects Versions: 0.8.0, 0.9.0, 0.8.1
Reporter: Ruslan Dautkhanov


Getting the following exceptions in the IPython interpreter:

ERROR:tornado.general:Uncaught exception in ZMQStream callback 
ValueError: signal only works in main thread
ERROR:tornado.general:Uncaught exception in zmqstream callback
ERROR:tornado.application:Exception in callback 
...
raise RuntimeError("IOLoop is already running")

Complete list of exceptions:

{noformat}
DEBUG [2018-05-26 10:28:29,453] ({Exec Stream Pumper} 
IPythonInterpreter.java[processLine]:411) - Process Output: Traceback (most 
recent call last):
DEBUG [2018-05-26 10:28:29,453] ({Exec Stream Pumper} 
IPythonInterpreter.java[processLine]:411) - Process Output:   File 
"/opt/cloudera/parcels/Anaconda/lib/python2.7/runpy.py", line 174, in 
_run_module_as_main
DEBUG [2018-05-26 10:28:29,454] ({Exec Stream Pumper} 
IPythonInterpreter.java[processLine]:411) - Process Output: "__main__", 
fname, loader, pkg_name)
DEBUG [2018-05-26 10:28:29,454] ({Exec Stream Pumper} 
IPythonInterpreter.java[processLine]:411) - Process Output:   File 
"/opt/cloudera/parcels/Anaconda/lib/python2.7/runpy.py", line 72, in _run_code
DEBUG [2018-05-26 10:28:29,454] ({Exec Stream Pumper} 
IPythonInterpreter.java[processLine]:411) - Process Output: exec code in 
run_globals
DEBUG [2018-05-26 10:28:29,454] ({Exec Stream Pumper} 
IPythonInterpreter.java[processLine]:411) - Process Output:   File 
"/opt/cloudera/parcels/Anaconda-4.4.0/lib/python2.7/site-packages/ipykernel_launcher.py",
 line 16, in 
DEBUG [2018-05-26 10:28:29,454] ({Exec Stream Pumper} 
IPythonInterpreter.java[processLine]:411) - Process Output: 
app.launch_new_instance()
DEBUG [2018-05-26 10:28:29,454] ({Exec Stream Pumper} 
IPythonInterpreter.java[processLine]:411) - Process Output:   File 
"/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/traitlets/config/application.py",
 line 658, in launch_instance
DEBUG [2018-05-26 10:28:29,455] ({Exec Stream Pumper} 
IPythonInterpreter.java[processLine]:411) - Process Output: app.start()
DEBUG [2018-05-26 10:28:29,455] ({Exec Stream Pumper} 
IPythonInterpreter.java[processLine]:411) - Process Output: 
ERROR:tornado.general:Uncaught exception in ZMQStream callback
DEBUG [2018-05-26 10:28:29,455] ({Exec Stream Pumper} 
IPythonInterpreter.java[processLine]:411) - Process Output: Traceback (most 
recent call last):
DEBUG [2018-05-26 10:28:29,455] ({Exec Stream Pumper} 
IPythonInterpreter.java[processLine]:411) - Process Output:   File 
"/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/zmq/eventloop/zmqstream.py",
 line 432, in _run_callback
DEBUG [2018-05-26 10:28:29,455] ({Exec Stream Pumper} 
IPythonInterpreter.java[processLine]:411) - Process Output: callback(*args, 
**kwargs)
DEBUG [2018-05-26 10:28:29,455] ({Exec Stream Pumper} 
IPythonInterpreter.java[processLine]:411) - Process Output:   File 
"/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/tornado/stack_context.py",
 line 276, in null_wrapper
DEBUG [2018-05-26 10:28:29,455] ({Exec Stream Pumper} 
IPythonInterpreter.java[processLine]:411) - Process Output: return 
fn(*args, **kwargs)
DEBUG [2018-05-26 10:28:29,455] ({Exec Stream Pumper} 
IPythonInterpreter.java[processLine]:411) - Process Output:   File 
"/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/ipykernel/kernelbase.py",
 line 283, in dispatcher
DEBUG [2018-05-26 10:28:29,455] ({Exec Stream Pumper} 
IPythonInterpreter.java[processLine]:411) - Process Output: return 
self.dispatch_shell(stream, msg)
DEBUG [2018-05-26 10:28:29,455] ({Exec Stream Pumper} 
IPythonInterpreter.java[processLine]:411) - Process Output:   File 
"/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/ipykernel/kernelbase.py",
 line 233, in dispatch_shell
DEBUG [2018-05-26 10:28:29,455] ({Exec Stream Pumper} 
IPythonInterpreter.java[processLine]:411) - Process Output: 
self.pre_handler_hook()
DEBUG [2018-05-26 10:28:29,455] ({Exec Stream Pumper} 
IPythonInterpreter.java[processLine]:411) - Process Output:   File 
"/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/ipykernel/kernelbase.py",
 line 248, in pre_handler_hook
DEBUG [2018-05-26 10:28:29,455] ({Exec Stream Pumper} 
IPythonInterpreter.java[processLine]:411) - Process Output: 
self.saved_sigint_handler = signal(SIGINT, default_int_handler)
DEBUG [2018-05-26 10:28:29,455] ({Exec Stream Pumper} 
IPythonInterpreter.java[processLine]:411) - Process Output: ValueError: signal 
only works in main thread
{noformat}

[jira] [Created] (ZEPPELIN-3504) Suppress some org.glassfish.jersey.internal.inject.Providers warnings

2018-05-25 Thread Ruslan Dautkhanov (JIRA)
Ruslan Dautkhanov created ZEPPELIN-3504:
---

 Summary: Suppress some 
org.glassfish.jersey.internal.inject.Providers warnings
 Key: ZEPPELIN-3504
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-3504
 Project: Zeppelin
  Issue Type: Improvement
Affects Versions: 0.8.0, 0.9.0, 0.8.1
Reporter: Ruslan Dautkhanov


It would be great to suppress warnings like [1], which we see a lot after the 
jersey upgrade.

As discussed on [https://github.com/jersey/jersey/issues/3700], it's possible 
to work around this by calling
{code:java}
java.lang.System.setErr(){code}
to set

{quote}org.glassfish.jersey.internal.inject.Providers=SEVERE{quote}

Thanks to [~zjffdu] for pointing this out.
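If jersey emits these through java.util.logging (an assumption here, not verified against the jersey issue thread), a more targeted alternative is to raise just that one logger's level so only SEVERE records pass:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class JerseyLogFilter {
    // Keep a strong reference: java.util.logging holds loggers weakly, so a
    // level set on a logger with no strong reference can be silently lost.
    static final Logger PROVIDERS_LOGGER =
            Logger.getLogger("org.glassfish.jersey.internal.inject.Providers");

    // Raise the jersey Providers logger to SEVERE so its registration
    // WARNINGs are suppressed while real errors still get through.
    public static void suppressProviderWarnings() {
        PROVIDERS_LOGGER.setLevel(Level.SEVERE);
    }
}
```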

 

[1]
{noformat}
WARNING: A provider org.apache.zeppelin.rest.SecurityRestApi registered in 
SERVER runtime does not implement any provider interfaces applicable in the 
SERVER runtime. Due to constraint configuration problems the provider 
org.apache.zeppelin.rest.SecurityRestApi will be ignored.
 
May 22, 2018 11:21:57 AM org.glassfish.jersey.internal.inject.Providers 
checkProviderRuntime
WARNING: A provider org.apache.zeppelin.rest.InterpreterRestApi registered in 
SERVER runtime does not implement any provider interfaces applicable in the 
SERVER runtime. Due to constraint configuration problems the provider 
org.apache.zeppelin.rest.InterpreterRestApi will be ignored.
 
May 22, 2018 11:21:57 AM org.glassfish.jersey.internal.inject.Providers 
checkProviderRuntime
WARNING: A provider org.apache.zeppelin.rest.LoginRestApi registered in SERVER 
runtime does not implement any provider interfaces applicable in the SERVER 
runtime. Due to constraint configuration problems the provider 
org.apache.zeppelin.rest.LoginRestApi will be ignored.
 
May 22, 2018 11:21:57 AM org.glassfish.jersey.internal.inject.Providers 
checkProviderRuntime
WARNING: A provider org.apache.zeppelin.rest.NotebookRepoRestApi registered in 
SERVER runtime does not implement any provider interfaces applicable in the 
SERVER runtime. Due to constraint configuration problems the provider 
org.apache.zeppelin.rest.NotebookRepoRestApi will be ignored.
 
May 22, 2018 11:21:57 AM org.glassfish.jersey.internal.inject.Providers 
checkProviderRuntime
WARNING: A provider org.apache.zeppelin.rest.HeliumRestApi registered in SERVER 
runtime does not implement any provider interfaces applicable in the SERVER 
runtime. Due to constraint configuration problems the provider 
org.apache.zeppelin.rest.HeliumRestApi will be ignored.
 
May 22, 2018 11:21:57 AM org.glassfish.jersey.internal.inject.Providers 
checkProviderRuntime
WARNING: A provider org.apache.zeppelin.rest.NotebookRestApi registered in 
SERVER runtime does not implement any provider interfaces applicable in the 
SERVER runtime. Due to constraint configuration problems the provider 
org.apache.zeppelin.rest.NotebookRestApi will be ignored.
 
May 22, 2018 11:21:57 AM org.glassfish.jersey.internal.inject.Providers 
checkProviderRuntime
WARNING: A provider org.apache.zeppelin.rest.ConfigurationsRestApi registered 
in SERVER runtime does not implement any provider interfaces applicable in the 
SERVER runtime. Due to constraint configuration problems the provider 
org.apache.zeppelin.rest.ConfigurationsRestApi will be ignored.
 
May 22, 2018 11:21:57 AM org.glassfish.jersey.internal.inject.Providers 
checkProviderRuntime
WARNING: A provider org.apache.zeppelin.rest.CredentialRestApi registered in 
SERVER runtime does not implement any provider interfaces applicable in the 
SERVER runtime. Due to constraint configuration problems the provider 
org.apache.zeppelin.rest.CredentialRestApi will be ignored.
 
May 22, 2018 11:21:57 AM org.glassfish.jersey.internal.inject.Providers 
checkProviderRuntime
WARNING: A provider org.apache.zeppelin.rest.ZeppelinRestApi registered in 
SERVER runtime does not implement any provider interfaces applicable in the 
SERVER runtime. Due to constraint configuration problems the provider 
org.apache.zeppelin.rest.ZeppelinRestApi will be ignored.
 
May 22, 2018 11:21:57 AM org.glassfish.jersey.internal.Errors logErrors
WARNING: The following warnings have been detected: WARNING: A HTTP GET method, 
public javax.ws.rs.core.Response 
org.apache.zeppelin.rest.InterpreterRestApi.listInterpreter(java.lang.String), 
should not consume any entity.
 
WARNING: A HTTP GET method, public javax.ws.rs.core.Response 
org.apache.zeppelin.rest.CredentialRestApi.getCredentials(java.lang.String) 
throws java.io.IOException,java.lang.IllegalArgumentException, should not 
consume any entity.
 
{noformat}





[jira] [Created] (ZEPPELIN-3487) Spark SQL (%sql) paragraphs for DDLs should suppress datagrid-ui

2018-05-22 Thread Ruslan Dautkhanov (JIRA)
Ruslan Dautkhanov created ZEPPELIN-3487:
---

 Summary: Spark SQL (%sql) paragraphs for DDLs should suppress 
datagrid-ui
 Key: ZEPPELIN-3487
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-3487
 Project: Zeppelin
  Issue Type: Improvement
  Components: Core, Interpreters, zeppelin-interpreter
Affects Versions: 0.7.3, 0.7.2, 0.8.0, 0.7.4, 0.9.0, 0.8.1
Reporter: Ruslan Dautkhanov
 Attachments: image-2018-05-22-15-43-02-964.png

It would be super nice if the Spark interpreter understood that not all Spark 
SQL queries return data; DDL statements don't.

So we end up with a lot of paragraphs that show space-consuming empty datagrid 
boxes like the one below:

 

!image-2018-05-22-15-43-02-964.png!

 

%sql paragraphs for DDLs should suppress datagrid-ui altogether.
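One possible approach (purely illustrative, not Zeppelin's actual logic): inspect the statement's leading keyword and skip table rendering for statement types that return no rows. A naive sketch:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class SqlResultHint {
    // Statement types that produce no result set (incomplete list on purpose;
    // a real implementation would consult the query plan instead).
    private static final Set<String> NO_RESULT_KEYWORDS = new HashSet<>(Arrays.asList(
            "create", "drop", "alter", "insert", "truncate", "set", "use", "refresh"));

    // Heuristic: if the first keyword is DDL/DML, the datagrid UI can be
    // suppressed for that paragraph.
    public static boolean returnsResultSet(String sql) {
        String head = sql.trim().toLowerCase(Locale.ROOT).split("\\s+", 2)[0];
        return !NO_RESULT_KEYWORDS.contains(head);
    }
}
```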





[jira] [Created] (ZEPPELIN-3486) Spark SQL interpreter doesn't show %sql (Spark SQL ) exceptions

2018-05-22 Thread Ruslan Dautkhanov (JIRA)
Ruslan Dautkhanov created ZEPPELIN-3486:
---

 Summary: Spark SQL interpreter doesn't show %sql (Spark SQL ) 
exceptions
 Key: ZEPPELIN-3486
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-3486
 Project: Zeppelin
  Issue Type: Bug
  Components: Core, zeppelin-interpreter
Affects Versions: 0.7.3, 0.8.0, 0.7.4, 0.9.0, 0.8.1
Reporter: Ruslan Dautkhanov
 Attachments: image-2018-05-22-15-36-38-774.png

When I run a Spark SQL DDL statement (like create temporary function), the 
Spark interpreter doesn't show the actual error:

!image-2018-05-22-15-36-38-774.png!

I have to go to the Spark interpreter log (and I have DEBUG level set) to get 
the true root cause of the issue (this is just an example):
{noformat}
ERROR [2018-05-22 15:33:36,345] ({pool-2-thread-11} 
SparkSqlInterpreter.java[interpret]:125) - Invocation target exception
java.lang.reflect.InvocationTargetException
at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.zeppelin.spark.SparkSqlInterpreter.interpret(SparkSqlInterpreter.java:120)
at 
org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:103)
at 
org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:625)
at org.apache.zeppelin.scheduler.Job.run(Job.java:185)
at 
org.apache.zeppelin.scheduler.ParallelScheduler$JobRunner.run(ParallelScheduler.java:162)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: com.epsilon.some.nonextsitent.class
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
{noformat}

Most of our users don't know how to look inside the Spark interpreter logs. 
It would be great if the Spark interpreter showed all exceptions.
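The stack above suggests the real failure is wrapped in an InvocationTargetException by the reflective call; surfacing it is largely a matter of unwrapping the cause chain before building the result shown to the user. A hedged sketch of that unwrapping (not the actual Zeppelin fix):

```java
public class RootCause {
    // Walk the cause chain so the message shown to the user is the real
    // failure (e.g. the ClassNotFoundException), not the reflective wrapper.
    public static Throwable unwrap(Throwable t) {
        Throwable cur = t;
        while (cur.getCause() != null && cur.getCause() != cur) {
            cur = cur.getCause();
        }
        return cur;
    }
}
```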






[jira] [Created] (ZEPPELIN-3485) getting "JSON file size cannot exceed 4 MB"

2018-05-22 Thread Ruslan Dautkhanov (JIRA)
Ruslan Dautkhanov created ZEPPELIN-3485:
---

 Summary: getting "JSON file size cannot exceed 4 MB"
 Key: ZEPPELIN-3485
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-3485
 Project: Zeppelin
  Issue Type: Bug
Reporter: Ruslan Dautkhanov
 Attachments: image-2018-05-22-15-06-41-470.png







[jira] [Created] (ZEPPELIN-3484) sc.setJobGroup() shows up in error stack and shifts line numbering

2018-05-22 Thread Ruslan Dautkhanov (JIRA)
Ruslan Dautkhanov created ZEPPELIN-3484:
---

 Summary: sc.setJobGroup() shows up in error stack and shifts line 
numbering
 Key: ZEPPELIN-3484
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-3484
 Project: Zeppelin
  Issue Type: Bug
Affects Versions: 0.8.0, 0.9.0, 0.8.1
Reporter: Ruslan Dautkhanov
 Attachments: image-2018-05-22-14-36-16-569.png

sc.setJobGroup() shows up in all exception stacks, which is confusing for some 
users.

Also, it shifts line numbers by 1. See the example below.

It would be great to move sc.setJobGroup() out of user code, solving both of 
the above issues.

 

!image-2018-05-22-14-36-16-569.png!





[jira] [Created] (ZEPPELIN-3483) NPE in io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose

2018-05-22 Thread Ruslan Dautkhanov (JIRA)
Ruslan Dautkhanov created ZEPPELIN-3483:
---

 Summary: NPE in 
io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose
 Key: ZEPPELIN-3483
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-3483
 Project: Zeppelin
  Issue Type: Bug
  Components: Core
Affects Versions: 0.8.0, 0.9.0, 0.8.1
 Environment: latest Zeppelin from master snapshot
Reporter: Ruslan Dautkhanov


 

[~jeffzhang], looking at the timing and the exception stack, it seems this 
happens at interpreter restart?

It doesn't seem to break anything, but I figured I'd better point this out in 
case it can cause any issues.

{noformat}

ERROR [2018-05-22 10:35:33,232] ({grpc-default-executor-1} 
SerializingExecutor.java[run]:120) - Exception while executing runnable 
io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed@5793c1b4
java.lang.NullPointerException
 at 
io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:395)
 at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:426)
 at io.grpc.internal.ClientCallImpl.access$100(ClientCallImpl.java:76)
 at 
io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.close(ClientCallImpl.java:512)
 at 
io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.access$700(ClientCallImpl.java:429)
 at 
io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:544)
 at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:52)
 at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:117)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
ERROR [2018-05-22 11:00:46,949] ({grpc-default-executor-1} 
SerializingExecutor.java[run]:120) - Exception while executing runnable 
io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed@485a4f69
java.lang.NullPointerException
 at 
io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:395)
 at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:426)
 at io.grpc.internal.ClientCallImpl.access$100(ClientCallImpl.java:76)
 at 
io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.close(ClientCallImpl.java:512)
 at 
io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.access$700(ClientCallImpl.java:429)
 at 
io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:544)
 at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:52)
 at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:117)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)

{noformat}





[jira] [Created] (ZEPPELIN-3478) Download Data as CSV downloads data as a single line

2018-05-21 Thread Ruslan Dautkhanov (JIRA)
Ruslan Dautkhanov created ZEPPELIN-3478:
---

 Summary: Download Data as CSV downloads data as a single line
 Key: ZEPPELIN-3478
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-3478
 Project: Zeppelin
  Issue Type: Bug
Affects Versions: 0.8.0, 0.9.0
Reporter: Ruslan Dautkhanov
 Attachments: zep-csv-export-bug.png

Download data as CSV and as TSV both seem to be broken in the master snapshot:

!zep-csv-export-bug.png!







[jira] [Created] (ZEPPELIN-3467) two-step, atomic configuration file writes

2018-05-16 Thread Ruslan Dautkhanov (JIRA)
Ruslan Dautkhanov created ZEPPELIN-3467:
---

 Summary: two-step, atomic configuration file writes
 Key: ZEPPELIN-3467
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-3467
 Project: Zeppelin
  Issue Type: Improvement
Affects Versions: 0.7.3, 0.9.0, 0.8.1
Reporter: Ruslan Dautkhanov


We have seen that when a file system runs full, Zeppelin nullifies its 
configuration files: it opens the file for writing, tries to write it, and 
that latter operation fails. We end up losing the configuration completely.

Writes should instead be done in this two-step approach:
- write configuration files (like `interpreter.json` and other such files) to a 
temp file in the same directory
- rename the temp file to the target name.

This guarantees that no partial writes are left behind as corrupted 
configuration files. Also, POSIX filesystems guarantee that `rename` is done 
atomically, leaving no room for side effects.
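The two steps can be sketched with java.nio, where `Files.move` with `ATOMIC_MOVE` performs the POSIX-style atomic rename (a minimal sketch, not Zeppelin's actual config writer):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class AtomicConfigWriter {
    // Write to a temp file in the SAME directory (so the rename stays on one
    // filesystem), then atomically move it over the target. If the disk is
    // full, the temp-file write fails and the old config is left untouched.
    public static void write(Path target, String content) throws IOException {
        Path tmp = Files.createTempFile(target.getParent(),
                target.getFileName().toString(), ".tmp");
        try {
            Files.write(tmp, content.getBytes(StandardCharsets.UTF_8));
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE,
                       StandardCopyOption.REPLACE_EXISTING);
        } finally {
            Files.deleteIfExists(tmp); // no-op when the move succeeded
        }
    }
}
```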






[jira] [Created] (ZEPPELIN-3334) Set spark.scheduler.pool to authenticate user name

2018-03-14 Thread Ruslan Dautkhanov (JIRA)
Ruslan Dautkhanov created ZEPPELIN-3334:
---

 Summary: Set spark.scheduler.pool to authenticate user name
 Key: ZEPPELIN-3334
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-3334
 Project: Zeppelin
  Issue Type: Improvement
Reporter: Ruslan Dautkhanov


Setting *spark.scheduler.pool* to the authenticated user name would allow 
multiple resource pools for different users when using a shared Spark context / 
shared Spark interpreter.

This improvement request is for the "The interpreter will be instantiated 
*Globally* in *shared* process" Spark interpreter mode.
 
 Per Spark documentation, 
[https://spark.apache.org/docs/latest/job-scheduling.html] 
  
{quote}" _within_ each Spark application, multiple “jobs” (Spark actions) may 
be running concurrently if they were submitted by different threads 
 ... /skip/
 threads. By “job”, in this section, we mean a Spark action (e.g. {{save}}, 
{{collect}}) and any tasks that need to run to evaluate that action. Spark’s 
scheduler is fully thread-safe and supports this use case to enable 
applications that serve multiple requests (e.g. queries for multiple users).
 ... /skip/
 Without any intervention, newly submitted jobs go into a _default pool_, but 
jobs’ pools can be set by adding the {{*spark.scheduler.pool*}} “local 
property” to the SparkContext in the thread that’s submitting them.    "
{quote}
Notice that setting *spark.scheduler.pool* to the authenticated user name has 
to be done *in a separate thread*, assuming Zeppelin internally has a separate 
thread for each authenticated user.
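Per the Spark docs quoted above, the pool is a thread-local "local property"; the real call would be SparkContext.setLocalProperty("spark.scheduler.pool", userName) issued from each user's submitting thread. The toy below models only the per-thread scoping with a plain ThreadLocal, to show why each user needs their own thread (it is NOT the Spark API):

```java
public class SchedulerPoolDemo {
    // Spark stores "local properties" per submitting thread; jobs submitted
    // from a thread inherit that thread's spark.scheduler.pool. This toy
    // ThreadLocal mimics that scoping.
    private static final ThreadLocal<String> POOL =
            ThreadLocal.withInitial(() -> "default");

    public static void setPool(String user) { POOL.set(user); }
    public static String currentPool() { return POOL.get(); }
}
```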





[jira] [Created] (ZEPPELIN-3327) NPE when Spark interpreter couldn't start

2018-03-13 Thread Ruslan Dautkhanov (JIRA)
Ruslan Dautkhanov created ZEPPELIN-3327:
---

 Summary: NPE when Spark interpreter couldn't start
 Key: ZEPPELIN-3327
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-3327
 Project: Zeppelin
  Issue Type: Bug
Affects Versions: 0.8.0, 0.9.0
Reporter: Ruslan Dautkhanov
 Attachments: image-2018-03-13-19-16-46-353.png, 
image-2018-03-13-19-19-59-364.png

When Spark can't start on the backend, Zeppelin just shows an NPE:

!image-2018-03-13-19-16-46-353.png!

What it should print is the true root cause, i.e. the exception as given by 
spark-submit.

To reproduce, add an invalid Spark interpreter setting, like
 !image-2018-03-13-19-19-59-364.png! 
and try to start the Spark interpreter.

It's confusing for users that the true error is obscured by the NPE.

Zeppelin should transparently deliver the exception as produced by Spark, like 
in this example:
{noformat}
Caused by: java.lang.NumberFormatException: Size must be specified as bytes 
(b), kibibytes (k), mebibytes (m), gibibytes (g), tebibytes (t), or 
pebibytes(p). E.g. 50b, 100k, or 250m.
Invalid suffix: "petabytes"
at 
org.apache.spark.network.util.JavaUtils.byteStringAs(JavaUtils.java:291)
at 
org.apache.spark.network.util.JavaUtils.byteStringAsBytes(JavaUtils.java:302)
at org.apache.spark.util.Utils$.byteStringAsBytes(Utils.scala:1087)
at org.apache.spark.SparkConf.getSizeAsBytes(SparkConf.scala:302)
at 
org.apache.spark.memory.UnifiedMemoryManager$.getMaxMemory(UnifiedMemoryManager.scala:223)
at 
org.apache.spark.memory.UnifiedMemoryManager$.apply(UnifiedMemoryManager.scala:199)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:332)
at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:175)
at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:257)
at org.apache.spark.SparkContext.(SparkContext.scala:432)
{noformat}
Notice I had to dig deep into the logs to find the root cause, and not every 
user can do that.

Full exception from interpreter log -

{noformat}
ERROR [2018-03-13 19:15:26,476] ({pool-2-thread-2} 
PySparkInterpreter.java[open]:203) - Error
java.lang.NullPointerException
at org.apache.zeppelin.spark.Utils.invokeMethod(Utils.java:44)
at org.apache.zeppelin.spark.Utils.invokeMethod(Utils.java:39)
at 
org.apache.zeppelin.spark.OldSparkInterpreter.createSparkContext_2(OldSparkInterpreter.java:375)
at 
org.apache.zeppelin.spark.OldSparkInterpreter.createSparkContext(OldSparkInterpreter.java:364)
at 
org.apache.zeppelin.spark.OldSparkInterpreter.getSparkContext(OldSparkInterpreter.java:172)
at 
org.apache.zeppelin.spark.OldSparkInterpreter.open(OldSparkInterpreter.java:740)
at 
org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:61)
at 
org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:69)
at 
org.apache.zeppelin.spark.PySparkInterpreter.getSparkInterpreter(PySparkInterpreter.java:665)
at 
org.apache.zeppelin.spark.PySparkInterpreter.createGatewayServerAndStartScript(PySparkInterpreter.java:273)
at 
org.apache.zeppelin.spark.PySparkInterpreter.open(PySparkInterpreter.java:201)
at 
org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:69)
at 
org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:618)
at org.apache.zeppelin.scheduler.Job.run(Job.java:186)
at 
org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
ERROR [2018-03-13 19:15:26,476] ({pool-2-thread-2} Job.java[run]:188) - Job 
failed
org.apache.zeppelin.interpreter.InterpreterException: 
java.lang.NullPointerException
at 
org.apache.zeppelin.spark.PySparkInterpreter.open(PySparkInterpreter.java:204)
at 
org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:69)
at 
org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:618)
at org.apache.zeppelin.scheduler.Job.run(Job.java:186)
{noformat}

[jira] [Created] (ZEPPELIN-3293) extraneous lines if data in a column has "\n" character

2018-03-05 Thread Ruslan Dautkhanov (JIRA)
Ruslan Dautkhanov created ZEPPELIN-3293:
---

 Summary: extraneous lines if data in a column has "\n" character
 Key: ZEPPELIN-3293
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-3293
 Project: Zeppelin
  Issue Type: Bug
  Components: front-end
Affects Versions: 0.8.0, 0.9.0
Reporter: Ruslan Dautkhanov


Apache Zeppelin uses angular-ui/ui-grid for data table visualization.

If the data has "\n" characters present, extra table rows appear:

[!https://user-images.githubusercontent.com/3013418/36998541-7e6a4fe8-207a-11e8-8ea2-40ad9d1773cb.png!|https://user-images.githubusercontent.com/3013418/36998541-7e6a4fe8-207a-11e8-8ea2-40ad9d1773cb.png]

Is there an option to switch this behavior off? We think better options could be:
 * display "\n" as whitespace;
 * display "\n" as "\n" (backslash, "n" character).

Thanks.
 
[https://github.com/angular-ui/ui-grid/issues/6589]
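The second suggested option (rendering the line break as a literal backslash-n) is a one-line transform; a sketch of what a result serializer might do before handing cell values to ui-grid (purely illustrative, not Zeppelin's code):

```java
public class CellRenderer {
    // Render embedded line breaks as visible "\n" text so a multi-line cell
    // value occupies a single grid row instead of spilling into extra rows.
    public static String escapeNewlines(String cell) {
        return cell.replace("\r\n", "\\n").replace("\n", "\\n").replace("\r", "\\n");
    }
}
```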
 
cc [~prabhjyotsi...@apache.com]
 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZEPPELIN-3292) angular-ui/ui-grid duplicates rows if "

2018-03-05 Thread Ruslan Dautkhanov (JIRA)
Ruslan Dautkhanov created ZEPPELIN-3292:
---

 Summary: angular-ui/ui-grid duplicates rows if "
 Key: ZEPPELIN-3292
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-3292
 Project: Zeppelin
  Issue Type: Bug
Reporter: Ruslan Dautkhanov






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZEPPELIN-3286) Run All Paragraphs stops if there is a disabled paragraph mid-run

2018-03-02 Thread Ruslan Dautkhanov (JIRA)
Ruslan Dautkhanov created ZEPPELIN-3286:
---

 Summary: Run All Paragraphs stops if there is a disabled paragraph 
mid-run
 Key: ZEPPELIN-3286
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-3286
 Project: Zeppelin
  Issue Type: Bug
Affects Versions: 0.7.3, 0.8.0, 0.9.0
Reporter: Ruslan Dautkhanov


Run All Paragraphs runs fine until it reaches a disabled paragraph in the 
middle of a run.

How it should be processed: disabled paragraphs should be just skipped.

What happens: the disabled paragraph doesn't run (as expected), but the 
non-disabled paragraphs after it don't run either (this is the bug). 

It also seems to leave Zeppelin unable to run any paragraphs at all (even 
manually); it gets stuck in that state.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZEPPELIN-3282) pyspark as a Default Interpreter

2018-03-02 Thread Ruslan Dautkhanov (JIRA)
Ruslan Dautkhanov created ZEPPELIN-3282:
---

 Summary: pyspark as a Default Interpreter
 Key: ZEPPELIN-3282
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-3282
 Project: Zeppelin
  Issue Type: New Feature
  Components: front-end, zeppelin-server
Affects Versions: 0.8.0
Reporter: Ruslan Dautkhanov
 Attachments: image-2018-03-02-13-26-39-913.png

I can't choose PySpark as the default interpreter when creating a new note. Is 
this a new bug, or am I missing some configuration?

 

!image-2018-03-02-13-26-39-913.png!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZEPPELIN-3272) Upgrade angular-ui/ui-grid

2018-02-27 Thread Ruslan Dautkhanov (JIRA)
Ruslan Dautkhanov created ZEPPELIN-3272:
---

 Summary: Upgrade angular-ui/ui-grid
 Key: ZEPPELIN-3272
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-3272
 Project: Zeppelin
  Issue Type: Bug
  Components: front-end
Affects Versions: 0.7.3, 0.8.0
Reporter: Ruslan Dautkhanov


As discussed in comments of ZEPPELIN-3238:
  
 [~prabhjyotsingh] added a comment - 6 days ago
 This sounds like an interesting exception, let me explore this
{code:java}
TypeError: Cannot read property 'options' of undefined
at Object.shown (ui-grid.js:19094)
at o.t.itemShown (ui-grid.js:2359)
at fn (eval at compile (vendor.cfb12f83ec630b56.js:39), :4:218)
at o.$digest (vendor.cfb12f83ec630b56.js:38)
at o.$apply (vendor.cfb12f83ec630b56.js:38)
at HTMLDivElement. (vendor.cfb12f83ec630b56.js:40)
at HTMLDivElement.dispatch (vendor.cfb12f83ec630b56.js:30)
at HTMLDivElement.q.handle (vendor.cfb12f83ec630b56.js:30){code}
According to [https://github.com/angular-ui/ui-grid/issues/6578], this is a 
known issue in angular-ui/ui-grid 4.0.11. It was resolved in 4.1.0.

[~prabhjyotsingh] it would be great to consider upgrading Zeppelin to 4.1.0 or 
any later version, as they include a lot of other bug fixes:

[https://github.com/angular-ui/ui-grid/blob/master/CHANGELOG.md] :

 
{panel}
h3. v4.2.4 (2018-02-07)
 * *uiGridAutoResize:* Asking for grid $elm sizing in a digest loop always 
triggers {{refresh}}, not cond
  

h3. v4.2.3 (2018-02-02)
 * *exporter:* Fix bug where selection column width was included
 * *importer.js:* Remove unnecessary on destroy event.
 * *selection.js:* Allow selection in tables that use grouping (#6556)
 * *ui-grid-header-cell:* Improved styles with grid menu.
  

h3. v4.2.2 (2018-01-17)
 * *gridEdit:* Fixing scrollToFocus issues.

h3. v4.2.1 (2018-01-17)
 * *GridRenderContainer:* Fixing scrollbar styles.
 * *gridEdit:* Fixing issues with focus and grid edit.
 * *importer:* Fix console error on opening grid menu.
 * *menus:* Switching applyAsync for timeout.
  

h3. v4.2.0 (2018-01-15)
 * *build:* Fixing build failure due to poor updates.
 * *cellnav:* Replace $timeout with $applyAsync.
 * *docs:* Fix broken docs.
 * *edit:* Replace $timeout with $applyAsync.
 * *infinite-scroll:* Replace $timeout with $applyAsync.
 * *lang:* Update Polish translations.
 * *move-columns:* Replace $timeout with $applyAsync.
 * *resize-columns:* Replace $timeout with $applyAsync.
 * *tutorial:* Updating some tutorial examples.
  

h3. v4.1.3 (2017-12-23)
 * *protractor:* Improving reliability of protractor tests and ensuring they 
can run at a basic l
 * *uiGridAutoResize:* Changed [0].clientHeight to gridUtil.elementHe... 
(#6490)
 * *GridRenderContainer.js:* Fix bug of space to the right of the last column 
(#6371)
 * *ui-grid.html:* Fix bug with template for last row's bottom border (#4413)
  

h3. v4.1.2 (2017-12-21)
 * *tutorial:* Replacing .success with .then due to angular upgrade.

h3. v4.1.1 (2017-12-20)
 * *ui-grid.info:* Updating ui-grid.info to support angular 1.6.7
  

h3. v4.1.0 (2017-12-18)
 * *exporter:*
 ** fix issue #6019 errors while opening grid menu with exporter service
 ** Excel export with npm instructions, fix error on menu and more 
examples{panel}
 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZEPPELIN-3270) Spark interpreter - job progress is not shown nor a link to Spark driver UI

2018-02-26 Thread Ruslan Dautkhanov (JIRA)
Ruslan Dautkhanov created ZEPPELIN-3270:
---

 Summary: Spark interpreter - job progress is not shown nor a link 
to Spark driver UI
 Key: ZEPPELIN-3270
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-3270
 Project: Zeppelin
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Ruslan Dautkhanov
 Attachments: image-2018-02-26-21-34-19-661.png

The new Zeppelin doesn't show a progress bar, nor a link to the Spark Driver 
UI, as it used to.

When I manually go to the Spark Driver UI, I see the job is running, has 
running stages, etc.

Not sure how it became broken.

See screenshot below (no actual code shown - it's basically running 
file-to-DataFrame Spark logic):

!image-2018-02-26-21-34-19-661.png!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZEPPELIN-3268) Zeppelin version doesn't show in pop-up

2018-02-26 Thread Ruslan Dautkhanov (JIRA)
Ruslan Dautkhanov created ZEPPELIN-3268:
---

 Summary: Zeppelin version doesn't show in pop-up
 Key: ZEPPELIN-3268
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-3268
 Project: Zeppelin
  Issue Type: Bug
  Components: front-end
Affects Versions: 0.8.0
Reporter: Ruslan Dautkhanov
 Attachments: image-2018-02-26-11-15-52-284.png

See example below

 

!image-2018-02-26-11-15-52-284.png!

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZEPPELIN-3260) iPython shell (! magic command) doesn't print

2018-02-23 Thread Ruslan Dautkhanov (JIRA)
Ruslan Dautkhanov created ZEPPELIN-3260:
---

 Summary: iPython shell (! magic command) doesn't print
 Key: ZEPPELIN-3260
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-3260
 Project: Zeppelin
  Issue Type: Bug
  Components: Core, front-end, pySpark, python-interpreter
Affects Versions: 0.8.0, 0.9.0
Reporter: Ruslan Dautkhanov
 Attachments: image-2018-02-23-10-38-07-304.png

 
{code:java}
%ipyspark
!echo 1; sleep 1; echo 2
{code}
uses ipython's shell magic command (! - exclamation mark), but in Zeppelin it 
prints just two empty lines:

!image-2018-02-23-10-38-07-304.png!

Should have printed two lines with "1" and "2" in them.

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZEPPELIN-3257) "Show line numbers" doesn't show current line

2018-02-22 Thread Ruslan Dautkhanov (JIRA)
Ruslan Dautkhanov created ZEPPELIN-3257:
---

 Summary: "Show line numbers" doesn't show current line
 Key: ZEPPELIN-3257
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-3257
 Project: Zeppelin
  Issue Type: Bug
  Components: front-end
Affects Versions: 0.8.0
Reporter: Ruslan Dautkhanov
 Attachments: image-2018-02-22-21-36-17-677.png

Not super critical, but it looks a bit odd that in the new Zeppelin the current 
line number is missing.

Notice current line 5 has no number. 

This is a regression from previous Zeppelin release.

 

!image-2018-02-22-21-36-17-677.png!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZEPPELIN-3256) ipython backend: capture when backend ipython process dies

2018-02-22 Thread Ruslan Dautkhanov (JIRA)
Ruslan Dautkhanov created ZEPPELIN-3256:
---

 Summary: ipython backend: capture when backend ipython process dies
 Key: ZEPPELIN-3256
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-3256
 Project: Zeppelin
  Issue Type: Bug
  Components: Core, pySpark, python-interpreter, zeppelin-interpreter
Affects Versions: 0.7.3, 0.8.0
Reporter: Ruslan Dautkhanov


Using `quit()` in the new ipython interpreter backend causes the ipython 
backend to exit, and a new paragraph run then gets stuck in 'RUNNING' 
indefinitely, or at least until the pySpark interpreter is restarted.

Two suggestions:
 # Ignore `quit()` calls 
 # More importantly - capture when the IPython backend process dies (for this 
or any other reason) so the Spark interpreter would know it has to start a new 
session, and so it would also not show a misleading 'RUNNING' state 
indefinitely on the front-end.

The first might be easy to fix by defining something like `def quit(): pass` 
as soon as the ipython process starts.

But again, more importantly, it would be great to capture and recognize events 
when the ipython process exits or dies for some reason and pass this 
information up to the Spark interpreter.
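The first suggestion can be sketched as a shim installed into the interpreter namespace when the ipython process starts; a minimal illustration (this is not Zeppelin's actual startup code, and a real fix would hook IPython's own startup machinery):

```python
def install_quit_shim(namespace):
    """Override quit/exit in the given namespace with no-ops so a stray
    quit() cannot kill the backing ipython process (suggestion 1 above)."""
    def _ignored_quit(*args, **kwargs):
        # warn instead of terminating the backend process
        print("quit() is disabled inside the Zeppelin ipython backend")
    namespace['quit'] = _ignored_quit
    namespace['exit'] = _ignored_quit

ns = {}
install_quit_shim(ns)
ns['quit']()  # prints a warning instead of exiting
```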



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZEPPELIN-3253) `Tab` key indentation is broken in paragraph editor

2018-02-21 Thread Ruslan Dautkhanov (JIRA)
Ruslan Dautkhanov created ZEPPELIN-3253:
---

 Summary: `Tab` key indentation is broken in paragraph editor
 Key: ZEPPELIN-3253
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-3253
 Project: Zeppelin
  Issue Type: Bug
  Components: front-end
Affects Versions: 0.7.3, 0.8.0, 0.9.0
Reporter: Ruslan Dautkhanov


The `Tab` key used to indent Python code (like in most other IDEs).
It was working correctly in previous Zeppelin releases, including 0.7.3.

Interestingly, `Shift-Tab` still de-indents code correctly.

So there is now an inconsistency between `Tab` and `Shift-Tab`.

Would be great to have this resolved.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZEPPELIN-3251) slow rendering on very wide datasets - display columns only when necessary

2018-02-20 Thread Ruslan Dautkhanov (JIRA)
Ruslan Dautkhanov created ZEPPELIN-3251:
---

 Summary: slow rendering on very wide datasets - display columns 
only when necessary
 Key: ZEPPELIN-3251
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-3251
 Project: Zeppelin
  Issue Type: Improvement
  Components: front-end
Affects Versions: 0.8.0, 0.7.4, 0.9.0
Reporter: Ruslan Dautkhanov


Apache Zeppelin recently upgraded to angular ui-grid - that looks awesome!

What we're facing, though, is that on wider datasets Chrome sometimes spends 
minutes rendering a table.

The table doesn't have a lot of data (it's very sparsely populated), so it 
seems the number of columns alone is what matters.

A more extreme case: a recent Chrome on a decent PC spent many minutes 
rendering 5000 columns x 1000 rows. That's not a made-up case; we actually had 
to display such a wide, sparsely populated table. (Our company normally deals 
with wider datasets.)

Even on not-so-wide datasets we found browser rendering could be lagging.

*Possible solution: show rows/columns lazily on scrolling events*.

As an example, Cloudera HUE worked around displaying such wide datasets by 
adding "virtual" horizontal scrolling - columns are pushed to an html table 
only when they have to be visualized (when the user has scrolled horizontally 
far enough to reach them), and are efficiently removed from the html when 
they're no longer needed for visualization. Cloudera Hue can render such wide 
datasets very quickly because the browser only has to render the tiny subset 
of columns it actually needs.
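The core of that virtual-scrolling idea is just computing an index window from the scroll position; a sketch with illustrative parameter names, assuming uniform column widths:

```python
def visible_column_range(n_columns, scroll_px, col_width, viewport_px,
                         overscan=2):
    """Indexes of columns that must exist in the DOM for the current
    horizontal scroll position; everything outside this window can be
    dropped from the table, which is what keeps very wide frames fast.
    overscan adds a few off-screen columns on each side for smoothness."""
    first = max(0, scroll_px // col_width - overscan)
    last = min(n_columns, (scroll_px + viewport_px) // col_width + overscan + 1)
    return range(first, last)

# 5000-column table, 100px columns, 800px viewport, scrolled 10000px right:
cols = visible_column_range(5000, scroll_px=10000, col_width=100,
                            viewport_px=800)
```

Only the handful of columns in `cols` need to be rendered, regardless of how many the dataset has.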





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZEPPELIN-3247) Restart grpc stream for each paragraph run

2018-02-19 Thread Ruslan Dautkhanov (JIRA)
Ruslan Dautkhanov created ZEPPELIN-3247:
---

 Summary: Restart grpc stream for each paragraph run
 Key: ZEPPELIN-3247
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-3247
 Project: Zeppelin
  Issue Type: Bug
  Components: Core, pySpark, python-interpreter, zeppelin-interpreter, 
zeppelin-server
Affects Versions: 0.7.3, 0.8.0
Reporter: Ruslan Dautkhanov


Please read [https://github.com/grpc/grpc-java/issues/4086] for details
{quote}The RPC (a.k.a. the stream) cannot continue, but you can start a new 
RPC. The Channel will still work. You can start a new RPC on the existing 
channel. Without seeing the code I don't know why a failed RPC would fail the 
whole application, but that isn't the intended behavior.{quote}
 
It seems the spark interpreter --> grpc --> ipython backend path is currently 
somewhat brittle, as any exception stops the grpc stream [1].
 
Would it be possible to adjust the ipython logic to restart the grpc stream 
for each paragraph run, to make it more robust?
 
 
 
 
[1]
 
{quote}INFO [2018-02-14 10:39:10,923] (\{grpc-default-worker-ELG-1-2} 
AbstractClientStream2.java[inboundDataReceived]:249) - Received data on closed 
stream
INFO [2018-02-14 10:39:10,924] (\{grpc-default-worker-ELG-1-2} 
AbstractClientStream2.java[inboundDataReceived]:249) - Received data on closed 
stream
INFO [2018-02-14 10:39:10,925] (\{grpc-default-worker-ELG-1-2} 
AbstractClientStream2.java[inboundDataReceived]:249) - Received data on closed 
stream
{quote}
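The restart idea can be sketched library-agnostically: open a brand-new stream for every paragraph run instead of reusing one long-lived stream. Here `start_stream` stands in for opening a fresh server-streaming RPC on the existing channel, and `RuntimeError` stands in for `grpc.RpcError`; none of this is the actual Zeppelin/grpc code:

```python
def run_on_fresh_stream(start_stream, retries=1):
    """Consume a stream opened fresh for this paragraph run; if the
    stream dies, open another one instead of failing the whole
    interpreter. start_stream: zero-arg callable returning an iterator."""
    last_err = None
    for _ in range(retries + 1):
        try:
            return list(start_stream())
        except RuntimeError as err:  # stand-in for grpc.RpcError
            last_err = err
    raise last_err

# A flaky stream that dies once, then succeeds on the next fresh stream:
attempts = {'n': 0}
def flaky():
    attempts['n'] += 1
    if attempts['n'] == 1:
        raise RuntimeError('Received data on closed stream')
    return iter(['out1', 'out2'])

results = run_on_fresh_stream(flaky)  # first stream fails, second succeeds
```

Because each run gets its own stream, one "closed stream" failure no longer poisons every subsequent paragraph.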



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZEPPELIN-3242) Listener threw an exception java.lang.NPE at o.a.zeppelin.spark.Utils.getNoteId(Utils.java:156)

2018-02-16 Thread Ruslan Dautkhanov (JIRA)
Ruslan Dautkhanov created ZEPPELIN-3242:
---

 Summary: Listener threw an exception java.lang.NPE at 
o.a.zeppelin.spark.Utils.getNoteId(Utils.java:156)
 Key: ZEPPELIN-3242
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-3242
 Project: Zeppelin
  Issue Type: Bug
Affects Versions: 0.7.3, 0.8.0
Reporter: Ruslan Dautkhanov


  
{code:java}
INFO [2018-02-16 17:59:50,518] ({Thread-52} Logging.scala[logInfo]:54) - 
Post-Scan Filters:
 INFO [2018-02-16 17:59:50,521] ({Thread-52} Logging.scala[logInfo]:54) - 
Output Data Schema: struct<>
 INFO [2018-02-16 17:59:50,522] ({Thread-52} Logging.scala[logInfo]:54) - 
Pushed Filters:
 INFO [2018-02-16 17:59:50,668] ({Thread-52} Logging.scala[logInfo]:54) - Block 
broadcast_15 stored as values in memory (estimated size 347.7 KB, free 5.2 GB)
 INFO [2018-02-16 17:59:50,687] ({Thread-52} Logging.scala[logInfo]:54) - Block 
broadcast_15_piece0 stored as bytes in memory (estimated size 34.3 KB, free 5.2 
GB)
 INFO [2018-02-16 17:59:50,687] ({dispatcher-event-loop-7} 
Logging.scala[logInfo]:54) - Added broadcast_15_piece0 in memory on 
10.20.32.57:15295 (size: 34.3 KB, free: 5.2 GB)
 INFO [2018-02-16 17:59:50,688] ({Thread-52} Logging.scala[logInfo]:54) - 
Created broadcast 15 from count at NativeMethodAccessorImpl.java:0
 INFO [2018-02-16 17:59:50,688] ({Thread-52} Logging.scala[logInfo]:54) - 
Planning scan with bin packing, max size: 4194304 bytes, open cost is 
considered as scanning 4194304 bytes.
 INFO [2018-02-16 17:59:50,705] ({Thread-52} Logging.scala[logInfo]:54) - 
Starting job: count at NativeMethodAccessorImpl.java:0
 INFO [2018-02-16 17:59:50,706] ({dag-scheduler-event-loop} 
Logging.scala[logInfo]:54) - Registering RDD 47 (count at 
NativeMethodAccessorImpl.java:0)
 INFO [2018-02-16 17:59:50,707] ({dag-scheduler-event-loop} 
Logging.scala[logInfo]:54) - Got job 5 (count at 
NativeMethodAccessorImpl.java:0) with 1 output partitions
 INFO [2018-02-16 17:59:50,707] ({dag-scheduler-event-loop} 
Logging.scala[logInfo]:54) - Final stage: ResultStage 11 (count at 
NativeMethodAccessorImpl.java:0)
 INFO [2018-02-16 17:59:50,707] ({dag-scheduler-event-loop} 
Logging.scala[logInfo]:54) - Parents of final stage: List(ShuffleMapStage 10)
 INFO [2018-02-16 17:59:50,707] ({dag-scheduler-event-loop} 
Logging.scala[logInfo]:54) - Missing parents: List()
 INFO [2018-02-16 17:59:50,707] ({dag-scheduler-event-loop} 
Logging.scala[logInfo]:54) - Submitting ResultStage 11 (MapPartitionsRDD[50] at 
count at NativeMethodAccessorImpl.java:0), which has no missing parents
 INFO [2018-02-16 17:59:50,709] ({dag-scheduler-event-loop} 
Logging.scala[logInfo]:54) - Block broadcast_16 stored as values in memory 
(estimated size 7.0 KB, free 5.2 GB)
ERROR [2018-02-16 17:59:50,710] ({SparkListenerBus} Logging.scala[logError]:91) 
- Listener threw an exception
java.lang.NullPointerException
 at org.apache.zeppelin.spark.Utils.getNoteId(Utils.java:156)
 at 
org.apache.zeppelin.spark.NewSparkInterpreter$1.onJobStart(NewSparkInterpreter.java:225)
 at 
org.apache.spark.scheduler.SparkListenerBus$class.doPostEvent(SparkListenerBus.scala:37)
 at 
org.apache.spark.scheduler.LiveListenerBus.doPostEvent(LiveListenerBus.scala:36)
 at 
org.apache.spark.scheduler.LiveListenerBus.doPostEvent(LiveListenerBus.scala:36)
 at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:63)
 at 
org.apache.spark.scheduler.LiveListenerBus.postToAll(LiveListenerBus.scala:36)
 at 
org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(LiveListenerBus.scala:94)
 at 
org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:79)
 at 
org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:79)
 at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
 at 
org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(LiveListenerBus.scala:78)
 at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1279)
 at 
org.apache.spark.scheduler.LiveListenerBus$$anon$1.run(LiveListenerBus.scala:77)
 INFO [2018-02-16 17:59:50,714] ({dag-scheduler-event-loop} 
Logging.scala[logInfo]:54) - Block broadcast_16_piece0 stored as bytes in 
memory (estimated size 3.7 KB, free 5.2 GB)
 INFO [2018-02-16 17:59:50,715] ({dispatcher-event-loop-4} 
Logging.scala[logInfo]:54) - Added broadcast_16_piece0 in memory on 
10.20.32.57:15295 (size: 3.7 KB, free: 5.2 GB)
 
 
 
{code}
 

 

Notice
{code:java}
ERROR [2018-02-16 17:59:50,710] ({SparkListenerBus} Logging.scala[logError]:91) 
- Listener threw an exception
java.lang.NullPointerException
 at org.apache.zeppelin.spark.Utils.getNoteId(Utils.java:156)
 at 
org.apache.zeppelin.spark.NewSparkInterpreter$1.onJobStart(NewSparkInterpreter.java:225)
 at 

[jira] [Created] (ZEPPELIN-3239) unicode characters in an iPython paragraph makes Spark interpreter unresponsive

2018-02-14 Thread Ruslan Dautkhanov (JIRA)
Ruslan Dautkhanov created ZEPPELIN-3239:
---

 Summary: unicode characters in an iPython paragraph makes Spark 
interpreter unresponsive
 Key: ZEPPELIN-3239
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-3239
 Project: Zeppelin
  Issue Type: Bug
Reporter: Ruslan Dautkhanov
 Attachments: Zeppelin-iPython_para_with_Unicode.PNG

A unicode character in an iPython paragraph makes the Spark interpreter 
unresponsive.

To reproduce, type the following phrase into a new %ipyspark paragraph (yes, 
it's not valid python code, but the important part is that it has a long 
unicode dash character):
{code}
One following unicide character makes ipythonInterpreter not responding to 
Cancel commands –  
{code}
The DEBUG interpreter log shows the following:
{quote}DEBUG [2018-02-15 00:39:45,628] (\{pool-2-thread-2} 
IPythonClient.java[stream_execute]:87) - stream_execute code:
One following unicide character makes ipythonInterpreter not responding to 
Cancel commands –
DEBUG [2018-02-15 00:39:45,632] (\{Exec Stream Pumper} 
IPythonInterpreter.java[processLine]:388) - Process Output: 
ERROR:root:Exception iterating responses: 'ascii' codec can't encode character 
u'\u2013' in position 91: ordinal not in range(128)
DEBUG [2018-02-15 00:39:45,632] (\{Exec Stream Pumper} 
IPythonInterpreter.java[processLine]:388) - Process Output: Traceback (most 
recent call last):
DEBUG [2018-02-15 00:39:45,633] (\{Exec Stream Pumper} 
IPythonInterpreter.java[processLine]:388) - Process Output: File 
"/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/grpc/_server.py", 
line 401, in _take_response_from_response_iterator
ERROR [2018-02-15 00:39:45,633] (\{grpc-default-executor-0} 
IPythonClient.java[onError]:138) - Fail to call IPython grpc
io.grpc.StatusRuntimeException: UNKNOWN: Exception iterating responses: 'ascii' 
codec can't encode character u'\u2013' in position 91: ordinal not in range(128)
 at io.grpc.Status.asRuntimeException(Status.java:543)
 at 
io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:395)
 at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:426)
 at io.grpc.internal.ClientCallImpl.access$100(ClientCallImpl.java:76)
 at 
io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.close(ClientCallImpl.java:512)
 at 
io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.access$700(ClientCallImpl.java:429)
 at 
io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:544)
 at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:52)
 at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:117)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
DEBUG [2018-02-15 00:39:45,633] (\{Exec Stream Pumper} 
IPythonInterpreter.java[processLine]:388) - Process Output: return 
next(response_iterator), True
DEBUG [2018-02-15 00:39:45,633] (\{Exec Stream Pumper} 
IPythonInterpreter.java[processLine]:388) - Process Output: File 
"/tmp/zeppelin_ipython1942535087961089556/ipython_server.py", line 54, in 
execute
DEBUG [2018-02-15 00:39:45,633] (\{Exec Stream Pumper} 
IPythonInterpreter.java[processLine]:388) - Process Output: print(request.code)
DEBUG [2018-02-15 00:39:45,634] (\{Exec Stream Pumper} 
IPythonInterpreter.java[processLine]:388) - Process Output: UnicodeEncodeError: 
'ascii' codec can't encode character u'\u2013' in position 91: ordinal not in 
range(128)
 INFO [2018-02-15 00:39:58,894] (\{dispatcher-event-loop-23} 
Logging.scala[logInfo]:54) - Registered executor 
NettyRpcEndpointRef(spark-client://Executor) (10.20.33.75:40434) with ID 2
{quote}
 

Notice 

"Process Output: UnicodeEncodeError: 'ascii' codec can't encode character 
u'\u2013' in position 91: ordinal not in range(128) "

So iPython interpreter breaks on presence of any unicode data.

!Zeppelin-iPython_para_with_Unicode.PNG!
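The failure is the classic Python 2 implicit-ascii encode: printing unicode text to a byte-oriented stream falls back to the 'ascii' codec. Encoding explicitly avoids it; a minimal sketch (not the actual ipython_server.py change):

```python
def to_utf8_bytes(text):
    """Encode request text explicitly so a non-ASCII character such as
    u'\u2013' (en dash) never hits the default 'ascii' codec."""
    if isinstance(text, bytes):
        return text  # already encoded
    return text.encode('utf-8')

payload = to_utf8_bytes(u'responding to Cancel commands \u2013')
```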

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZEPPELIN-3238) z.show() starts showing empty box on larger datasets

2018-02-14 Thread Ruslan Dautkhanov (JIRA)
Ruslan Dautkhanov created ZEPPELIN-3238:
---

 Summary: z.show() starts showing empty box on larger datasets
 Key: ZEPPELIN-3238
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-3238
 Project: Zeppelin
  Issue Type: Bug
Affects Versions: 0.8.0
 Environment: this week's snapshot of master.
Reporter: Ruslan Dautkhanov
 Attachments: Zeppelin-empty_box.PNG

{code:java}
z.show(spark.sql(""" 
 with q as (select 1 a, 2 b, 3 c union all select 4,5,6 union all select 7,8,9 
union all select 10,11,12)
 select * from q cross join q cross join q cross join q cross join q -- cross 
join q cross join q 
 """
 ).toPandas()){code}
 - works correctly.

But
{code:java}
z.show(spark.sql(""" 
 with q as (select 1 a, 2 b, 3 c union all select 4,5,6 union all select 7,8,9 
union all select 10,11,12)
 select * from q cross join q cross join q cross join q cross join q cross join 
q 
 """
 ).toPandas())
{code}
shows an empty white box:

!Zeppelin-empty_box.PNG!

The only difference between the two options is size. In the previous release 
of Zeppelin it would just show the table partially (however many rows it could 
read in).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZEPPELIN-3235) new UI grid displays an empty box if output is cut

2018-02-14 Thread Ruslan Dautkhanov (JIRA)
Ruslan Dautkhanov created ZEPPELIN-3235:
---

 Summary: new UI grid displays an empty box if output is cut
 Key: ZEPPELIN-3235
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-3235
 Project: Zeppelin
  Issue Type: Bug
Affects Versions: 0.7.3, 0.8.0, 0.7.4
Reporter: Ruslan Dautkhanov


The new UI grid displays just an empty box when output is cut with a message 
like
{quote}Output is truncated to 102400 bytes. Learn more about  
ZEPPELIN_INTERPRETER_OUTPUT_LIMIT{quote}
It doesn't happen every time; I think it depends on where the interpreter has 
cut the table.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZEPPELIN-3234) z.show() compatibility with previous release

2018-02-14 Thread Ruslan Dautkhanov (JIRA)
Ruslan Dautkhanov created ZEPPELIN-3234:
---

 Summary: z.show() compatibility with previous release
 Key: ZEPPELIN-3234
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-3234
 Project: Zeppelin
  Issue Type: Bug
Affects Versions: 0.7.3, 0.8.0
Reporter: Ruslan Dautkhanov


We've noticed two major issues with z.show() after upgrading Zeppelin.
 
1)
z.show(df) used to work directly on a spark dataframe object;
now it produces TypeError: object of type 'DataFrame' has no len()
Full exception stack in [1].
 
2)
We tried disabling ipython and it seems to be a workaround.
Is there a way to have compatibility with the previous Zeppelin release on 
z.show() without disabling ipython altogether?
 
 
 
[1]
 
{quote}TypeError Traceback (most recent call last)
 in ()
> 1 z.show(spark.sql('select * from disc_mrt.unified_fact'))

 in show(self, p, **kwargs)
 73 # `isinstance(p, DataFrame)` would req `import pandas.core.frame.DataFrame`
 74 # and so a dependency on pandas
---> 75 self.show_dataframe(p, **kwargs)
 76 elif hasattr(p, '__call__'):
 77 p() #error reporting

 in show_dataframe(self, df, show_index, **kwargs)
 80 """Pretty prints DF using Table Display System
 81 """
---> 82 limit = len(df) > self.max_result
 83 header_buf = StringIO("")
 84 if show_index:

TypeError: object of type 'DataFrame' has no len()
{quote}
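The traceback shows z.show computing `len(df)`, which pandas DataFrames support but Spark DataFrames do not. A compatibility guard could look roughly like this (a sketch only; `SparkLikeFrame` is a hypothetical stand-in used to illustrate the `limit`/`count` fallback path, not pyspark itself):

```python
def frame_row_count(df, max_result):
    """Row count capped at max_result + 1, working for both pandas-style
    frames (which define __len__) and Spark-style frames (which expose
    limit/count instead)."""
    try:
        return min(len(df), max_result + 1)   # pandas path
    except TypeError:
        # Spark DataFrames raise TypeError on len(); count a capped subset
        return df.limit(max_result + 1).count()

class SparkLikeFrame:
    """Minimal stand-in for a Spark DataFrame (illustration only)."""
    def __init__(self, n):
        self.n = n
    def limit(self, k):
        return SparkLikeFrame(min(self.n, k))
    def count(self):
        return self.n
```

With a guard like this, z.show(df) could accept a Spark dataframe directly again instead of requiring .toPandas().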
 
 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZEPPELIN-2931) JS memory/object leak makes Zeppelin rendering in browsing lag

2017-09-13 Thread Ruslan Dautkhanov (JIRA)
Ruslan Dautkhanov created ZEPPELIN-2931:
---

 Summary: JS memory/object leak makes Zeppelin rendering in 
browsing lag
 Key: ZEPPELIN-2931
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-2931
 Project: Zeppelin
  Issue Type: Bug
Affects Versions: 0.7.2, 0.7.3, 0.8.0
Reporter: Ruslan Dautkhanov
Priority: Critical


From the Zeppelin users list:

{quote}
Mid-size notebooks lag in browser rendering / responding to simple navigation 
requests. 
I've seen many times that once notebooks grow above a "small" size, rendering 
lags can happen. Running the latest Chrome with a decent Intel i7 processor. 
For example, a notebook I am working on has around 35 paragraphs and is just a 
145kb note.json (there are no big data tables - nothing crazy), and it starts 
lagging quite a bit, which doesn't help with UX.
{quote}

Response from [~pbrenner]:
{quote}
I experience this regularly. 

I have found that leaving these larger notebooks open in chrome can cause lag 
to grow worse over time. Sometimes I can bring lag back down to a manageable 
level by closing the tab completely and loading the notebook from a new tab. 
Clearing output does not seem to help, so like your colleagues I’ll sometimes 
resort to splitting to a separate notebook. 
{quote}

Further observation:
{quote}
It's interesting you mentioned that restarting a tab helps.

I just went to Chrome and it was showing that the same Zeppelin tab consumes 
2Gb+ of memory, 1Gb+ of which is JS objects. I tried to take a heap snapshot 
in Chrome devtools, but the Chrome rendering engine crashed, so I had to 
restart that tab. After the tab was restarted, its heap consumption went down 
to just 82Mb - and it renders now without lags! It looks to me like some sort 
of JS objects / JS memory leak. 
{quote}

After I restarted the tab and left it running for 12 hours, memory consumption 
of the tab went up to 400Mb, 122Mb of which is JS related.

It may take a few days to go back to 2Gb/1Gb as it was before.




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (ZEPPELIN-2886) Make usernames case insensitive

2017-08-28 Thread Ruslan Dautkhanov (JIRA)
Ruslan Dautkhanov created ZEPPELIN-2886:
---

 Summary: Make usernames case insensitive
 Key: ZEPPELIN-2886
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-2886
 Project: Zeppelin
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Ruslan Dautkhanov
Priority: Critical


It would be great if Zeppelin normalized the case of usernames after 
authentication (let's say, to lower case).

We noticed users sometimes authenticate using different case (for example, 
John@Epsilon.com one time, john@epsilon.com the next). 

Apparently authentication backends don't care about case (for example, we use 
LDAP).
It breaks at least one thing: `notebook-authorization.json`'s `owners` field 
ends up with different cases, and users don't see their own notebooks - even 
though they are in fact still the owners. They can only see the subset of 
notebooks that they created when logged in with the exact same username.

Would be nice if Zeppelin normalized the case after a user successfully 
authenticates.
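The normalization itself is a one-liner applied right after authentication succeeds; a sketch (where exactly Zeppelin's auth realm would call it is not shown, and the function name is illustrative):

```python
def normalize_username(name):
    """Map every case variant of a login to one canonical form so the
    owners field in notebook-authorization.json matches regardless of
    how the user typed their name (casefold also covers non-ASCII
    logins, unlike plain lower())."""
    return name.strip().casefold()

# John@Epsilon.COM and john@epsilon.com now map to the same owner entry:
canonical = normalize_username('John@Epsilon.COM ')
```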



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (ZEPPELIN-2703) Drop down user's interpreter uid to authenticated user's uid

2017-06-28 Thread Ruslan Dautkhanov (JIRA)
Ruslan Dautkhanov created ZEPPELIN-2703:
---

 Summary: Drop down user's interpreter uid to authenticated user's 
uid
 Key: ZEPPELIN-2703
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-2703
 Project: Zeppelin
  Issue Type: Improvement
Affects Versions: 0.7.2, 0.7.0, 0.8.0
Reporter: Ruslan Dautkhanov
Priority: Critical


Would be great if Zeppelin launched users' Zeppelin interpreter processes 
under their own uids through a setuid() call. 

Then keytabs could be locked down to be accessible only to that one user. 

For example, after I LDAP-authenticated as the "tagar" user, Zeppelin would 
drop its uid to the tagar user, and the keytab would have unix access bits set 
to 0600.

As suggested on 
[PR-2407|https://github.com/apache/zeppelin/pull/2407#issuecomment-311485194] 
for ZEPPELIN-1907.

Another advantage is that, for example, the user's shell interpreter would 
find ~ to be the correct user's home directory, not a shared service account's 
home directory.

Notice that setuid() doesn't require Zeppelin to run as root. It's only 
required to set the CAP_SETUID Linux capability on the executable so the 
Zeppelin server can change users' interpreter processes from Zeppelin's 
service account's uid to that specific user's uid. 




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (ZEPPELIN-2506) "Interpreter null not found" when a note is not bound to any interpreters

2017-05-04 Thread Ruslan Dautkhanov (JIRA)
Ruslan Dautkhanov created ZEPPELIN-2506:
---

 Summary: "Interpreter null not found" when a note is not bound to 
any interpreters
 Key: ZEPPELIN-2506
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-2506
 Project: Zeppelin
  Issue Type: Bug
Affects Versions: 0.7.1, 0.7.2, 0.8.0
Reporter: Ruslan Dautkhanov


When we set up a new user or upgrade a note, the interpreter.json file is 
copied from a template file, so "interpreterBindings" is empty or irrelevant 
to the notes the upgraded user has. It causes the following exception:

{noformat}
org.apache.zeppelin.interpreter.InterpreterException: 
paragraph_1491878547585_-790720711's Interpreter null not found
at org.apache.zeppelin.notebook.Note.run(Note.java:625)
at 
org.apache.zeppelin.socket.NotebookServer.persistAndExecuteSingleParagraph(NotebookServer.java:1781)
at 
org.apache.zeppelin.socket.NotebookServer.runParagraph(NotebookServer.java:1741)
at 
org.apache.zeppelin.socket.NotebookServer.onMessage(NotebookServer.java:288)
{noformat}

It would be better to show an error message *that the note is not bound to any 
interpreters, and not allow any paragraphs to run until this is fixed*.
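The suggested guard could look something like this, sketched in Python (names are illustrative; Zeppelin's actual code is Java):

```python
# Refuse to run a paragraph when the note has no interpreter bindings,
# instead of failing later with "Interpreter null not found".
def run_paragraph(paragraph_id, bindings):
    if not bindings:
        raise ValueError(
            "Note is not bound to any interpreter; "
            "bind one before running " + paragraph_id)
    # hypothetical: submit to the first bound interpreter's scheduler
    return "submitted %s to %s" % (paragraph_id, bindings[0])
```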

If a paragraph has an explicit interpreter type defined, the error is 
slightly different, but that does not change the matter: 

{noformat}
paragraph_1491878547585_-790720711's Interpreter pyspark not found
org.apache.zeppelin.interpreter.InterpreterException: 
paragraph_1491878547585_-790720711's Interpreter pyspark not found
at org.apache.zeppelin.notebook.Note.run(Note.java:625)
at 
org.apache.zeppelin.socket.NotebookServer.persistAndExecuteSingleParagraph(NotebookServer.java:1781)
at 
org.apache.zeppelin.socket.NotebookServer.runParagraph(NotebookServer.java:1741)
at 
org.apache.zeppelin.socket.NotebookServer.onMessage(NotebookServer.java:288)
at 
org.apache.zeppelin.socket.NotebookSocket.onWebSocketText(NotebookSocket.java:59)
at 
org.eclipse.jetty.websocket.common.events.JettyListenerEventDriver.onTextMessage(JettyListenerEventDriver.java:128)
at 
org.eclipse.jetty.websocket.common.message.SimpleTextMessage.messageComplete(SimpleTextMessage.java:69)
at 
org.eclipse.jetty.websocket.common.events.AbstractEventDriver.appendMessage(AbstractEventDriver.java:65)
at 
org.eclipse.jetty.websocket.common.events.JettyListenerEventDriver.onTextFrame(JettyListenerEventDriver.java:122)
at 
org.eclipse.jetty.websocket.common.events.AbstractEventDriver.incomingFrame(AbstractEventDriver.java:161)
at 
org.eclipse.jetty.websocket.common.WebSocketSession.incomingFrame(WebSocketSession.java:309)
at 
org.eclipse.jetty.websocket.common.extensions.ExtensionStack.incomingFrame(ExtensionStack.java:214)
at 
org.eclipse.jetty.websocket.common.Parser.notifyFrame(Parser.java:220)
at org.eclipse.jetty.websocket.common.Parser.parse(Parser.java:258)
at 
org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.readParse(AbstractWebSocketConnection.java:632)
at 
org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.onFillable(AbstractWebSocketConnection.java:480)
at 
org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:544)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
at java.lang.Thread.run(Thread.java:745)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (ZEPPELIN-2444) when timeout reached, change "Paragraph received a SIGTERM. ExitValue: 143" to a more user-friendly message

2017-04-23 Thread Ruslan Dautkhanov (JIRA)
Ruslan Dautkhanov created ZEPPELIN-2444:
---

 Summary: when timeout reached, change "Paragraph received a 
SIGTERM. ExitValue: 143" to a more user-friendly message
 Key: ZEPPELIN-2444
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-2444
 Project: Zeppelin
  Issue Type: Improvement
Affects Versions: 0.7.1, 0.7.0, 0.8.0
Reporter: Ruslan Dautkhanov


By default, %sh has a 1-minute (60-second) timeout to run a command.
If the timeout is reached, the command is aborted with the error `Paragraph 
received a SIGTERM. ExitValue: 143`.

Exception stack in log file:
{noformat}
ERROR [2017-04-23 21:31:17,578] ({pool-2-thread-3} 
ShellInterpreter.java[interpret]:97) - Can not run hadoop fs -put /vol/srcfile 
/user/rdautkha/dstfile
org.apache.commons.exec.ExecuteException: Process exited with an error: 143 
(Exit value: 143)
at 
org.apache.commons.exec.DefaultExecutor.executeInternal(DefaultExecutor.java:404)
at 
org.apache.commons.exec.DefaultExecutor.execute(DefaultExecutor.java:166)
at 
org.apache.commons.exec.DefaultExecutor.execute(DefaultExecutor.java:153)
at 
org.apache.zeppelin.shell.ShellInterpreter.interpret(ShellInterpreter.java:91)
at 
org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:94)
at 
org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:495)
at org.apache.zeppelin.scheduler.Job.run(Job.java:181)
at 
org.apache.zeppelin.scheduler.ParallelScheduler$JobRunner.run(ParallelScheduler.java:162)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
{noformat}

Please change the 
error message {quote}Paragraph received a SIGTERM. ExitValue: 143{quote}
to {quote}Timeout of 60 seconds to run %sh paragraph reached. Operation 
aborted{quote}
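For context, exit value 143 is 128 + 15, i.e. the process was killed by SIGTERM (here, by the shell interpreter's timeout). A Python sketch of mapping the raw exit value to a friendlier message (the 60-second figure is this report's default, assumed rather than read from Zeppelin's config):

```python
import signal

SH_TIMEOUT_SECONDS = 60  # illustrative default for the %sh interpreter

def explain_exit(code):
    """Turn a raw process exit value into a user-friendly message."""
    if code > 128:  # killed by a signal (exit value = 128 + signal number)
        sig = signal.Signals(code - 128)
        if sig == signal.SIGTERM:
            return ("Paragraph was terminated, most likely because it exceeded "
                    "the %d-second %%sh timeout. Operation aborted."
                    % SH_TIMEOUT_SECONDS)
        return "Paragraph was killed by signal %s." % sig.name
    return "Paragraph exited with status %d." % code
```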





--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (ZEPPELIN-2368) Option to run all paragraphs *sequentially*

2017-04-06 Thread Ruslan Dautkhanov (JIRA)
Ruslan Dautkhanov created ZEPPELIN-2368:
---

 Summary: Option to run all paragraphs *sequentially*
 Key: ZEPPELIN-2368
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-2368
 Project: Zeppelin
  Issue Type: Improvement
  Components: Core, zeppelin-server
Affects Versions: 0.7.1, 0.8.0
Reporter: Ruslan Dautkhanov


A user on zeppelin's user email list:
{quote}
I often have notebooks that have a %sh as the 1st paragraph. This scps some 
file from another server, and then a number of spark or sparksql paragraphs are 
after that.

If I click on the run-all paragraphs at the top of the notebook the 1st %sh 
paragraph kicks off as expected, but the 2nd %spark notebook starts too at the 
same time. The others go into pending state and then start once the spark one 
has completed.

Is this a bug? Or am I doing something wrong?
{quote}

Quoting [~moon]:
{quote}
That's expected behavior at the moment. The reason is:

Each interpreter has its own scheduler (either FIFO or Parallel), and run-all 
just submits all paragraphs into the target interpreter's scheduler.

I think we can add a feature such as run-all-sequentially.
{quote}
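A run-all-sequentially option could be sketched like this (Python, purely illustrative of the scheduling idea; `run` stands in for submitting one paragraph to its interpreter's scheduler):

```python
from concurrent.futures import ThreadPoolExecutor

def run_all_sequentially(paragraphs, run):
    """Submit each paragraph and block on its result before the next one."""
    results = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        for p in paragraphs:
            # .result() blocks, so the next paragraph is only submitted
            # once the previous one has finished.
            results.append(pool.submit(run, p).result())
    return results
```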




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (ZEPPELIN-2221) Spark jobs UI from the paragraph is broken in some cases

2017-03-06 Thread Ruslan Dautkhanov (JIRA)
Ruslan Dautkhanov created ZEPPELIN-2221:
---

 Summary: Spark jobs UI from the paragraph is broken in some cases
 Key: ZEPPELIN-2221
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-2221
 Project: Zeppelin
  Issue Type: Bug
Affects Versions: 0.7.0, 0.8.0
 Environment: Zeppelin from 3/1 master snapshot
Reporter: Ruslan Dautkhanov


Discussed on PR-1663 for ZEPPELIN-1692
https://github.com/apache/zeppelin/pull/1663#issuecomment-283477396

When we click any of the Spark jobs UI links from the paragraph, the link 
leads to
{noformat}
http://hostname.domain.com:8088/proxy/application_1488384993892_0001/jobs/job/
{noformat}
and that URL returns HTTP ERROR 400. The page reads:
{noformat}
Problem accessing /jobs/job/.

Reason: requirement failed: Missing id parameter
Powered by Jetty://
{noformat}

A little bit more information - the link on the paragraph leads to
{noformat}
http://10.20.32.57:28009/jobs/job?id=123
{noformat}
 Spark Driver UI.
The Spark Driver web server then redirects to a link like the one in my 
previous post, and it misses the job id in the redirected URL, i.e.
{noformat}
http://host.domain.com:8088/proxy/application_1488384993892_0044/jobs/job/
{noformat}
To your question on consistency - yes, it happens in 100% of cases; I have 
never seen this new feature work for us.
Is this Spark version dependent or something? We're running CDH Spark 2.

According to https://github.com/apache/spark/pull/5947, the URL format is 
different in YARN and non-YARN modes? Was PR-1663 for ZEPPELIN-1692 tested in 
both of these modes? Not sure what else might break those links.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (ZEPPELIN-2197) Interpreter Idle timeout

2017-02-28 Thread Ruslan Dautkhanov (JIRA)
Ruslan Dautkhanov created ZEPPELIN-2197:
---

 Summary: Interpreter Idle timeout
 Key: ZEPPELIN-2197
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-2197
 Project: Zeppelin
  Issue Type: Bug
  Components: Core, Interpreters
Affects Versions: 0.7.0
Reporter: Ruslan Dautkhanov


It would be great to have the ability to set an interpreter idle timeout, 
globally or per interpreter, so that if an interpreter isn't used for X amount 
of time, it gets killed. 

Some of our users leave their Zeppelin instances idle and unused for days - 
it's great that they can jump right back where they left off, but in many 
cases we would like to kill just the interpreter processes (while Zeppelin 
itself keeps running). 

For example, the Spark interpreter also leaves behind idle Spark applications 
that consume YARN resources. We do use YARN dynamic allocation but can't set 
spark.dynamicAllocation.minExecutors to 0.

In our case the idle timeout for interpreters should be somewhere between 12 
and 24 hours, so if a specific interpreter process is idle (not used) for that 
long, it gets terminated.

As a nice side effect, if the idle-timeout timer is set in the interpreter 
process itself, this may also help with issues like ZEPPELIN-1832.

Thoughts?
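One way the idle timer could work, sketched in Python (threading.Timer stands in for whatever Zeppelin would actually use; all names are illustrative):

```python
import threading

class IdleWatchdog:
    """Call `on_idle` if no activity is seen for `timeout_s` seconds."""

    def __init__(self, timeout_s, on_idle):
        self.timeout_s = timeout_s
        self.on_idle = on_idle
        self._timer = None
        self.touch()

    def touch(self):
        """Reset the timer; call this on every interpret() request."""
        if self._timer is not None:
            self._timer.cancel()
        self._timer = threading.Timer(self.timeout_s, self.on_idle)
        self._timer.daemon = True
        self._timer.start()

    def cancel(self):
        if self._timer is not None:
            self._timer.cancel()
```

Running such a timer inside the interpreter process itself (with on_idle terminating the process) is what could also help with ZEPPELIN-1832.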



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (ZEPPELIN-2068) Change credentials.json and interpreter.json access permission to 0600

2017-02-06 Thread Ruslan Dautkhanov (JIRA)
Ruslan Dautkhanov created ZEPPELIN-2068:
---

 Summary: Change credentials.json and interpreter.json access 
permission to 0600
 Key: ZEPPELIN-2068
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-2068
 Project: Zeppelin
  Issue Type: Improvement
Affects Versions: 0.7.0
Reporter: Ruslan Dautkhanov
Priority: Critical


credentials.json and interpreter.json are created with default group-readable 
and world-readable permissions.

Both files can store passwords.

interpreter.json can store passwords, for example, if we have a custom 
repository - they'll be stored there in clear text.

credentials.json obviously stores passwords too.

Please change the default file permissions for credentials.json and 
interpreter.json to 0600.

Other users should not be able to see clear-text passwords.
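The fix itself is small; a Python sketch of tightening a file to owner read/write only (0600), with illustrative paths:

```python
import os
import stat

def lock_down(path):
    """Restrict a file to owner read/write only (mode 0600)."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)

# hypothetical usage against a Zeppelin conf directory:
# lock_down("conf/credentials.json")
# lock_down("conf/interpreter.json")
```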



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (ZEPPELIN-1984) Zeppelin Server doesn't catch all exception when launching a new interpreter process

2017-01-19 Thread Ruslan Dautkhanov (JIRA)
Ruslan Dautkhanov created ZEPPELIN-1984:
---

 Summary: Zeppelin Server doesn't catch all exception when 
launching a new interpreter process
 Key: ZEPPELIN-1984
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-1984
 Project: Zeppelin
  Issue Type: Bug
  Components: zeppelin-interpreter, zeppelin-server
Affects Versions: 0.7.0
 Environment: Zeppelin server from a month old master snapshot
Reporter: Ruslan Dautkhanov


We saw the exception stack below when the Zeppelin server tried to start a new 
interpreter process, for example the Spark interpreter. It was really hard to 
debug, and the only way to capture the real root cause was to add 
{code}
# append a trace of each interpreter.sh run to a per-process log file
LOG="/tmp/interpreter.sh-$$.log"
date >> $LOG
set -x        # echo every command as it runs
exec >> $LOG  # send stdout to the log file
exec 2>&1     # ...and stderr to the same place
{code} to the $zeppelinhome/bin/interpreter.sh file,
so that all stdout and stderr from interpreter.sh goes to that file.
That showed the real problem: 
{noformat}
Exception in thread "main" org.apache.spark.SparkException: Keytab file: 
/home//.kt does not exist
at 
org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:555)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:158)
...
{noformat}
while all other Zeppelin logs and the note output were showing a misleading 
"Connection refused" - see the stack below:

{noformat}
ERROR [2017-01-18 16:54:38,533] ({pool-2-thread-2} 
NotebookServer.java[afterStatusChange]:1645) - Error
org.apache.zeppelin.interpreter.InterpreterException: 
org.apache.zeppelin.interpreter.InterpreterException: 
org.apache.thrift.transport.TTransportException: java.net.ConnectException: 
Connection refused
at 
org.apache.zeppelin.interpreter.remote.RemoteInterpreter.init(RemoteInterpreter.java:232)
at 
org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getFormType(RemoteInterpreter.java:400)
at 
org.apache.zeppelin.interpreter.LazyOpenInterpreter.getFormType(LazyOpenInterpreter.java:105)
at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:316)
at org.apache.zeppelin.scheduler.Job.run(Job.java:176)
at 
org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:329)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
...
{noformat}

The issue might be that when interpreter.sh starts and then exits right away - 
https://github.com/apache/zeppelin/blob/master/zeppelin-interpreter/src/main/java/org/apache/zeppelin/interpreter/remote/RemoteInterpreterManagedProcess.java#L121
 
- the failure does not get captured anywhere. The only sign you'll see on the 
Zeppelin side is "Connection refused", as Zeppelin can't connect to the new 
interpreter process. We saw different root causes (the spark-submit error 
above about a missing keytab file is just one of them), and every time we had 
to add tracing to interpreter.sh to capture the real problem.

We think there are two possible ways to improve this:
1) capture the fact that interpreter.sh bails out (and don't try to connect in 
https://github.com/apache/zeppelin/blob/master/zeppelin-interpreter/src/main/java/org/apache/zeppelin/interpreter/remote/RemoteInterpreterManagedProcess.java#L132
 as that will only produce the expected "Connection refused");
2) if point 1) isn't possible for some reason (although I don't see why it 
wouldn't be), at least capture the errors produced by interpreter.sh, so the 
error stack in the Zeppelin log files and in the output of the paragraph that 
kicked off the interpreter start carries some meaningful information.
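Suggestion 2) could be sketched as follows (Python's subprocess stands in for Zeppelin's RemoteInterpreterManagedProcess; the 2-second grace period is an assumption):

```python
import subprocess

def launch(cmd):
    """Start a launcher script; surface its output if it exits immediately."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT,
                            universal_newlines=True)
    try:
        # If the launcher dies within the grace period, collect what it printed.
        out, _ = proc.communicate(timeout=2)
    except subprocess.TimeoutExpired:
        return proc  # still running: go on and connect to it
    raise RuntimeError("interpreter launcher exited with %d: %s"
                       % (proc.returncode, out.strip()))
```

This way a failure like the missing-keytab error above would show up in the Zeppelin log instead of a bare "Connection refused".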



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ZEPPELIN-1979) 'File size limit Exceeded' when importing notes - even for small files

2017-01-18 Thread Ruslan Dautkhanov (JIRA)
Ruslan Dautkhanov created ZEPPELIN-1979:
---

 Summary: 'File size limit Exceeded' when importing notes - even 
for small files
 Key: ZEPPELIN-1979
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-1979
 Project: Zeppelin
  Issue Type: Bug
  Components: zeppelin-server
Affects Versions: 0.7.0
 Environment: Zeppelin 0.7.0 from ~11/28 master snapshot
Reporter: Ruslan Dautkhanov


'File size limit Exceeded' when importing notes - even for small files.
This happens even for tiny files of a few KB.

See screenshot.
From the screenshot: "JSON file size cannot exceed MB".
Notice there is no number between "exceed" and "MB".

I had ZEPPELIN_WEBSOCKET_MAX_TEXT_MESSAGE_SIZE commented out in 
zeppelin-env.sh.
I have now uncommented it and set 
{code}
export ZEPPELIN_WEBSOCKET_MAX_TEXT_MESSAGE_SIZE=4096000
{code}
but am still getting the same error message.

It now prevents us from importing any notebooks.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ZEPPELIN-1832) Zombie Interpreter processes

2016-12-16 Thread Ruslan Dautkhanov (JIRA)
Ruslan Dautkhanov created ZEPPELIN-1832:
---

 Summary: Zombie Interpreter processes
 Key: ZEPPELIN-1832
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-1832
 Project: Zeppelin
  Issue Type: Bug
  Components: Core, Interpreters, zeppelin-server
Affects Versions: 0.7.0
Reporter: Ruslan Dautkhanov


When we restart the Zeppelin server (main process), in many cases the 
interpreter processes keep running, essentially becoming zombie processes.

In the case of the Spark interpreter, it also holds a SparkContext - 
consuming server-side resources too.

As discussed in users@ and dev@ mailing lists, other users have confirmed this 
problem too.

[~luciano resende]
{quote}
I have also seen similar issues even using zeppelin-daemon, but didn't have 
much time to investigate the issue when it was happening to me.
{quote}

[~zjffdu]
{quote}
I believe I see this before too.
{quote}

[~b...@apache.org]
{quote}
Have had a similar experience, although it's hard to say what the reason is, 
as all processes are supposed to be killed, as Moon pointed out.
Also noticed that with `mvn tests`, after almost every run, there are 1-2 
zombie RemoteInterpreter processes hanging around.
{quote}

@ blaubaer 
{quote}
We are seeing this problem as well, regularly actually. Especially in
situations when we have many concurrent interpreters running.
{quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ZEPPELIN-1814) Accessing Zeppelin's rest API through `z` variable

2016-12-14 Thread Ruslan Dautkhanov (JIRA)
Ruslan Dautkhanov created ZEPPELIN-1814:
---

 Summary: Accessing Zeppelin's rest API through `z` variable
 Key: ZEPPELIN-1814
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-1814
 Project: Zeppelin
  Issue Type: Improvement
  Components: Core, Interpreters, rest-api, zeppelin-server, 
zeppelin-zengine
Affects Versions: 0.7.0
Reporter: Ruslan Dautkhanov


We'd like to have a paragraph's code generated by a preceding paragraph.

For example, one of our use cases is a %pyspark paragraph that generates Hive 
DDLs (which in some cases can't be run in Spark).
Is there any chance the output of a paragraph can be redirected to a following 
paragraph?

I was thinking something like this could be used
https://zeppelin.apache.org/docs/latest/rest-api/rest-notebook.html#create-a-new-paragraph
{noformat}
http://[zeppelin-server]:[zeppelin-port]/api/notebook/[notebookId]/paragraph
{noformat}
But I'm not sure there is an easy way to call the Zeppelin API directly 
through the "z" variable - something like z.addParagraph(...).

In most cases a paragraph generates SQL code that can't be run directly as 
Spark SQL and has to be run by a different engine, for example Hive or a JDBC 
backend. That's one of the use cases we have, but I am sure there are many 
more.

Reply from [~moon] in email distribution list
{quote}
Although you can always create your function that call Zeppelin's rest API
to add paragraph, providing capability to add paragraph through 'z' (more 
precisely, from Interpreter) helps provide user more interactive usage of 
notebook i think.
{quote}





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ZEPPELIN-1803) CSV export doesn't conform to RFC-4180: exported csv is broken in some cases

2016-12-13 Thread Ruslan Dautkhanov (JIRA)
Ruslan Dautkhanov created ZEPPELIN-1803:
---

 Summary: CSV export doesn't conform to RFC-4180: exported csv is 
broken in some cases
 Key: ZEPPELIN-1803
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-1803
 Project: Zeppelin
  Issue Type: Bug
  Components: Core, front-end, zeppelin-server
Affects Versions: 0.7.0
Reporter: Ruslan Dautkhanov


CSV export doesn't conform to RFC-4180: exported csv is broken in some cases

RFC-4180:
{quote}
If double-quotes are used to enclose fields, then a double-quote appearing 
inside a field must be escaped by preceding it with another double quote.
{quote}

This makes CSVs with double quotes (") exported from Zeppelin not importable 
by any tool, including Excel.

It looks like CSV export has other issues too - in some cases an exported 
column value was a negative number instead of a character field. That could be 
a new bug, or again related to Zeppelin's CSV export not conforming to the 
RFC-4180 standard.

Some related quotes from RFC-4180:
{noformat}
   5.  Each field may or may not be enclosed in double quotes (however
   some programs, such as Microsoft Excel, do not use double quotes
   at all).  If fields are not enclosed with double quotes, then
   double quotes may not appear inside the fields.  For example:

   "aaa","bbb","ccc" CRLF
   zzz,yyy,xxx

   6.  Fields containing line breaks (CRLF), double quotes, and commas
   should be enclosed in double-quotes.  For example:

   "aaa","b CRLF
   bb","ccc" CRLF
   zzz,yyy,xxx

   7.  If double-quotes are used to enclose fields, then a double-quote
   appearing inside a field must be escaped by preceding it with
   another double quote.  For example:

   "aaa","b""bb","ccc"
{noformat}
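The quoting rules above fit in a few lines of code; a minimal RFC-4180 field escaper, sketched in Python:

```python
# Wrap fields containing commas, quotes, CR, or LF in double quotes,
# and escape embedded double quotes by doubling them (RFC 4180, rules 6-7).
def csv_field(value):
    if any(c in value for c in ',"\r\n'):
        return '"' + value.replace('"', '""') + '"'
    return value

def csv_row(fields):
    return ",".join(csv_field(f) for f in fields)
```

For example, csv_row(["aaa", 'b"bb', "ccc"]) yields aaa,"b""bb",ccc, which tools like Excel import correctly.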



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ZEPPELIN-1791) Bar Chart - Stacked setting isn't get persisted

2016-12-12 Thread Ruslan Dautkhanov (JIRA)
Ruslan Dautkhanov created ZEPPELIN-1791:
---

 Summary: Bar Chart - Stacked setting isn't get persisted
 Key: ZEPPELIN-1791
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-1791
 Project: Zeppelin
  Issue Type: Bug
  Components: Core, front-end, Interpreters
Affects Versions: 0.7.0
 Environment: 0.7 from master/snapshot
Reporter: Ruslan Dautkhanov


The Bar Chart "Stacked" setting isn't persisted.

Every time, the bar chart gets switched back to 
Grouped 
from Stacked.

We have some visualizations that only need stacked bar charts, and it's very 
inconvenient to switch back to Stacked every time.

Please make the Bar Chart "Stacked" setting persistent.

See screenshot.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ZEPPELIN-1789) Improve data download file name from data.csv to ${paragraph_title}.csv

2016-12-12 Thread Ruslan Dautkhanov (JIRA)
Ruslan Dautkhanov created ZEPPELIN-1789:
---

 Summary: Improve data download file name from data.csv to 
${paragraph_title}.csv
 Key: ZEPPELIN-1789
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-1789
 Project: Zeppelin
  Issue Type: Improvement
  Components: front-end, zeppelin-server
Affects Versions: 0.7.0
Reporter: Ruslan Dautkhanov


Currently data downloads are named just data.csv.

It's really confusing when users have to download a lot of tables from 
different paragraphs.

Suggested naming:
- ${paragraph_name}.csv
- ${notebook_name}-${paragraph_name}.csv
- ${notebook_name}-${paragraph_name}-${paragraph_finishedAtTime}.csv




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ZEPPELIN-1777) %spark is displayed twice in interpreter settings UI when %pyspark is made default

2016-12-08 Thread Ruslan Dautkhanov (JIRA)
Ruslan Dautkhanov created ZEPPELIN-1777:
---

 Summary: %spark is displayed twice in interpreter settings UI when 
%pyspark is made default
 Key: ZEPPELIN-1777
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-1777
 Project: Zeppelin
  Issue Type: Bug
  Components: Core, GUI, Interpreters, pySpark
Affects Versions: 0.7.0
 Environment: Spark 2
Reporter: Ruslan Dautkhanov


%spark is displayed twice in the interpreter settings UI when %pyspark is made 
the default.

When I made %pyspark the default, it works as expected, 
except in the settings UI.

Notice it shows %spark twice: the first as the default, the second not.
Since I made pyspark the default, it should have been %pyspark (default), 
%spark, ...

See screenshot.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ZEPPELIN-1775) Add dropdown item "Download Data as XLSX"

2016-12-08 Thread Ruslan Dautkhanov (JIRA)
Ruslan Dautkhanov created ZEPPELIN-1775:
---

 Summary: Add dropdown item "Download Data as XLSX"
 Key: ZEPPELIN-1775
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-1775
 Project: Zeppelin
  Issue Type: Improvement
  Components: Core, front-end, GUI
Affects Versions: 0.7.0
Reporter: Ruslan Dautkhanov


It's currently possible to download data in CSV and TSV formats.

Please add the dropdown item "Download Data as XLSX".

The XLSX format has the following advantages over CSV/TSV:
- it can also carry data types, not just data;
- it can have compression built in;
- there is no CSV import wizard to run - xlsx is self-descriptive.

The XLSX format has several advantages over XLS:
- limits are 1,048,576 rows (vs 65,536 in xls);
- 16,384 columns (vs 256 in xls);
- the main advantage is that XLS is a proprietary binary format while XLSX is 
based on the Office Open XML format.

Cloudera's Hue, for example, allows downloading query results as xlsx, and our 
analysts find it very useful.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ZEPPELIN-1774) Export notebook as a pixel-perfect printable document, i.e. export as a PDF

2016-12-08 Thread Ruslan Dautkhanov (JIRA)
Ruslan Dautkhanov created ZEPPELIN-1774:
---

 Summary: Export notebook as a pixel-perfect printable document, 
i.e. export as a PDF
 Key: ZEPPELIN-1774
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-1774
 Project: Zeppelin
  Issue Type: Improvement
  Components: Core, front-end
Affects Versions: 0.7.0
Reporter: Ruslan Dautkhanov


Export a notebook as a pixel-perfect printable document, i.e. export as a PDF.

Any other widely adopted format would do too - I guess that could be the 
Office Open XML format (.docx) - although PDF is preferred.

Our users are looking for functionality similar to Jupyter's "save notebook as 
PDF".

It would be great to have the same in Zeppelin.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ZEPPELIN-1730) impersonate spark interpreter using --proxy-user

2016-11-29 Thread Ruslan Dautkhanov (JIRA)
Ruslan Dautkhanov created ZEPPELIN-1730:
---

 Summary: impersonate spark interpreter using --proxy-user
 Key: ZEPPELIN-1730
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-1730
 Project: Zeppelin
  Issue Type: New Feature
  Components: conf, Core, Interpreters, security, zeppelin-server
Affects Versions: 0.7.0
Reporter: Ruslan Dautkhanov


Cloudera's Hue uses proxy authentication quite successfully
in our organization. I.e. Hue does LDAP authentication and then impersonates 
that specific user, and all requests are made on behalf of that user 
(although `hue` is the actual OS user that runs the Hue service). Other Hadoop 
services are simply configured to trust user `hue` to impersonate other users.

It might be easier to implement Spark multitenancy support through 
spark-submit's --proxy-user parameter. 

(this is applicable to kerberized and non-kerberized environments)
See the comments in 
https://github.com/apache/spark/pull/4405 
and https://issues.apache.org/jira/browse/SPARK-5493 (resolved in Spark 1.3)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ZEPPELIN-1728) Assigning HiveContext(sc) to a variable 2nd time gives errors

2016-11-29 Thread Ruslan Dautkhanov (JIRA)
Ruslan Dautkhanov created ZEPPELIN-1728:
---

 Summary: Assigning HiveContext(sc) to a variable 2nd time gives 
errors
 Key: ZEPPELIN-1728
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-1728
 Project: Zeppelin
  Issue Type: Bug
  Components: Core, pySpark, zeppelin-server
Affects Versions: 0.6.2
 Environment: Spark 1.6 that comes with CDH 5.8.3. 
Zeppelin 0.6.2, downloaded as zeppelin-0.6.2-bin-all.tgz from from apache.org

Reporter: Ruslan Dautkhanov


Assigning HiveContext(sc) to a variable a 2nd time gives "You must build Spark 
with Hive. Export 'SPARK_HIVE=true'".

It's only fixable by restarting Zeppelin.

Getting 
You must build Spark with Hive. Export 'SPARK_HIVE=true'
See the full stack in (2) below.

I'm using the Spark 1.6 that comes with CDH 5.8.3, 
so it's definitely compiled with Hive.
We use Jupyter notebooks without problems in the same environment.

Using Zeppelin 0.6.2, downloaded as zeppelin-0.6.2-bin-all.tgz from 
apache.org.

Is Zeppelin compiled with Hive too? I guess so.
Not sure what else is missing.

Tried to play with ZEPPELIN_SPARK_USEHIVECONTEXT but it doesn't make a 
difference.


(1)
{noformat}
$ cat zeppelin-env.sh
export JAVA_HOME=/usr/java/java7
export SPARK_HOME=/opt/cloudera/parcels/CDH/lib/spark
export SPARK_SUBMIT_OPTIONS="--principal  --keytab yyy --conf 
spark.driver.memory=7g --conf spark.executor.cores=2 --conf 
spark.executor.memory=8g"
export SPARK_APP_NAME="Zeppelin notebook"
export HADOOP_CONF_DIR=/etc/hadoop/conf
export HIVE_CONF_DIR=/etc/hive/conf
export HIVE_HOME=/opt/cloudera/parcels/CDH/lib/hive
export PYSPARK_PYTHON="/opt/cloudera/parcels/Anaconda/bin/python2"
export 
PYTHONPATH="/opt/cloudera/parcels/CDH/lib/spark/python:/opt/cloudera/parcels/CDH/lib/spark/python/lib/py4j-0.9-src.zip"
export MASTER="yarn-client"
export ZEPPELIN_SPARK_USEHIVECONTEXT=true
{noformat}



(2)
{noformat}
You must build Spark with Hive. Export 'SPARK_HIVE=true' and run build/sbt 
assembly
Traceback (most recent call last):
  File "/tmp/zeppelin_pyspark-9143637669637506477.py", line 267, in <module>
raise Exception(traceback.format_exc())
Exception: Traceback (most recent call last):
  File "/tmp/zeppelin_pyspark-9143637669637506477.py", line 265, in <module>
exec(code)
  File "", line 9, in 
  File "/opt/cloudera/parcels/CDH/lib/spark/python/pyspark/sql/context.py", 
line 580, in sql
{noformat}

(3)
{noformat}
Also have correct symlinks in zeppelin_home/conf for
- hive-site.xml
- hdfs-site.xml
- core-site.xml
- yarn-site.xml
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ZEPPELIN-1661) Ship Zeppelin with shiro-tools-hasher-X.X.X-cli.jar

2016-11-14 Thread Ruslan Dautkhanov (JIRA)
Ruslan Dautkhanov created ZEPPELIN-1661:
---

 Summary: Ship Zeppelin with shiro-tools-hasher-X.X.X-cli.jar
 Key: ZEPPELIN-1661
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-1661
 Project: Zeppelin
  Issue Type: Improvement
  Components: build, Core
Affects Versions: 0.6.2
Reporter: Ruslan Dautkhanov
Priority: Minor


Please add shiro-tools-hasher-X.X.X-cli.jar to the Zeppelin distribution.

It would be nice to run 
java -jar shiro-tools-hasher-X.X.X-cli.jar -p
out of the box from Zeppelin
as described in
http://shiro.apache.org/command-line-hasher.html

Referenced from Zeppelin documentation in
http://shiro.apache.org/configuration.html#encrypting-passwords
as
{quote}
Easy Secure Passwords

To save time and use best-practices, you might want to use Shiro's Command Line 
Hasher
{quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ZEPPELIN-1660) Home directory references (i.e. ~/zeppelin/) in zeppelin-env.sh don't work as expected

2016-11-14 Thread Ruslan Dautkhanov (JIRA)
Ruslan Dautkhanov created ZEPPELIN-1660:
---

 Summary: Home directory references (i.e. ~/zeppelin/) in 
zeppelin-env.sh don't work as expected
 Key: ZEPPELIN-1660
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-1660
 Project: Zeppelin
  Issue Type: Bug
  Components: conf, zeppelin-server
Affects Versions: 0.6.2
 Environment: Java 7
RHEL 6
Reporter: Ruslan Dautkhanov


We want to have a template for Zeppelin conf files and a skeleton of 
user-owned directories for each Zeppelin instance, so the zeppelin-env.sh 
configuration file must not use absolute paths, but paths relative to each 
user's home directory. "~"-style Unix directory references don't work in 
Zeppelin.

We have the following settings in zeppelin-env.sh that reference "zeppelin" 
subdirectories under the current user's home directory:
{noformat}
export ZEPPELIN_LOG_DIR="~/zeppelin/log"
export ZEPPELIN_PID_DIR="~/zeppelin/run"
export ZEPPELIN_WAR_TEMPDIR="~/zeppelin/tmp"
export ZEPPELIN_NOTEBOOK_DIR="~/zeppelin/notebooks"
{noformat}

Attempt to start zeppelin.sh --config ~/zeppelin/conf/ 
shows
{noformat}
Log dir doesn't exist, create ~/zeppelin/log
Pid dir doesn't exist, create ~/zeppelin/run
Pid dir doesn't exist, create ~/zeppelin/notebooks
{noformat}

Zeppelin actually creates a directory named "~" (yes, with a literal tilde 
character), with a directory named "zeppelin" underneath, in the current 
directory.


PS: For the sake of completeness, we also tried "~user/zeppelin" - the same 
issue.
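The missing expansion is standard behavior in most languages; for instance, Python's os.path.expanduser implements exactly the "~" and "~user" prefix resolution that zeppelin-env.sh values would need before directories are created (a sketch, not Zeppelin's actual code):

```python
import os.path

def resolve_dir(configured):
    """Expand a leading "~" or "~user" to the real home directory."""
    return os.path.expanduser(configured)
```

So resolve_dir("~/zeppelin/log") yields an absolute path under the current user's home, instead of a literal "~" directory.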



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)