Nawid Sayed created ZEPPELIN-4358:
-
Summary: Seaborn renders plots slowly in Apache Zeppelin notebooks
Key: ZEPPELIN-4358
URL: https://issues.apache.org/jira/browse/ZEPPELIN-4358
Project: Zeppelin
Issue Type: Bug
Components: pySpark
Affects Versions: 0.8.1
Reporter: Nawid Sayed
I am currently trying to generate visualizations in Zeppelin (0.8.1) notebooks
using the pyspark interpreter with Python 3.7.3.
Generating the following simple plot with seaborn (0.9.0) takes around 5
minutes (with very high CPU usage throughout the duration):
```
%pyspark
import seaborn as sns
import numpy as np
import pandas as pd
data = pd.DataFrame(np.random.rand(100,3))
sns.pairplot(data)
```
This behavior is rather inconsistent, as the following (much more data
intensive) plot is rendered instantly:
```
%pyspark
import seaborn as sns
import numpy as np
import pandas as pd
df = pd.DataFrame(data=np.random.rand(1, 2))
sns.lineplot(x=0, y=1, data=df)
```
I noticed that using matplotlib (3.1.0) is generally much faster, almost as
snappy as I am used to from Jupyter notebook environments.
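To check whether the time goes into seaborn/matplotlib drawing itself or into Zeppelin's display path, one option is to time the same plots in a plain Python 3 process on the same machine, outside Zeppelin. The sketch below assumes that approach. `render_seconds` is a hypothetical helper and not part of Zeppelin or seaborn. It builds a figure with a given plotting call and rasterizes it to PNG in memory using the non-interactive Agg backend.

```python
import io
import time

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: draw into memory, no display
import matplotlib.pyplot as plt


def render_seconds(plot_fn):
    """Time building a figure with plot_fn and rasterizing it to PNG.

    plot_fn is any zero-argument callable that draws onto the current
    matplotlib figure (e.g. a seaborn or pandas plotting call).
    Returns (elapsed_seconds, png_size_in_bytes).
    """
    plt.close("all")  # start clean so no earlier figure is picked up
    start = time.perf_counter()
    plot_fn()
    buf = io.BytesIO()
    plt.gcf().savefig(buf, format="png")  # forces a full draw of the figure
    elapsed = time.perf_counter() - start
    plt.close("all")
    return elapsed, buf.getbuffer().nbytes
```

For the first snippet, you could compare `render_seconds(lambda: sns.pairplot(data))` with a matplotlib-only equivalent such as `render_seconds(lambda: pd.plotting.scatter_matrix(data))`. A large pairplot time in plain Python would point at the drawing itself. A small one would make Zeppelin's display handling the more likely suspect.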
I have already read about issue
[ZEPPELIN-1894](https://jira.apache.org/jira/browse/ZEPPELIN-1894), but I can
render the scatterplot mentioned there instantly as well.
I have already asked this question on StackOverflow, but I think this is a
better place:
--
This message was sent by Atlassian Jira
(v8.3.4#803005)