Re: spark and plot data

Pedro Rodriguez Fri, 22 Jul 2016 15:16:10 -0700

As of the most recent 0.6.0 release it's partially alleviated, but still not
great (compared to something like Jupyter).


They can be "downloaded", but that's only really meaningful for importing
them back into Zeppelin. It would be great if they could be exported as HTML
or PDF, but at present they can't be. I know they have some sort of git
support, but it was never clear to me how it was supposed to be used since
the docs are sparse on that. So far what works best for us is S3 storage,
but you don't get the benefits of GitHub that way (history, commits, etc.).

There are a couple of other notebooks floating around. Apache Toree seems the
most promising for portability since it's based on Jupyter:
https://github.com/apache/incubator-toree
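
On the original question further down the thread (scatter plots too large to
collect), one common approach is to reduce the data on the cluster and bring
back only the summary, in line with Andy's point about plotting summary stats
in the driver. Below is a minimal plain-Python sketch of that reduction, not
something anyone in this thread posted. The names bin_key and bin_counts, the
grid origin, and the cell size are all hypothetical, and the PySpark line in
the comment is an assumption about how it could be wired up.

```python
from collections import Counter
import math


def bin_key(x, y, x_min, y_min, cell):
    """Map a point to the integer grid cell that contains it."""
    return (math.floor((x - x_min) / cell), math.floor((y - y_min) / cell))


def bin_counts(points, x_min, y_min, cell):
    """Count points per grid cell.

    The result has one entry per occupied cell, so its size is bounded by
    the grid resolution rather than by the number of input points.
    """
    return Counter(bin_key(x, y, x_min, y_min, cell) for x, y in points)


# Assumption (not from the thread): on an RDD of (x, y) pairs the same
# reduction could run on the executors, for example
#   rdd.map(lambda p: (bin_key(p[0], p[1], 0.0, 0.0, 0.5), 1)) \
#      .reduceByKey(lambda a, b: a + b).collect()
# so that only the per-cell counts reach the driver for plotting.

# Tiny local stand-in for a large data set.
points = [(0.1, 0.2), (0.3, 0.4), (0.9, 0.1), (1.2, 1.7)]
counts = bin_counts(points, x_min=0.0, y_min=0.0, cell=0.5)
```

The per-cell counts can then be drawn as a heatmap with whatever Python
plotting library is at hand, in place of a raw scatter plot.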

On Fri, Jul 22, 2016 at 3:53 PM, Gourav Sengupta <gourav.sengu...@gmail.com>
wrote:

> The biggest stumbling block to using Zeppelin has been that we cannot
> download the notebooks, cannot export them, and certainly cannot sync them
> back to GitHub without mind-numbing and sometimes irritating hacks. Have
> those issues been resolved?
>
>
> Regards,
> Gourav
>
>
> On Fri, Jul 22, 2016 at 2:22 PM, Pedro Rodriguez <ski.rodrig...@gmail.com>
> wrote:
>
>> Zeppelin works great. The other thing that we have done in notebooks
>> (like Zeppelin or Databricks) which support multiple types of Spark session
>> is to register Spark SQL temp tables in our Scala code, then use Python as
>> an escape hatch for plotting with seaborn/matplotlib when the built-in
>> plots are insufficient.
>>
>> —
>> Pedro Rodriguez
>> PhD Student in Large-Scale Machine Learning | CU Boulder
>> Systems Oriented Data Scientist
>> UC Berkeley AMPLab Alumni
>>
>> pedrorodriguez.io | 909-353-4423
>> github.com/EntilZha | LinkedIn
>> <https://www.linkedin.com/in/pedrorodriguezscience>
>>
>> On July 22, 2016 at 3:04:48 AM, Marco Colombo (
>> ing.marco.colo...@gmail.com) wrote:
>>
>> Take a look at zeppelin
>>
>> http://zeppelin.apache.org
>>
>> On Thursday, July 21, 2016, Andy Davidson <a...@santacruzintegration.com>
>> wrote:
>>
>>> Hi Pseudo
>>>
>>> Plotting, graphing, data visualization, report generation are common
>>> needs in scientific and enterprise computing.
>>>
>>> Can you tell me more about your use case? What about the current
>>> process / workflow do you think could be improved by pushing plotting (I
>>> assume you mean plotting and graphing) into Spark?
>>>
>>>
>>> In my personal work all the graphing is done in the driver on summary
>>> stats calculated using Spark. So for me using standard Python libs has not
>>> been a problem.
>>>
>>> Andy
>>>
>>> From: pseudo oduesp <pseudo20...@gmail.com>
>>> Date: Thursday, July 21, 2016 at 8:30 AM
>>> To: "user @spark" <user@spark.apache.org>
>>> Subject: spark and plot data
>>>
>>> Hi,
>>> I know Spark is an engine for computing on large data sets, and I work
>>> with PySpark, which is a wonderful tool.
>>>
>>> My question: we don't have tools for plotting data, so each time we have
>>> to switch back to Python to plot.
>>> But when you have a large result, such as a scatter plot or ROC curve,
>>> you can't use collect to fetch the data.
>>>
>>> Does someone have a suggestion for plotting?
>>>
>>> Thanks
>>>
>>>
>>
>> --
>> Ing. Marco Colombo
>>
>>
>


-- 
Pedro Rodriguez
PhD Student in Distributed Machine Learning | CU Boulder
UC Berkeley AMPLab Alumni

ski.rodrig...@gmail.com | pedrorodriguez.io | 909-353-4423
Github: github.com/EntilZha | LinkedIn:
https://www.linkedin.com/in/pedrorodriguezscience
