Hi Pseudo,

I do not know much about Zeppelin. What languages are you using?
I have been doing my data exploration and graphing mostly in Python, because early on Spark had good Python support. It is easy to collect() data as a local pandas object. At this point R should work well too; you should be able to easily collect() your data as an R data frame. I have not tried RStudio. I typically run the Jupyter notebook server in my data center and find the notebooks really nice. I typically use matplotlib to generate my graphs, but there are a lot of graphing packages.

Attached is the script I use to start the notebook server. The script and process work but are a little hacky. You call it as follows:

#
# on a machine in your cluster
#
$ cd dirWithNotebooks

# all the logs will be in startIPythonNotebook.sh.out
# nohup lets you log in, start your notebook server, and log out
$ nohup startIPythonNotebook.sh > startIPythonNotebook.sh.out &

#
# on your local machine
#
# because of firewalls I need to open an ssh tunnel
$ ssh -o ServerAliveInterval=120 -N -f -L localhost:8889:localhost:7000 myCluster

# connect to the notebook server using the browser of your choice
http://localhost:8889

#
# if you need to stop your notebook server you may need to kill it;
# there is probably a cleaner way to do this
#
$ ps -el | head -1; ps -efl | grep python

http://jupyter.org/

P.S. Jupyter is in the process of being released. The new JupyterLab alpha was just announced, and it looks really sweet.
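To make the "aggregate in Spark, plot on the driver" workflow concrete, here is a minimal sketch. The pyspark half is shown as comments because it needs a live cluster; the DataFrame `df` and columns "x" and "count" are illustrative names, not anything from your setup. A small stand-in pandas frame below plays the role of the collected result.

```python
# Sketch of collecting a small aggregated result to the driver for plotting.
# Hypothetical pyspark half (requires a running cluster, hence commented):
#
#   pdf = (df.groupBy("x").count()   # heavy lifting stays in Spark
#            .toPandas())            # small result lands on the driver
#
# Stand-in for the collected result:
import pandas as pd

pdf = pd.DataFrame({"x": [1, 2, 3], "count": [10, 20, 15]})

# Plot locally with matplotlib (any driver-side plotting lib works):
#
#   import matplotlib.pyplot as plt
#   pdf.plot.bar(x="x", y="count")
#   plt.savefig("counts.png")

print(pdf.shape)  # (3, 2)
```

The key design point is that collect()/toPandas() should only ever see a result that has already been shrunk by aggregation or sampling in Spark; the plotting itself is then ordinary local Python.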
From: pseudo oduesp <pseudo20...@gmail.com>
Date: Friday, July 22, 2016 at 2:08 AM
To: Andrew Davidson <a...@santacruzintegration.com>
Subject: Re: spark and plot data

> Hi Andy,
> thanks for the reply.
> What I meant is that it is hard to switch between the local and distributed
> concepts each time. For example, Zeppelin gives an easy way to interact with
> data, but it is hard to configure on a huge cluster with a lot of nodes. In
> my case I have a cluster with 69 nodes, and processing huge volumes of data
> with pyspark is cool, but when I want to plot some chart it becomes a hard
> job.
>
> I sample or aggregate my results. For example, if I use the random forest
> algorithm in machine learning, I want to retrieve the most important
> features, but with the version already installed on our cluster (1.5.0) I
> can't get this.
>
> Do you have any solution?
>
> Thanks
>
> 2016-07-21 18:44 GMT+02:00 Andy Davidson <a...@santacruzintegration.com>:
>> Hi Pseudo
>>
>> Plotting, graphing, data visualization, and report generation are common
>> needs in scientific and enterprise computing.
>>
>> Can you tell me more about your use case? What is it about the current
>> process / workflow that you think could be improved by pushing plotting (I
>> assume you mean plotting and graphing) into Spark?
>>
>> In my personal work all the graphing is done in the driver on summary
>> stats calculated using Spark, so for me using standard Python libs has not
>> been a problem.
>>
>> Andy
>>
>> From: pseudo oduesp <pseudo20...@gmail.com>
>> Date: Thursday, July 21, 2016 at 8:30 AM
>> To: "user @spark" <user@spark.apache.org>
>> Subject: spark and plot data
>>
>>> Hi,
>>> I know Spark is an engine to compute large data sets, and for me working
>>> with pyspark is very wonderful.
>>>
>>> My question: we don't have tools for plotting data, so each time we have
>>> to switch and go back to Python to use plot.
>>> But when you have a large result, like a scatter plot or ROC curve, you
>>> can't use collect() to take the data.
>>>
>>> Does someone have a proposal for plotting?
>>>
>>> Thanks
startIPythonNotebook.sh
--------------------------------------------------------------------- To unsubscribe e-mail: user-unsubscr...@spark.apache.org