Unpersist RDD in Graphx

2016-01-31 Thread Zhang, Jingyu
Hi, what is the best way to unpersist the RDD in GraphX to release memory: RDD.unpersist, or RDD.unpersistVertices plus RDD.edges.unpersist? I studied the source code of Pregel.scala; both of the above are used between line 148 and line 150. Can anyone please tell me what the difference is? In addition, what
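A hedged sketch of the pattern the question refers to, assuming Spark 1.x GraphX: `unpersistVertices` drops only the cached vertex data, so the edge RDD must be unpersisted separately, which is why Pregel.scala calls both. The graph values below (`prevGraph`, an earlier iteration's graph) are illustrative placeholders, not from the original message:

```scala
import org.apache.spark.graphx.Graph

// Sketch of the iteration pattern in Pregel.scala (Spark 1.x):
// once the new graph for this iteration is materialized, release the
// previous iteration's cached data.
prevGraph.unpersistVertices(blocking = false) // drops only the cached vertices
prevGraph.edges.unpersist(blocking = false)   // the edge RDD is cached separately
// By contrast, Graph.unpersist(blocking) uncaches both vertices and edges
// in a single call -- appropriate when the whole graph is no longer needed.
```

The split exists because consecutive Pregel iterations share edge structure; unpersisting vertices alone avoids re-reading edges that the next iteration still uses.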

Re: how to introduce spark to your colleague if he has no background about *** spark related

2016-01-31 Thread Jörn Franke
It depends of course on the background of the people, but how about some examples ("word count") of how it works in the background? > On 01 Feb 2016, at 07:31, charles li wrote: > > > Apache Spark™ is a fast and general engine for large-scale data processing. > > it's a

Re: how to introduce spark to your colleague if he has no background about *** spark related

2016-01-31 Thread Xiao Li
My 2 cents. Concepts are always boring to people with zero background. Use examples to show how easy and powerful Spark is! Use cases are also useful for them. Download the slides from Spark Summit; I believe you can find a lot of interesting ideas there! Tomorrow, I am facing similar issues, but

how to introduce spark to your colleague if he has no background about *** spark related

2016-01-31 Thread charles li
*Apache Spark™* is a fast and general engine for large-scale data processing. It's a good profile of Spark, but it's really too short for a lot of people if they have little background in this field. OK, frankly, I'll give a tech talk about Spark later this week, and now I'm writing a slide about

confusion about starting ipython notebook with spark between 1.3.x and 1.6.x

2016-01-31 Thread charles li
I used to use Spark 1.3.x and explore my data in an IPython [3.2] notebook, which was very stable. But I came across the error "Java gateway process exited before sending the driver its port number". My code is as below: ``` import pyspark from pyspark import SparkConf sc_conf =
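The "Java gateway process exited before sending the driver its port number" error usually means pyspark could not launch the JVM at all, which is an environment problem rather than a code problem. A hedged checklist of the environment variables worth verifying, with illustrative paths that must be adjusted to the actual installation:

```shell
# Illustrative paths -- adjust to your own Java and Spark installs.
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64    # the JVM pyspark launches
export SPARK_HOME=/opt/spark-1.6.0-bin-hadoop2.6
export PYTHONPATH="$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.9-src.zip:$PYTHONPATH"
# Since Spark 1.4, if PYSPARK_SUBMIT_ARGS is set it must end with pyspark-shell:
export PYSPARK_SUBMIT_ARGS="--master local[2] pyspark-shell"
```

In particular, a PYSPARK_SUBMIT_ARGS value carried over from a 1.3.x setup without the trailing `pyspark-shell` token is a common cause of this exact error after upgrading to 1.6.x.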

Re: Reading lzo+index with spark-csv (Splittable reads)

2016-01-31 Thread Hyukjin Kwon
Hm.. As I said here https://github.com/databricks/spark-csv/issues/245#issuecomment-177682354, it sounds reasonable in a way, though. To me, this might be dealing with a somewhat narrow use case. How about using csvRdd(),
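A sketch of the suggested workaround, under these assumptions: the `.lzo` file is read as plain text through Hadoop's LZO input format (which honours the `.index` file for splittable reads), and the resulting lines are handed to spark-csv's `CsvParser.csvRdd()` instead of its file-based reader. The path and the `sc`/`sqlContext` values are illustrative placeholders:

```scala
import com.databricks.spark.csv.CsvParser
import org.apache.spark.sql.SQLContext

// Read the indexed LZO file as lines of text; the LZO input format splits
// on the offsets recorded in the companion .index file.
val lines = sc.newAPIHadoopFile(
    "hdfs:///data/input.csv.lzo",                     // illustrative path
    classOf[com.hadoop.mapreduce.LzoTextInputFormat],
    classOf[org.apache.hadoop.io.LongWritable],
    classOf[org.apache.hadoop.io.Text])
  .map(_._2.toString)

// Parse the already-split lines with spark-csv rather than letting it
// open the file itself (which would not use the index).
val df = new CsvParser()
  .withUseHeader(true)
  .csvRdd(sqlContext, lines)
```

This sidesteps spark-csv's own file reading entirely, so the splittable-read behaviour comes from the Hadoop input format, not from spark-csv.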

DAG visualization: no visualization information available with history server

2016-01-31 Thread Raghava
Hello all, I am running the history server for a completed application. This application was run with the following parameters: bin/spark-submit --class --master local[2] --conf spark.local.dir=/mnt/ --conf spark.eventLog.dir=/mnt/sparklog/ --conf spark.eventLog.enabled=true --conf
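For the history server to show anything (including DAG information), the event-log directory the application writes to and the directory the history server reads from must match. A minimal config sketch, where the class name and paths are illustrative placeholders rather than the poster's actual values:

```shell
# Write event logs when submitting (class name and paths are illustrative):
bin/spark-submit --class com.example.Main --master local[2] \
  --conf spark.eventLog.enabled=true \
  --conf spark.eventLog.dir=file:///mnt/sparklog/ \
  yourapp.jar

# Point the history server at the same directory, then start it:
echo 'spark.history.fs.logDirectory file:///mnt/sparklog/' >> conf/spark-defaults.conf
sbin/start-history-server.sh
```

Note that `spark.eventLog.dir` controls where applications write, while `spark.history.fs.logDirectory` controls where the history server reads; setting only the former is a common reason the history server shows incomplete information.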