Hi, what is the best way to unpersist an RDD in GraphX to release memory?
RDD.unpersist
or
RDD.unpersistVertices and RDD.edges.unpersist
I studied the source code of Pregel.scala; both of the above are used between
line 148 and line 150. Can anyone please tell me what the difference is? In
addition, what
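In Pregel.scala the two calls release different things: the message RDD is an
ordinary RDD, so a single unpersist frees it, while a Graph caches its
vertices and edges as two separate RDDs, so each must be released on its own
via unpersistVertices and edges.unpersist. A Scala sketch of that cleanup
pattern (simplified; names are illustrative, not the exact source):

```scala
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

// Cleanup pattern from the GraphX Pregel loop, simplified.
def releaseOldState[VD, ED, A](oldGraph: Graph[VD, ED],
                               oldMessages: RDD[(VertexId, A)]): Unit = {
  // oldMessages is a plain RDD: one unpersist call is enough.
  oldMessages.unpersist(blocking = false)

  // oldGraph is a Graph: its vertex RDD and edge RDD are cached
  // separately, so each must be unpersisted on its own.
  oldGraph.unpersistVertices(blocking = false)
  oldGraph.edges.unpersist(blocking = false)
}
```

So neither form is "better" in general: RDD.unpersist is for plain RDDs, and
the unpersistVertices / edges.unpersist pair is how a cached Graph is released.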
It depends of course on the background of the audience, but how about some
examples ("word count") showing how it works in the background?
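A "word count" example can carry most of the explanation. The canonical
PySpark version is `sc.textFile(path).flatMap(lambda l: l.split()).map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)`;
the plain-Python sketch below shows what those stages do conceptually (an
illustration of the flow, not Spark itself):

```python
from collections import Counter
from itertools import chain

lines = ["to be or not to be", "to do is to be"]

# flatMap: split every line into a flat stream of words
words = list(chain.from_iterable(line.split() for line in lines))

# map + reduceByKey: pair each word with 1, then sum the 1s per word
counts = Counter(words)

print(counts["to"])  # 4
print(counts["be"])  # 3
```

The point for a slide is that Spark runs exactly this flow, but with each
stage partitioned across the cluster.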
> On 01 Feb 2016, at 07:31, charles li wrote:
>
>
> Apache Spark™ is a fast and general engine for large-scale data processing.
>
> it's a
My 2 cents: concepts are always boring to people with zero background. Use
examples to show how easy and powerful Spark is! Use cases are also useful
for them. Download the slides from Spark Summit; I believe you can find a
lot of interesting ideas!
Tomorrow, I am facing similar issues, but
*Apache Spark™* is a fast and general engine for large-scale data
processing.
it's a good profile of Spark, but it's really too short for lots of people
if they have little background in this field.
OK, frankly, I'll give a tech talk about Spark later this week, and now I'm
writing a slide about
I used Spark 1.3.x before and explored my data in an IPython [3.2]
notebook, which was very stable, but I came across an error:
"Java gateway process exited before sending the driver its port number"
My code is as below:
```
import pyspark
from pyspark import SparkConf

# (completing the truncated snippet; the original conf settings were cut off)
sc_conf = SparkConf()
```
Hm... as I said here
https://github.com/databricks/spark-csv/issues/245#issuecomment-177682354,
it sounds reasonable in a way, though. For me, this might be to deal with
some narrow use cases.
How about using csvRdd(),
Hello All,
I am running the history server for a completed application. The
application was run with the following parameters:
bin/spark-submit --class --master local[2] --conf
spark.local.dir=/mnt/ --conf spark.eventLog.dir=/mnt/sparklog/ --conf
spark.eventLog.enabled=true --conf
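The command above is cut off, but in general the history server only shows a
completed application if the directory the application wrote its event logs
to matches the directory the history server reads. A typical setup (paths
illustrative, mirroring the /mnt/sparklog/ used above):

```
# Application side (spark-defaults.conf or --conf flags): write event logs
spark.eventLog.enabled    true
spark.eventLog.dir        file:/mnt/sparklog/

# History server side: read the same directory, then start the daemon:
#   export SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=file:/mnt/sparklog/"
#   sbin/start-history-server.sh
```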