How to convert String data to RDD.

2015-01-02 Thread RP
= getUrlAsString("https://somehost.com/test.json") val jsonDataRDD = ? val json1 = sqlContext.jsonRDD(jsonDataRDD) Thanks, RP
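
A minimal sketch of one common answer, assuming a SparkContext sc and the getUrlAsString helper from the question: wrap the fetched string in a one-element RDD with sc.parallelize, then pass that to jsonRDD. Note that jsonRDD expects one JSON object per RDD element, so if the payload holds one object per line, split it first:

    val jsonDataString = getUrlAsString("https://somehost.com/test.json")
    // One JSON document as a single RDD element; use
    // sc.parallelize(jsonDataString.split("\n")) instead if the payload
    // contains one JSON object per line.
    val jsonDataRDD = sc.parallelize(Seq(jsonDataString))
    val json1 = sqlContext.jsonRDD(jsonDataRDD)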

Re: The running time of spark

2014-08-23 Thread Denis RP
In fact I think it's highly unlikely, but I just want some confirmation from you; please leave your opinion, thanks :)

Re: The running time of spark

2014-08-23 Thread Denis RP
The algorithm uses Pregel from GraphX. It ran for more than one day and only reached the third stage, and I cancelled it because the resource consumption was unacceptable. The expected time is about ten minutes (not my expectation ...), but I think a couple of hours would be acceptable. The bottleneck seems to be I/O,
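
One common mitigation when Pregel supersteps are I/O-bound is to persist the graph with a serialized, spill-to-disk storage level instead of recomputing partitions each iteration; a sketch, assuming graph is the GraphX graph in question:

    import org.apache.spark.storage.StorageLevel
    // Keep partitions serialized in memory and spill to disk rather than
    // recomputing them on every Pregel superstep.
    val cachedGraph = graph.persist(StorageLevel.MEMORY_AND_DISK_SER)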

Re: The running time of spark

2014-08-23 Thread Denis RP
Thanks for the suggestion. The program actually failed with OutOfMemoryError: Java heap space; I tried some modifications and it got further, but the exception might occur again anyway. How long did your test take? I can use it for reference.
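
For heap-space failures on 3.5GB VMs, the executor memory and the Spark 1.x cache fraction are knobs worth checking; a sketch with assumed values (the app name and the numbers are placeholders, not details from the thread), set via SparkConf as elsewhere in these threads:

    import org.apache.spark.SparkConf
    val conf = new SparkConf()
      .setAppName("ShortestPath")                 // hypothetical app name
      .set("spark.executor.memory", "2g")         // leave headroom for the OS
      .set("spark.storage.memoryFraction", "0.4") // shrink the Spark 1.x cache share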

The running time of spark

2014-08-21 Thread Denis RP
Hi, I'm using Spark on a cluster of 8 VMs, each with two cores and 3.5GB of RAM, but I need to run a shortest-path algorithm on 500+GB of data (a text file where each line contains a node id and the nodes it points to). I've tested it on the cluster, but the speed seems to be extremely slow, and I haven't got any
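
For input of that shape, a sketch of building the graph (assuming a SparkContext sc, whitespace-separated ids, and an HDFS path; all three are assumptions, not details from the thread):

    import org.apache.spark.graphx._
    val edges = sc.textFile("hdfs:///graph/input").flatMap { line =>
      val fields = line.split("\\s+")
      val src = fields.head.toLong
      // One edge from the leading node id to each node it points to.
      fields.tail.map(dst => Edge(src, dst.toLong, 1L))
    }
    val graph = Graph.fromEdges(edges, defaultValue = 0L)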

the pregel operator of graphx throws NullPointerException

2014-07-29 Thread Denis RP
Hi, I'm running a Spark standalone cluster to calculate single-source shortest paths. Here is the code; the vertex type is VertexRDD[(String, Long)], String for the path and Long for the distance. The code before these lines reads the graph data from a file and builds the graph. 71 val sssp =
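
The vertex-attribute shape suggests the standard Pregel SSSP pattern; a sketch along those lines (only the (String, Long) attribute type comes from the thread, the path-string encoding and names are assumptions):

    import org.apache.spark.graphx._

    def sssp(graph: Graph[(String, Long), Long],
             sourceId: VertexId): Graph[(String, Long), Long] = {
      val initial = graph.mapVertices { case (id, _) =>
        if (id == sourceId) (id.toString, 0L) else ("", Long.MaxValue)
      }
      initial.pregel(("", Long.MaxValue))(
        // Vertex program: keep the shorter of current and incoming distance.
        (id, attr, msg) => if (msg._2 < attr._2) msg else attr,
        // Send a message only along edges that improve the destination.
        triplet => {
          val (srcPath, srcDist) = triplet.srcAttr
          if (srcDist != Long.MaxValue &&
              srcDist + triplet.attr < triplet.dstAttr._2)
            Iterator((triplet.dstId,
              (srcPath + "->" + triplet.dstId, srcDist + triplet.attr)))
          else Iterator.empty
        },
        // Merge concurrent messages by minimum distance.
        (a, b) => if (a._2 < b._2) a else b
      )
    }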

Re: the pregel operator of graphx throws NullPointerException

2014-07-29 Thread Denis RP
I build it with sbt package and run it with sbt run, and I do use SparkConf.set for deployment options and external jars. It seems that spark-submit can't load the extra jars and leads to a NoClassDefFoundError; should I pack all the jars into one giant jar and give that a try? I run it on a cluster of 8
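
For SparkConf-driven deployment, the jars can also be shipped programmatically via SparkConf.setJars; a sketch where the app name, master URL, and jar paths are all placeholders:

    import org.apache.spark.SparkConf
    val conf = new SparkConf()
      .setAppName("SSSP")                      // hypothetical app name
      .setMaster("spark://master.local:7077")  // placeholder master URL
      // Ship the application jar plus its dependencies to the executors;
      // a single assembly ("giant") jar built with sbt-assembly also works.
      .setJars(Seq("target/scala-2.10/app.jar", "lib/dependency.jar"))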

Re: Spark got stuck with a loop

2014-07-25 Thread Denis RP
Can anyone help? I'm using Spark 1.0.1. I'm confused: if the block is found, why are no non-empty blocks fetched, and why does the process keep running forever? Thanks!

Spark got stuck with a loop

2014-07-24 Thread Denis RP
Hi, I ran Spark in standalone mode on a cluster and it went well for approximately one hour; then the driver's output stopped with the following: 14/07/24 08:07:36 INFO MapOutputTrackerMasterActor: Asked to send map output locations for shuffle 36 to spark@worker5.local:47416 14/07/24 08:07:36 INFO