Hi David,
Your answers have solved my problem! Detailed and accurate. Thank you very
much!
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Why-I-can-t-use-broadcast-var-defined-in-a-global-object-tp27523p27531.html
Sent from the Apache Spark User List
Hi David,
Thank you for the detailed reply. I understand what you said about the idea
behind broadcast variables. But I am still a little bit confused. In your
reply, you said:
*It has sent largeValue across the network to each worker already, and gave
you a /key/ to retrieve it.*
So my question is,
Hi all,
Here is a simplified example to show my concern. This example contains 3
files with 3 objects, depending on Spark 1.6.1.
//file globalObject.scala
import org.apache.spark.broadcast.Broadcast

object globalObject {
  var br_value: Broadcast[Map[Int, Double]] = null
}
//file
Hi all,
Recently I've run into a scenario where I need to conduct two-sample tests
between all pairwise combinations of columns of an RDD. But the network load
and the generation of the pairwise computations are too time-consuming. That
has puzzled me for a long time. I want to conduct the Wilcoxon rank-sum test
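For reference, once a pair of columns has been collected to the driver, the rank-sum statistic W itself is cheap to compute. Here is a minimal plain-Scala sketch, independent of Spark; the function name and the sample data in the note below are illustrative, not from MLlib:

```scala
// Minimal sketch of the Wilcoxon rank-sum statistic W for two samples,
// with ranks averaged over ties. Plain Scala, no Spark dependency.
def rankSumW(x: Seq[Double], y: Seq[Double]): Double = {
  // Pool both samples, remembering which values came from x.
  val pooled = x.map(v => (v, true)) ++ y.map(v => (v, false))
  val values = pooled.map(_._1)
  // Rank of a value = (# strictly smaller) + average position in its tie group.
  def rank(v: Double): Double = {
    val less  = values.count(_ < v)
    val equal = values.count(_ == v)
    less + (equal + 1) / 2.0
  }
  // W = sum of the ranks of the first sample.
  pooled.collect { case (v, true) => rank(v) }.sum
}
```

For example, with no ties, rankSumW(Seq(1.0, 2.0, 3.0), Seq(4.0, 5.0)) gives 6.0 (ranks 1 + 2 + 3). This quadratic-counting version is only meant to show the statistic; a sort-based ranking would be used for large columns.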
Hi all,
Recently in our project, we need to update an RDD using data regularly
received from a DStream. I plan to use the foreachRDD API to achieve this:
var MyRDD = ...
dstream.foreachRDD { rdd =>
  MyRDD = MyRDD.join(rdd)...
  ...
}
Is this usage correct? My concern is, as I am repeatedly and
Hi all,
I am using Spark 1.3.1 to write a spectral clustering algorithm. This really
confused me today. At first I thought my implementation was wrong. It turns
out it's an issue in MLlib. Fortunately, I've figured it out.
I suggest adding a hint to the MLlib user documentation (as far as I know,
Our lab needs to do some simulations of online social networks. We need to
handle a 5000*5000 adjacency matrix, namely, to get its largest eigenvalue
and the corresponding eigenvector. Matlab can be used, but it is
time-consuming. Is Spark effective for linear algebra calculations and
transformations?
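As a single-machine baseline for the dominant eigenpair, plain power iteration is often enough at this scale. The sketch below is plain Scala with no Spark dependency; the iteration count is an illustrative choice, and it assumes the dominant eigenvalue is strictly larger in magnitude than the others (and the start vector is not orthogonal to its eigenvector):

```scala
// Power iteration for the largest eigenvalue and eigenvector of a
// symmetric matrix, in plain Scala.
def matVec(a: Array[Array[Double]], v: Array[Double]): Array[Double] =
  a.map(row => row.zip(v).map { case (x, y) => x * y }.sum)

def powerIteration(a: Array[Array[Double]], iters: Int): (Double, Array[Double]) = {
  val n = a.length
  var v = Array.fill(n)(1.0 / math.sqrt(n)) // unit-norm start vector
  for (_ <- 0 until iters) {
    val av = matVec(a, v)
    val norm = math.sqrt(av.map(x => x * x).sum)
    v = av.map(_ / norm) // renormalize each step
  }
  // Rayleigh quotient v^T A v gives the eigenvalue estimate with its sign.
  val lambda = v.zip(matVec(a, v)).map { case (x, y) => x * y }.sum
  (lambda, v)
}
```

For the distributed case, MLlib's RowMatrix (e.g. computeSVD) covers similar factorizations, though for a 5000*5000 dense matrix a single machine may well be faster than paying Spark's network overhead.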