Re: Why can't I use a broadcast var defined in a global object?

2016-08-13 Thread yaochunnan
Hi David, your answers have solved my problem! Detailed and accurate. Thank you very much!

Re: Why can't I use a broadcast var defined in a global object?

2016-08-12 Thread yaochunnan
Hi David, thank you for the detailed reply. I understand what you said about the idea behind broadcast variables, but I am still a little bit confused. In your reply, you said: "It has sent largeValue across the network to each worker already, and gave you a *key* to retrieve it." So my question is,
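
For readers following along, here is a minimal sketch of the handle semantics being described (the object name, toy data, and job are illustrative, not from the thread): sc.broadcast ships largeValue to the cluster once and hands back a Broadcast handle, and each executor retrieves its local copy through handle.value.

import org.apache.spark.{SparkConf, SparkContext}

object BroadcastHandleSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("handle-sketch"))
    val largeValue = Map(1 -> "a", 2 -> "b")   // the data to ship
    val handle = sc.broadcast(largeValue)      // driver gets back a handle (the "key")
    val hits = sc.parallelize(Seq(1, 2, 3))
      .filter(k => handle.value.contains(k))   // executors dereference the handle locally
      .collect()
    println(hits.mkString(", "))
    sc.stop()
  }
}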

Why can't I use a broadcast var defined in a global object?

2016-08-12 Thread yaochunnan
Hi all, here is a simplified example to show my concern. This example contains 3 files with 3 objects, and depends on Spark 1.6.1.

//file globalObject.scala
import org.apache.spark.broadcast.Broadcast

object globalObject {
  var br_value: Broadcast[Map[Int, Double]] = null
}

//file
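
Since the preview cuts off before the failing code, here is a minimal sketch of the usual cause and fix, assuming the standard explanation: globalObject is initialized independently on each executor JVM, so br_value is null there. Copying the handle into a local val on the driver lets the closure serialize the handle itself (the object name BroadcastFixSketch and the toy data are assumptions):

import org.apache.spark.broadcast.Broadcast
import org.apache.spark.{SparkConf, SparkContext}

// The global holder from the thread, kept for illustration.
object globalObject {
  var br_value: Broadcast[Map[Int, Double]] = null
}

object BroadcastFixSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("broadcast-sketch"))
    globalObject.br_value = sc.broadcast(Map(1 -> 0.5, 2 -> 1.5))
    // Copy the handle into a local val on the driver. The closure then
    // serializes the handle itself; reading globalObject.br_value inside the
    // closure would instead hit a freshly initialized (null) object on each
    // executor JVM.
    val br = globalObject.br_value
    val result = sc.parallelize(Seq(1, 2, 3))
      .map(k => br.value.getOrElse(k, 0.0))
      .collect()
    println(result.mkString(", "))
    sc.stop()
  }
}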

How can I do pair-wise computation between RDD feature columns?

2015-05-16 Thread yaochunnan
Hi all, recently I've run into a scenario where I need to conduct two-sample tests between every pairwise combination of columns of an RDD. But generating the pairwise combinations, and the network load this creates, is too time-consuming. That has puzzled me for a long time. I want to conduct the Wilcoxon rank-sum test
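
For concreteness, a minimal sketch of the pairwise step being described, built on RDD#cartesian; the toy columns are assumptions, and the stand-in statistic is a hypothetical placeholder where a Wilcoxon rank-sum implementation would go. The O(n^2) shuffle that cartesian triggers is exactly the cost the post worries about.

import org.apache.spark.{SparkConf, SparkContext}

object PairwiseColumnsSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("pairwise-sketch"))
    // Toy "columns": (columnIndex, values) pairs; real data would come from
    // transposing the RDD's rows.
    val columns = sc.parallelize(Seq(
      (0, Array(1.0, 2.0, 3.0)),
      (1, Array(2.0, 1.0, 4.0)),
      (2, Array(0.5, 0.5, 0.5))))
    // All unordered pairs (i < j) via cartesian.
    val pairs = columns.cartesian(columns).filter { case ((i, _), (j, _)) => i < j }
    // Hypothetical stand-in statistic; replace with a rank-sum test.
    val stats = pairs.map { case ((i, a), (j, b)) =>
      ((i, j), a.zip(b).map { case (x, y) => x - y }.sum)
    }
    stats.collect().foreach(println)
    sc.stop()
  }
}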

Possible long lineage issue when using DStream to update a normal RDD

2015-05-07 Thread yaochunnan
Hi all, recently in our project we need to update an RDD using data regularly received from a DStream. I plan to use the foreachRDD API to achieve this:

var MyRDD = ...
dstream.foreachRDD { rdd =>
  MyRDD = MyRDD.join(rdd)
  ...
}

Is this usage correct? My concern is, as I am repeatedly and
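
The usual remedy for the lineage growth this pattern causes is periodic checkpointing of the accumulated RDD. A minimal sketch, assuming a socket source and a checkpoint directory (and using union instead of the thread's join, purely to keep the toy types simple):

import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.{SparkConf, SparkContext}

object LineageTruncationSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("lineage-sketch"))
    sc.setCheckpointDir("/tmp/ckpt")                 // assumed path
    val ssc = new StreamingContext(sc, Seconds(10))
    var state: RDD[(Int, Double)] = sc.parallelize(Seq.empty[(Int, Double)])
    var batches = 0
    val dstream = ssc.socketTextStream("localhost", 9999) // assumed source
      .map(line => (line.hashCode, 1.0))
    dstream.foreachRDD { rdd =>
      state = state.union(rdd).cache()
      batches += 1
      if (batches % 10 == 0) {  // periodically cut the lineage
        state.checkpoint()
        state.count()           // force materialization so the checkpoint happens
      }
    }
    ssc.start()
    ssc.awaitTermination()
  }
}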

the indices of SparseVector must be ordered while computing SVD

2015-04-22 Thread yaochunnan
Hi all, I am using Spark 1.3.1 to write a spectral clustering algorithm. This really confused me today. At first I thought my implementation was wrong. It turns out it's an issue in MLlib. Fortunately, I've figured it out. I suggest adding a hint to the MLlib user documentation (as far as I know,
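
A minimal sketch of the pitfall being reported (toy sizes assumed): Vectors.sparse accepts unsorted indices without complaint in Spark 1.3-era MLlib, so routines that assume ascending index order, such as the SVD path mentioned here, can silently misbehave.

import org.apache.spark.mllib.linalg.Vectors

object SparseOrderSketch {
  def main(args: Array[String]): Unit = {
    // Indices sorted ascending: the form MLlib's linear algebra code expects.
    val sorted = Vectors.sparse(5, Array(0, 2, 4), Array(1.0, 3.0, 5.0))
    // Same logical vector with unsorted indices: constructed without complaint
    // here, but can silently corrupt downstream computations such as SVD.
    val unsorted = Vectors.sparse(5, Array(4, 0, 2), Array(5.0, 1.0, 3.0))
    println(sorted)
    println(unsorted)
  }
}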

How can I implement eigenvalue decomposition in Spark?

2014-08-07 Thread yaochunnan
Our lab needs to run some simulations on online social networks. We need to handle a 5000x5000 adjacency matrix, namely, to get its largest eigenvalue and the corresponding eigenvector. Matlab can be used, but it is time-consuming. Is Spark effective for linear algebra calculations and transformations?
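
One route in MLlib is RowMatrix.computeSVD; a minimal sketch under these assumptions: the adjacency matrix is symmetric (undirected graph), in which case the top singular value equals the magnitude of the dominant eigenvalue and the top right singular vector matches the dominant eigenvector up to sign. The toy rows stand in for the real 5000x5000 data.

import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.RowMatrix
import org.apache.spark.{SparkConf, SparkContext}

object TopEigenSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("eig-sketch"))
    // Toy symmetric adjacency rows; real data would be loaded from files.
    val rows = sc.parallelize(Seq(
      Vectors.dense(0.0, 1.0, 1.0),
      Vectors.dense(1.0, 0.0, 0.0),
      Vectors.dense(1.0, 0.0, 0.0)))
    val mat = new RowMatrix(rows)
    // k = 1: only the top singular value/vector is needed.
    val svd = mat.computeSVD(1, computeU = false)
    println(s"top singular value: ${svd.s(0)}")
    println(s"corresponding vector: ${svd.V.toArray.mkString(", ")}")
    sc.stop()
  }
}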