No idea how feasible this is. Has anyone done it?
To clarify: I don't need the actual paths, just the distances.
On Wed, Mar 26, 2014 at 3:04 PM, Ryan Compton compton.r...@gmail.com wrote:
No idea how feasible this is. Has anyone done it?
Does this continue in newer versions? (I'm on 0.8.0 now)
When I use .distinct() on moderately large datasets (224GB, 8.5B rows,
I'm guessing about 500M are distinct) my jobs fail with:
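As far as I recall, RDD.distinct() is just sugar over a map-to-pairs plus reduceByKey shuffle, which is where all the intermediate shuffle files come from on a job this size. A plain-Scala stand-in for the idea (helper name is mine, groupBy stands in for the distributed reduceByKey):

```scala
// Sketch of what distinct() does under the hood: map each element to a
// (value, null) pair, combine by key, keep the keys. On an RDD this is a
// full shuffle, so every reducer opens files per map task.
def distinctViaReduceByKey[T](xs: Seq[T]): Seq[T] =
  xs.map(x => (x, null))
    .groupBy(_._1) // local stand-in for reduceByKey
    .keys
    .toSeq
```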
14/04/17 15:04:02 INFO cluster.ClusterTaskSetManager: Loss was due to
java.io.FileNotFoundException
Btw, I've got System.setProperty("spark.shuffle.consolidate.files",
"true") and use ext3 (CentOS...)
On Thu, Apr 17, 2014 at 3:20 PM, Ryan Compton compton.r...@gmail.com wrote:
Does this continue in newer versions? (I'm on 0.8.0 now)
When I use .distinct() on moderately large datasets (224GB, 8.5B
I am trying to read an edge list into a Graph. My data looks like
394365859 -- 136153151
589404147 -- 1361045425
I read it into a Graph via:
val edgeFullStrRDD: RDD[String] = sc.textFile(unidirFName)
val edgeTupRDD = edgeFullStrRDD.map(x => x.split("\t"))
.map(x
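Note the sample lines above are separated by " -- ", so a split on "\t" would leave each line unsplit. A minimal hedged parser for that format (the helper name is mine, not from the original code):

```scala
// Hypothetical helper: parse one "src -- dst" edge line into a pair of
// vertex IDs, returning None for lines that don't match the format.
def parseEdge(line: String): Option[(Long, Long)] =
  line.split("--").map(_.trim) match {
    case Array(src, dst) => Some((src.toLong, dst.toLong))
    case _               => None
  }
```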
Try this: https://www.dropbox.com/s/xf34l0ta496bdsn/.txt
This code:
println(g.numEdges)
println(g.numVertices)
println(g.edges.distinct().count())
gave me
1
9294
2
On Tue, Apr 22, 2014 at 5:14 PM, Ankur Dave ankurd...@gmail.com wrote:
I wasn't able to reproduce this
Ryan Compton compton.r...@gmail.com wrote:
I'm trying to shoehorn a label-propagation-ish algorithm into GraphX. I
need to update each vertex with the median value of their neighbors.
Unlike PageRank, which updates each vertex with the mean of their
neighbors, I don't have a simple commutative
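Since a median can't be folded pairwise the way a sum or mean can, one option is to gather each vertex's neighbor values first (e.g. via GraphX's collectNeighbors) and take the median locally per vertex. A sketch of just the median step, as plain Scala with a helper of my own naming:

```scala
// Median of a vertex's collected neighbor values. Unlike a running mean,
// this needs the whole neighbor set in hand before it can be computed.
def median(values: Seq[Double]): Double = {
  require(values.nonEmpty, "median of empty neighbor set")
  val sorted = values.sorted
  val n = sorted.length
  if (n % 2 == 1) sorted(n / 2)
  else (sorted(n / 2 - 1) + sorted(n / 2)) / 2.0
}
```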
I use both Pig and Spark. All my code is built with Maven into a giant
*-jar-with-dependencies.jar. I recently upgraded to Spark 1.0 and now
all my pig scripts fail with:
Caused by: java.lang.RuntimeException: Could not resolve error that
occured when launching map reduce job:
/bidirectional-network-current/part-r-1'
USING PigStorage() AS (id1:long, id2:long, weight:int);
ttt = LIMIT edgeList0 10;
DUMP ttt;
On Wed, May 28, 2014 at 12:55 PM, Ryan Compton compton.r...@gmail.com wrote:
It appears to be Spark 1.0 related. I made a pom.xml with a single
dependency on Spark and
posted a JIRA: https://issues.apache.org/jira/browse/SPARK-1952
On Wed, May 28, 2014 at 1:14 PM, Ryan Compton compton.r...@gmail.com wrote:
Remark: just including the jar built by sbt will produce the same
error, i.e. this pig script will fail:
REGISTER
/usr/share/osi1/spark-1.0.0/assembly
Just ran into this today myself. I'm on branch-1.0 using a CDH3
cluster (no modifications to Spark or its dependencies). The error
appeared trying to run GraphX's .connectedComponents() on a ~200GB
edge list (GraphX worked beautifully on smaller data).
Here's the stacktrace (it's quite similar to
Fwiw, if you do decide to handle language detection on your machine, this
library works great on tweets https://github.com/carrotsearch/langid-java
On Tue, Nov 11, 2014, 7:52 PM Tobias Pfeiffer t...@preferred.jp wrote:
Hi,
On Wed, Nov 12, 2014 at 5:42 AM, SK skrishna...@gmail.com wrote:
But