It's been reduced to a single line of code.
http://technicaltidbit.blogspot.com/2016/03/dataframedataset-swap-places-in-spark-20.html
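Presumably (my inference; the thread itself doesn't spell it out) the single line in
question is the type alias that Spark 2.0 uses to define DataFrame in terms of Dataset,
roughly:

// In Spark 2.0, org.apache.spark.sql's package object defines DataFrame as an alias:
type DataFrame = Dataset[Row]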
From: Gerhard Fiedler
To: "dev@spark.apache.org"
Sent: Friday, June 3, 2016 9:01 AM
Subject: Where
I see you've been burning the midnight oil.
From: Reynold Xin
To: "dev@spark.apache.org"
Sent: Friday, April 1, 2016 1:15 AM
Subject: [discuss] using deep learning to improve Spark
Hi all,
Hope you all enjoyed the Tesla 3 unveiling.
Would it make sense (in terms of feasibility, code organization, and politics) to have
a JavaDataFrame, as a way to isolate the 1000+ extra lines to a Java compatibility
layer/class?
From: Reynold Xin
To: "dev@spark.apache.org"
Sent:
I believe that in the initialization portion of GraphX SVDPlusPlus, the
initialization of biases is incorrect. Specifically, in line
https://github.com/apache/spark/blob/master/graphx/src/main/scala/org/apache/spark/graphx/lib/SVDPlusPlus.scala#L96
instead of
(vd._1, vd._2, msg.get._2 /
Since RDDs are generally unordered, isn't it true that calls like textFile().first() are
not guaranteed to return the first row (such as when looking for a header row)? If so,
doesn't that make the example in
http://spark.apache.org/docs/1.2.1/quick-start.html#basics misleading?
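As an illustration, a minimal sketch (file name and logic are my own, not from the
quick-start guide) of the usual workaround when you don't want to rely on ordering to
strip a header:

val lines = sc.textFile("data.csv")
// Drop the header from partition 0 instead of filtering against first().
// Note this still assumes the header physically begins the first partition.
val rows = lines.mapPartitionsWithIndex { (idx, iter) =>
  if (idx == 0) iter.drop(1) else iter
}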
1. Is IndexedRDD planned for 1.3?
https://issues.apache.org/jira/browse/SPARK-2365
2. Once IndexedRDD is in, is it planned to convert Word2VecModel to it from its
current Map[String,Array[Float]]?
- Original Message -
From: Evan R. Sparks evan.spa...@gmail.com
To: Matei Zaharia matei.zaha...@gmail.com
Cc: Koert Kuipers ko...@tresata.com; Michael Malak michaelma...@yahoo.com;
Patrick Wendell pwend...@gmail.com; Reynold Xin r...@databricks.com;
dev@spark.apache.org dev@spark.apache.org
Sent: Tuesday
I created https://issues.apache.org/jira/browse/SPARK-5343 for this.
- Original Message -
From: Michael Malak michaelma...@yahoo.com
To: dev@spark.apache.org dev@spark.apache.org
Cc:
Sent: Monday, January 19, 2015 5:09 PM
Subject: GraphX ShortestPaths backwards?
GraphX ShortestPaths seems to be following edges backwards instead of forwards:
import org.apache.spark.graphx._
val g = Graph(sc.makeRDD(Array((1L,""), (2L,""), (3L,""))),
  sc.makeRDD(Array(Edge(1L,2L,""), Edge(2L,3L,""))))
lib.ShortestPaths.run(g,Array(3)).vertices.collect
res1:
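(A hedged aside, not from the original thread: if the opposite traversal direction is
wanted, one workaround is to reverse the graph before running the algorithm.)

// Sketch: compute the same landmark distances over the reversed edge set.
lib.ShortestPaths.run(g.reverse, Seq(3L)).vertices.collect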
But wouldn't the gain be greater under something similar to EdgePartition1D
(but perhaps better load-balanced based on number of edges for each vertex) and
an algorithm that primarily follows edges in the forward direction?
From: Ankur Dave ankurd...@gmail.com
To: Michael Malak michaelma
Does GraphX make an effort to co-locate vertices onto the same workers as the
majority (or even some) of its edges?
According to:
https://spark.apache.org/docs/1.2.0/graphx-programming-guide.html#triangle-counting
Note that TriangleCount requires the edges to be in canonical orientation
(srcId < dstId)
But isn't this overstating the requirement? Isn't the requirement really that
IF there are duplicate
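For reference, a hedged sketch (graph is a placeholder, not something from the
programming guide) of forcing edges into canonical orientation before counting triangles:

import org.apache.spark.graphx._
// Reorient every edge so that srcId < dstId, then count triangles.
val canonicalEdges = graph.edges.map { e =>
  if (e.srcId < e.dstId) e else Edge(e.dstId, e.srcId, e.attr)
}
val triangles = Graph(graph.vertices, canonicalEdges)
  .partitionBy(PartitionStrategy.RandomVertexCut)
  .triangleCount()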
Thank you. I created
https://issues.apache.org/jira/browse/SPARK-5064
- Original Message -
From: xhudik xhu...@gmail.com
To: dev@spark.apache.org
Cc:
Sent: Saturday, January 3, 2015 2:04 PM
Subject: Re: GraphX rmatGraph hangs
Hi Michael,
yes, I can confirm the behavior.
It gets stuck.
The following single line just hangs, when executed in either Spark Shell or
standalone:
org.apache.spark.graphx.util.GraphGenerators.rmatGraph(sc, 4, 8)
It just outputs 0 edges and then locks up.
The only other information I've found via Google is:
At Spark Summit, Patrick Wendell indicated the number of MLlib algorithms would
roughly double in 1.1 from the current approx. 15.
http://spark-summit.org/wp-content/uploads/2014/07/Future-of-Spark-Patrick-Wendell.pdf
What are the planned additional algorithms?
In Jira, I only see two when
Shouldn't I be seeing N2 and N4 in the output below? (Spark 0.9.0 REPL) Or am I
missing something fundamental?
val nodes = sc.parallelize(Array((1L, "N1"), (2L, "N2"), (3L, "N3"), (4L, "N4"), (5L, "N5")))
val edges = sc.parallelize(Array(Edge(1L, 2L, "E1"), Edge(1L, 3L, "E2"), Edge(2L, 4L, "E3"), Edge(3L,
While developers may appreciate 1.0 == API stability, I'm not sure that will
be the understanding of the VP who gives the green light to a Spark-based
development effort.
I fear a bug that silently produces erroneous results will be perceived like
the FDIV bug, but in this case without the
Reposting here on dev since I didn't see a response on user:
I'm seeing different Serializable behavior in Spark Shell vs. Scala Shell. In
the Spark Shell, equals() fails when I use the canonical equals() pattern of
match{}, but works when I substitute isInstanceOf[]. I am using Spark
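To make the comparison concrete, a minimal sketch (class and field names are
hypothetical, not from the original post) of the two equals() styles being compared:

// Canonical pattern-match equals(); this is the form reported to fail in the Spark Shell.
class Wrapper(val n: Int) extends Serializable {
  override def equals(other: Any): Boolean = other match {
    case that: Wrapper => this.n == that.n
    case _ => false
  }
  override def hashCode: Int = n
}
// The isInstanceOf[] variant reported to work:
//   override def equals(other: Any): Boolean =
//     other.isInstanceOf[Wrapper] && other.asInstanceOf[Wrapper].n == n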