Re: Where is DataFrame.scala in 2.0?

2016-06-03 Thread Michael Malak
It's been reduced to a single line of code. http://technicaltidbit.blogspot.com/2016/03/dataframedataset-swap-places-in-spark-20.html From: Gerhard Fiedler To: "dev@spark.apache.org" Sent: Friday, June 3, 2016 9:01 AM Subject: Where

Re: [discuss] using deep learning to improve Spark

2016-04-01 Thread Michael Malak
I see you've been burning the midnight oil. From: Reynold Xin To: "dev@spark.apache.org" Sent: Friday, April 1, 2016 1:15 AM Subject: [discuss] using deep learning to improve Spark Hi all, Hope you all enjoyed the Tesla 3 unveiling

Re: [discuss] DataFrame vs Dataset in Spark 2.0

2016-02-25 Thread Michael Malak
Would it make sense (in terms of feasibility, code organization, and politically) to have a JavaDataFrame, as a way to isolate the 1000+ extra lines to a Java compatibility layer/class? From: Reynold Xin To: "dev@spark.apache.org" Sent:

Wrong initial bias in GraphX SVDPlusPlus?

2015-04-03 Thread Michael Malak
I believe that in the initialization portion of GraphX SVDPlusPlus, the initialization of biases is incorrect. Specifically, in line https://github.com/apache/spark/blob/master/graphx/src/main/scala/org/apache/spark/graphx/lib/SVDPlusPlus.scala#L96 instead of (vd._1, vd._2, msg.get._2 /

textFile() ordering and header rows

2015-02-22 Thread Michael Malak
Since RDDs are generally unordered, aren't things like textFile().first() not guaranteed to return the first row (such as looking for a header row)? If so, doesn't that make the example in http://spark.apache.org/docs/1.2.1/quick-start.html#basics misleading?

Word2Vec IndexedRDD

2015-02-01 Thread Michael Malak
1. Is IndexedRDD planned for 1.3? https://issues.apache.org/jira/browse/SPARK-2365 2. Once IndexedRDD is in, is it planned to convert Word2VecModel to it from its current Map[String,Array[Float]]?

Re: renaming SchemaRDD - DataFrame

2015-01-27 Thread Michael Malak
Message - From: Evan R. Sparks evan.spa...@gmail.com To: Matei Zaharia matei.zaha...@gmail.com Cc: Koert Kuipers ko...@tresata.com; Michael Malak michaelma...@yahoo.com; Patrick Wendell pwend...@gmail.com; Reynold Xin r...@databricks.com; dev@spark.apache.org dev@spark.apache.org Sent: Tuesday

Re: GraphX ShortestPaths backwards?

2015-01-20 Thread Michael Malak
I created https://issues.apache.org/jira/browse/SPARK-5343 for this. - Original Message - From: Michael Malak michaelma...@yahoo.com To: dev@spark.apache.org dev@spark.apache.org Cc: Sent: Monday, January 19, 2015 5:09 PM Subject: GraphX ShortestPaths backwards? GraphX ShortestPaths

GraphX ShortestPaths backwards?

2015-01-19 Thread Michael Malak
GraphX ShortestPaths seems to be following edges backwards instead of forwards: import org.apache.spark.graphx._ val g = Graph(sc.makeRDD(Array((1L,), (2L,), (3L,))), sc.makeRDD(Array(Edge(1L,2L,), Edge(2L,3L, lib.ShortestPaths.run(g,Array(3)).vertices.collect res1:

Re: GraphX vertex partition/location strategy

2015-01-19 Thread Michael Malak
But wouldn't the gain be greater under something similar to EdgePartition1D (but perhaps better load-balanced based on number of edges for each vertex) and an algorithm that primarily follows edges in the forward direction? From: Ankur Dave ankurd...@gmail.com To: Michael Malak michaelma

GraphX vertex partition/location strategy

2015-01-19 Thread Michael Malak
Does GraphX make an effort to co-locate vertices onto the same workers as the majority (or even some) of its edges? - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail:

GraphX doc: triangleCount() requirement overstatement?

2015-01-18 Thread Michael Malak
According to: https://spark.apache.org/docs/1.2.0/graphx-programming-guide.html#triangle-counting Note that TriangleCount requires the edges to be in canonical orientation (srcId < dstId) But isn't this overstating the requirement? Isn't the requirement really that IF there are duplicate

Re: GraphX rmatGraph hangs

2015-01-04 Thread Michael Malak
Thank you. I created https://issues.apache.org/jira/browse/SPARK-5064 - Original Message - From: xhudik xhu...@gmail.com To: dev@spark.apache.org Cc: Sent: Saturday, January 3, 2015 2:04 PM Subject: Re: GraphX rmatGraph hangs Hi Michael, yes, I can confirm the behavior. It gets stuck

GraphX rmatGraph hangs

2015-01-03 Thread Michael Malak
The following single line just hangs, when executed in either Spark Shell or standalone: org.apache.spark.graphx.util.GraphGenerators.rmatGraph(sc, 4, 8) It just outputs 0 edges and then locks up. The only other information I've found via Google is:

15 new MLlib algorithms

2014-07-09 Thread Michael Malak
At Spark Summit, Patrick Wendell indicated the number of MLlib algorithms would roughly double in 1.1 from the current approx. 15. http://spark-summit.org/wp-content/uploads/2014/07/Future-of-Spark-Patrick-Wendell.pdf What are the planned additional algorithms? In Jira, I only see two when

GraphX triplets on 5-node graph

2014-05-29 Thread Michael Malak
Shouldn't I be seeing N2 and N4 in the output below? (Spark 0.9.0 REPL) Or am I missing something fundamental? val nodes = sc.parallelize(Array((1L, N1), (2L, N2), (3L, N3), (4L, N4), (5L, N5))) val edges = sc.parallelize(Array(Edge(1L, 2L, E1), Edge(1L, 3L, E2), Edge(2L, 4L, E3), Edge(3L,

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-17 Thread Michael Malak
While developers may appreciate 1.0 == API stability, I'm not sure that will be the understanding of the VP who gives the green light to a Spark-based development effort. I fear a bug that silently produces erroneous results will be perceived like the FDIV bug, but in this case without the

Serializable different behavior Spark Shell vs. Scala Shell

2014-05-13 Thread Michael Malak
Reposting here on dev since I didn't see a response on user: I'm seeing different Serializable behavior in Spark Shell vs. Scala Shell. In the Spark Shell, equals() fails when I use the canonical equals() pattern of match{}, but works when I substitute with isInstanceOf[]. I am using Spark
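
For readers unfamiliar with the two equals() styles this message contrasts, here is a minimal sketch in plain Scala. The class names MatchKey and CastKey and the field s are hypothetical, chosen only for illustration; they are not taken from the original post, and the sketch does not attempt to reproduce the Spark Shell behavior being reported.

```scala
// Hypothetical classes for illustration only; not the code from the original post.

// Style 1: the pattern-matching form of equals().
class MatchKey(val s: String) extends Serializable {
  override def equals(other: Any): Boolean = other match {
    case that: MatchKey => that.s == s   // same class: compare the field
    case _              => false         // any other type is never equal
  }
  override def hashCode: Int = s.hashCode
}

// Style 2: the isInstanceOf/asInstanceOf form of equals().
class CastKey(val s: String) extends Serializable {
  override def equals(other: Any): Boolean =
    other.isInstanceOf[CastKey] && other.asInstanceOf[CastKey].s == s
  override def hashCode: Int = s.hashCode
}

// Compare two distinct instances that carry the same string.
val matchEqual = new MatchKey("a") == new MatchKey("a")
val castEqual  = new CastKey("a") == new CastKey("a")
```

In a plain Scala session both matchEqual and castEqual are true, since the two styles are logically equivalent there; the message above reports that they diverge in the Spark Shell.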

Re: Serializable different behavior Spark Shell vs. Scala Shell

2014-05-13 Thread Michael Malak
:26 AM, Michael Malak michaelma...@yahoo.com wrote: Reposting here on dev since I didn't see a response on user: I'm seeing different Serializable behavior in Spark Shell vs. Scala Shell. In the Spark Shell, equals() fails when I use the canonical equals() pattern of match{}, but works when I