I assume we will have an rc10 to fix the issues Matei found?
Tom
On Sunday, May 18, 2014 9:08 PM, Patrick Wendell pwend...@gmail.com wrote:
Hey Matei - the issue you found is not related to security. A patch from
a few days ago broke builds for Hadoop 1 with YARN support enabled.
The patch
Oh... ha, good point. Sorry, I'm new to mapreduce programming and forgot
about that... I'll have to adjust my reduce function to output a vector/RDD
as the element to return. Thanks for reminding me of this!
It's an Iterator in both Java and Scala. In both cases you need to
copy the stream of values into something List-like to sort it. An
Iterable would not change that (not sure the API can promise many
iterations anyway).
If you just want the equivalent of toArray, you can use a utility
method in
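A minimal sketch of the copy-then-sort step being described, assuming a SparkContext named sc and illustrative data (the names are not from the thread):

import org.apache.spark.SparkContext._

val pairs = sc.parallelize(Seq(("a", 3), ("a", 1), ("b", 2)))
// groupByKey hands back each key's values as a one-shot stream; copy them
// into a List (the "something List-like") before sorting, since the stream
// can only be traversed once.
val sortedPerKey = pairs.groupByKey().mapValues(vs => vs.toList.sorted)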
Thanks Sean, I had seen that post you mentioned.
What you suggest looks like an in-memory sort, which is fine if each partition is
small enough to fit in memory. Is it true that rdd.sortByKey(...) requires
partitions to fit in memory? I wasn't sure if there was some magic behind
the scenes that
Voted :)
https://issues.apache.org/jira/browse/SPARK-983
On Tue, May 20, 2014 at 10:21 AM, Sandy Ryza sandy.r...@cloudera.com wrote:
There is: SPARK-545
On Tue, May 20, 2014 at 10:16 AM, Andrew Ash and...@andrewash.com wrote:
Sandy, is there a Jira ticket for that?
On Tue, May 20,
Sean,
No, I don't want to sort the whole RDD; sortByKey seems to be good enough
for that.
Right now, I think the code I have will work for me, but I can imagine
conditions where it will run out of memory.
I'm not completely sure if SPARK-983
https://issues.apache.org/jira/browse/SPARK-983
Wait a minute... doesn't a reduce function return 1 element PER key pair?
For example, word-count mapreduce functions return a {word, count} element
for every unique word. Is this supposed to be a 1-element RDD object?
The .reduce functions for MappedRDD and FlatMappedRDD are both of the form
You are probably looking for reduceByKey in that case.
reduce just reduces everything in the collection into a single element.
On Tue, May 20, 2014 at 12:16 PM, GlennStrycker glenn.stryc...@gmail.com wrote:
Wait a minute... doesn't a reduce function return 1 element PER key pair?
For example,
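A short sketch of the distinction being drawn here, reusing the word-count example from the question (assuming a SparkContext named sc; data illustrative):

import org.apache.spark.SparkContext._  // brings reduceByKey into scope on pair RDDs

val counts = sc.parallelize(Seq(("the", 1), ("cat", 1), ("the", 1)))
// reduceByKey: one element per distinct key, here ("the", 2) and ("cat", 1)
val perWord = counts.reduceByKey(_ + _)
// reduce: collapses the whole collection into a single value, not an RDD
val total = counts.map(_._2).reduce(_ + _)  // 3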
I don't seem to have this function in my Spark installation for this object,
or the classes MappedRDD, FlatMappedRDD, EdgeRDD, VertexRDD, or Graph.
Which class should have the reduceByKey function, and how do I cast my
current RDD as this class?
Perhaps this is still due to my Spark installation
That's all very old functionality in Spark terms, so it shouldn't have
anything to do with your installation being out-of-date. There is also no
need to cast as long as the relevant implicit conversions are in scope:
import org.apache.spark.SparkContext._
On Tue, May 20, 2014 at 1:00 PM,
http://spark.apache.org/docs/0.9.1/api/core/index.html#org.apache.spark.rdd.PairRDDFunctions
It becomes automagically available when your RDD contains pairs.
On Tue, May 20, 2014 at 9:00 PM, GlennStrycker glenn.stryc...@gmail.com wrote:
I don't seem to have this function in my Spark
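To make the "automagic" concrete: reduceByKey lives in PairRDDFunctions, and the import shown earlier brings in the implicit conversion that wraps any RDD of pairs, so no cast is ever needed. A sketch (data illustrative):

import org.apache.spark.SparkContext._  // implicit RDD[(K, V)] => PairRDDFunctions

val byVertex = sc.parallelize(Seq((1L, 1), (2L, 1), (1L, 1)))
// Because the element type is a pair, reduceByKey resolves through the
// implicit conversion above; no explicit wrapping or casting required.
val counted = byVertex.reduceByKey(_ + _)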
For some reason it does not appear when I hit tab in Spark shell, but when
I put everything together in one line, it DOES WORK!
orig_graph.edges.map(_.copy()).cartesian(orig_graph.edges.map(_.copy())).flatMap(
A => Seq(if (A._1.srcId == A._2.dstId) Edge(A._2.srcId,A._1.dstId,1) else if
(A._1.dstId
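The snippet above is cut off mid-expression, so the following is only a hypothetical reconstruction of the visible part of the pattern, not the poster's full code: pair every edge with every other edge and emit a new edge wherever they meet head-to-tail. The orig_graph name comes from the post; the truncated else-if branch is omitted rather than guessed at.

import org.apache.spark.graphx.Edge

val e = orig_graph.edges.map(_.copy())
// Cartesian-pair the edge set with itself; where one edge's source is the
// other's destination, emit an edge closing the two-hop path.
val twoHop = e.cartesian(e).flatMap { case (a, b) =>
  if (a.srcId == b.dstId) Seq(Edge(b.srcId, a.dstId, 1)) else Seq.empty[Edge[Int]]
}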
Please vote on releasing the following candidate as Apache Spark version 1.0.0!
This has a few bug fixes on top of rc9:
SPARK-1875: https://github.com/apache/spark/pull/824
SPARK-1876: https://github.com/apache/spark/pull/819
SPARK-1878: https://github.com/apache/spark/pull/822
SPARK-1879:
I fixed the bug, but I kept the parameter i instead of _ since that (1)
keeps it more parallel to the python and java versions which also use
functions with a named variable and (2) doesn't require readers to know
this particular use of the _ syntax in Scala.
Thanks for catching this Glenn.
Andy
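For readers unfamiliar with the shorthand in question, the two styles are equivalent; assuming an RDD[Int] named nums (illustrative):

val nums = sc.parallelize(1 to 5)
// Named parameter: explicit, and parallel to the Python and Java versions.
val doubled = nums.map(i => i * 2)
// Underscore placeholder: Scala shorthand for a single-use parameter.
val doubledToo = nums.map(_ * 2)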
+1
On Tue, May 20, 2014 at 5:26 PM, Andrew Or and...@databricks.com wrote:
+1
2014-05-20 13:13 GMT-07:00 Tathagata Das tathagata.das1...@gmail.com:
Please vote on releasing the following candidate as Apache Spark version
1.0.0!
This has a few bug fixes on top of rc9:
SPARK-1875:
+1 (non-binding)
I have:
- checked signatures and checksums of the files
- built the code from the git repo using both sbt and mvn (against hadoop 2.3.0)
- ran a few simple jobs in local, yarn-client and yarn-cluster mode
Haven't explicitly tested any of the recent fixes, streaming nor sql.
On
Talked with Sandy and DB offline. I think the best solution is sending
the secondary jars to the distributed cache of all containers rather
than just the master, and setting the classpath to include the Spark jar,
primary app jar, and secondary jars before the executor starts. In this
way, the user only needs to
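For context, the user-facing side of this is spark-submit's --jars flag; under the approach described, the listed jars would reach every container rather than only the application master. A hypothetical invocation (names and paths illustrative):

# "secondary jars" are listed via --jars; the primary app jar is the last argument
spark-submit --master yarn-cluster \
  --class com.example.App \
  --jars dep-one.jar,dep-two.jar \
  app.jar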
+1
Tested it on both Windows and Mac OS X, with both Scala and Python. Confirmed
that the issues in the previous RC were fixed.
Matei
On May 20, 2014, at 5:28 PM, Marcelo Vanzin van...@cloudera.com wrote:
+1 (non-binding)
I have:
- checked signatures and checksums of the files
- built