Re: [VOTE] Release Apache Spark 1.6.0 (RC1)
+1

OSX 10.10.5, java version "1.8.0_40", Scala 2.10

mvn clean package -DskipTests

[INFO] Spark Project External Kafka ... SUCCESS [ 18.161 s]
[INFO] Spark Project Examples . SUCCESS [01:18 min]
[INFO] Spark Project External Kafka Assembly .. SUCCESS [ 5.724 s]
[INFO]
[INFO] BUILD SUCCESS
[INFO]
[INFO] Total time: 14:59 min
[INFO] Finished at: 2015-12-03T09:46:38+00:00
[INFO] Final Memory: 105M/2668M
[INFO]

Basic graph tests
  Load graph using edgeListFile...SUCCESS
  Run PageRank...SUCCESS
Connected Components tests
  Kaggle social circles competition...SUCCESS
Minimum Spanning Tree Algorithm
  Run basic Minimum Spanning Tree algorithm...SUCCESS
  Run Minimum Spanning Tree taxonomy creation...SUCCESS
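The GraphX checks listed above ("Load graph using edgeListFile", "Run PageRank", connected components) are not shown in the message itself; as a rough illustration of what such a smoke test exercises, here is a minimal sketch against the standard GraphX API. The object name and the edge-list path are placeholders, not the poster's actual test harness.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.graphx.GraphLoader

    object GraphSmokeTest {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("GraphSmokeTest"))

        // Load a graph from an edge-list file (one "srcId dstId" pair per line).
        val graph = GraphLoader.edgeListFile(sc, "/path/to/edges.txt")

        // Run PageRank to convergence and show the highest-ranked vertices.
        val ranks = graph.pageRank(0.0001).vertices
        ranks.sortBy(_._2, ascending = false).take(5).foreach(println)

        // Connected components: each vertex is labelled with its component id.
        val cc = graph.connectedComponents().vertices
        println(s"Distinct components: ${cc.map(_._2).distinct().count()}")

        sc.stop()
      }
    }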
Re: [VOTE] Release Apache Spark 1.5.1 (RC1)
+1

build/mvn clean package -DskipTests -Pyarn -Phadoop-2.6  OK

Basic graph tests
  Load graph using edgeListFile...SUCCESS
  Run PageRank...SUCCESS
Minimum Spanning Tree Algorithm
  Run basic Minimum Spanning Tree algorithm...SUCCESS
  Run Minimum Spanning Tree taxonomy creation...SUCCESS
Re: RDD API patterns
I'm not sure the problem is quite as bad as you state. Both sampleByKey and sampleByKeyExact are implemented using a function from StratifiedSamplingUtils, which does one of two things depending on whether the exact implementation is needed. The exact version needs roughly twice as many lines of code (17) as the non-exact one and has to make extra passes over the data to get, for example, the counts per key.

As far as I can see, your problem 2 and sampleByKeyExact are very similar and could be solved the same way. sampleByKeyExact was judged widely useful enough to be provided out of the box as part of the PairRDD API, and I don't see any reason why your problem 2 couldn't be provided in the same way if there were demand for it.

An alternative design would perhaps be something like an extension to PairRDD, let's call it TwoPassPairRDD, where certain per-key information (e.g. the count for the key) is computed in a first pass and then provided along with an Iterable of the values for that key. Both sampleByKeyExact and your problem 2 could then be implemented in a few fewer lines of code.
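To make the TwoPassPairRDD idea concrete, here is a minimal sketch of how such a two-pass helper could look. TwoPassPairRDDFunctions and groupWithKeyCounts are hypothetical names for illustration, not existing Spark API, and the real StratifiedSamplingUtils code is organised differently.

    import scala.reflect.ClassTag
    import org.apache.spark.rdd.RDD

    object TwoPassPairRDD {

      implicit class TwoPassPairRDDFunctions[K: ClassTag, V: ClassTag](self: RDD[(K, V)]) {

        // First pass: compute a per-key summary (here, the number of values per key).
        // Second pass: hand that summary to the caller together with the Iterable of
        // values for the key, so "exact"-style logic can use it without recomputing it.
        def groupWithKeyCounts(): RDD[(K, (Iterable[V], Long))] = {
          val counts = self.mapValues(_ => 1L).reduceByKey(_ + _)
          self.groupByKey().join(counts)
        }
      }
    }

Something like pairs.groupWithKeyCounts() would then give each key's values access to their total count, which is the kind of extra per-key information both sampleByKeyExact and your problem 2 need before the second pass.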
Re: [HELP] Spark 1.4.1 tasks take ridiculously long time to complete
I would suggest you move this to the Spark User list; this is the development list, for discussion of the development of Spark itself. It would also help if you could give some more information about what you are trying to do, e.g. what code you are running, how you submitted the job (spark-shell, spark-submit), and what sort of cluster you are on (standalone, YARN, Mesos).