Re: [VOTE] Release Apache Spark 1.6.0 (RC1)

2015-12-03 Thread robineast
+1

OSX 10.10.5, java version "1.8.0_40", scala 2.10

mvn clean package -DskipTests

[INFO] Spark Project External Kafka ....................... SUCCESS [ 18.161 s]
[INFO] Spark Project Examples ............................. SUCCESS [01:18 min]
[INFO] Spark Project External Kafka Assembly .............. SUCCESS [  5.724 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 14:59 min
[INFO] Finished at: 2015-12-03T09:46:38+00:00
[INFO] Final Memory: 105M/2668M
[INFO] ------------------------------------------------------------------------

Basic graph tests
  Load graph using edgeListFile...SUCCESS
  Run PageRank...SUCCESS
Connected Components tests
  Kaggle social circles competition...SUCCESS
Minimum Spanning Tree Algorithm
  Run basic Minimum Spanning Tree algorithm...SUCCESS
  Run Minimum Spanning Tree taxonomy creation...SUCCESS
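
For anyone wanting to reproduce the first two checks, they boil down to
roughly the following in spark-shell (the edge-list path is illustrative
only; the Minimum Spanning Tree tests come from my own test code, not
from GraphX itself):

import org.apache.spark.graphx.GraphLoader

// Load a graph from an edge list file, then run PageRank to
// convergence -- the two basic checks reported above
val graph = GraphLoader.edgeListFile(sc, "data/graphx/followers.txt")
val ranks = graph.pageRank(0.0001).vertices
ranks.take(5).foreach(println)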






Re: [VOTE] Release Apache Spark 1.5.1 (RC1)

2015-09-26 Thread robineast
+1


build/mvn clean package -DskipTests -Pyarn -Phadoop-2.6
OK
Basic graph tests
  Load graph using edgeListFile...SUCCESS
  Run PageRank...SUCCESS
Minimum Spanning Tree Algorithm
  Run basic Minimum Spanning Tree algorithm...SUCCESS
  Run Minimum Spanning Tree taxonomy creation...SUCCESS






Re: RDD API patterns

2015-09-16 Thread robineast
I'm not sure the problem is quite as bad as you state. Both sampleByKey and
sampleByKeyExact are implemented using a function from
StratifiedSamplingUtils which does one of two things depending on whether
the exact implementation is needed. The exact version takes about twice as
many lines of code (17) as the non-exact one, and has to make extra passes
over the data to get, for example, the counts per key.
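
For reference, the two calls look like this from the caller's side (the
data and fractions are made up for illustration; both calls take a
per-key sampling fraction):

val rdd = sc.parallelize(Seq(("a", 1), ("a", 2), ("b", 3), ("b", 4)))
val fractions = Map("a" -> 0.5, "b" -> 0.5)

// One pass over the data; per-key sample sizes are only approximate
val approx = rdd.sampleByKey(withReplacement = false, fractions, seed = 42L)

// Extra pass(es) over the data to guarantee exact per-key sample sizes
val exact = rdd.sampleByKeyExact(withReplacement = false, fractions, seed = 42L)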

As far as I can see, your problem 2 and sampleByKeyExact are very similar and
could be solved the same way. It has been decided that sampleByKeyExact is a
widely useful function, so it is provided out of the box as part of the
PairRDD API. I don't see any reason why your problem 2 couldn't be provided
in the same way, as part of the API, if there were demand for it.

An alternative design would perhaps be something like an extension to
PairRDD, let's call it TwoPassPairRDD, where certain per-key information
could be provided along with the Iterable, e.g. the counts for the key. Both
sampleByKeyExact and your problem 2 could then be implemented in a few fewer
lines of code.
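
To make that concrete, here is a rough, untested sketch of the sort of
thing I mean (all names are hypothetical):

import scala.reflect.ClassTag
import org.apache.spark.rdd.RDD

// Sketch only: pair each key's values with information gathered on a
// first pass over the data -- here, the count per key
implicit class TwoPassPairRDD[K: ClassTag, V: ClassTag](self: RDD[(K, V)]) {
  def groupByKeyWithCounts(): RDD[(K, (Long, Iterable[V]))] = {
    val counts = self.countByKey()                      // first pass
    val bcCounts = self.sparkContext.broadcast(counts)
    self.groupByKey().map { case (k, vs) =>             // second pass
      (k, (bcCounts.value(k), vs))
    }
  }
}

Something like sampleByKeyExact could then read the per-key counts off
the second pass rather than gathering them itself.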






Re: [HELP] Spark 1.4.1 tasks take ridiculously long time to complete

2015-09-03 Thread robineast
I would suggest you move this to the Spark User list; this is the list for
discussion of the development of Spark itself. It would also help if you
could give some more information about what you are trying to do, e.g. what
code you are running, how you submitted the job (spark-shell, spark-submit),
and what sort of cluster you are on (standalone, YARN, Mesos).




