Hi,
I am using Spark 1.0.0. The bug is fixed in 1.0.1.
Hao
Ah, thank you. I did not notice that.
Thank you for your replies.
More details here:
The program is executed in local mode (single node). Default environment
parameters are used.
The test code and the result are in this gist:
https://gist.github.com/coderh/0147467f0b185462048c
Here are the first 10 lines of the data: 3 fields per row, the delimiter
Hi,
I have a key-value RDD called rdd below. After a groupBy, I tried to count
the rows, but the result is not unique; it is somehow non-deterministic.
Here is the test code:
val step1 = ligneReceipt_cleTable.persist()  // cache the source RDD
val step2 = step1.groupByKey()               // group values by key
val s1size = step1.count()
val s2size = step2.count()
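A quick way to see the non-determinism is to count the grouped RDD several
times in a row; with the bug present, the totals can differ from run to run.
A minimal sketch reusing step2 from above:

(1 to 5).foreach(_ => println(step2.count()))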
Update:
Just tested with HashPartitioner(8) and counted each partition:
List((0,657824), (1,658549), (2,659199), (3,658684), (4,659394), (5,657591), (6,658327), (7,658434))
List((0,657824), (1,658549), (2,659199), (3,658684), (4,659394), (5,657594), (6,658326), (7,658434))
Note that partitions 5 and 6 differ between the two runs.
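For reference, a minimal sketch of how such per-partition counts can be
produced (the RDD name rdd is taken from the original post):

import org.apache.spark.HashPartitioner

val counts = rdd
  .partitionBy(new HashPartitioner(8))
  .mapPartitionsWithIndex((idx, iter) => Iterator((idx, iter.size)))
  .collect()
  .toList
println(counts)  // e.g. List((0,657824), (1,658549), ...)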
Hi,
I have started an EC2 cluster using Spark by running the spark-ec2 script.
Just a little confused, I cannot find the sbt/ directory under /spark.
I have checked the Spark version; it's 1.0.0 (the default). When I was
working with 0.9.x, sbt/ was there.
Has the script changed in 1.0.x? I cannot find any
Update:
Just checked the Python launch script; when retrieving Spark, it refers to
this script:
https://github.com/mesos/spark-ec2/blob/v3/spark/init.sh
where each version number is mapped to a tar file, e.g.:

  0.9.2)
    if [[ $HADOOP_MAJOR_VERSION == 1 ]]; then
      wget
Thank you for your reply.
I need sbt for packaging my project and then submitting it.
Could you tell me how to run a Spark project on the 1.0 AMI without sbt?
I don't understand why 1.0 only contains the prebuilt packages. I don't
think it makes sense, since sbt is essential.
The user has to download sbt
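For reference, a minimal build.sbt is usually all that packaging needs; this
is a hedged sketch assuming Spark 1.0.0 on Scala 2.10 (the app name is
hypothetical, not from this thread):

name := "my-spark-app"

scalaVersion := "2.10.4"

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.0" % "provided"

The jar produced by sbt package can then be passed to bin/spark-submit.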
Hi,
The real-world dataset is a bit larger, so I tested on the MovieLens data
set and found the same results:

alpha  lambda  rank  top1  top5  EPR_in   EPR_out
40     0.001   50    297   559   0.05855
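For context, a hedged sketch of the kind of run behind the row above, using
MLlib's implicit-feedback ALS (the file path and the iteration count are
assumptions, not from the original post):

import org.apache.spark.mllib.recommendation.{ALS, Rating}

val ratings = sc.textFile("movielens/ratings.dat").map { line =>
  val Array(user, item, rating, _) = line.split("::")
  Rating(user.toInt, item.toInt, rating.toDouble)
}
// rank = 50, lambda = 0.001, alpha = 40 as in the table; 10 iterations assumed
val model = ALS.trainImplicit(ratings, 50, 10, 0.001, 40)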
Thank you for your quick reply.
As far as I know, the update does not require negative observations, because
the update rule
x_u = (Y^T C^u Y + λI)^-1 Y^T C^u p(u)
can be simplified by taking advantage of its algebraic structure, so
negative observations are not needed. This is what I think at the
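The simplification in question is presumably the identity from Hu, Koren &
Volinsky: Y^T C^u Y = Y^T Y + Y^T (C^u - I) Y, where Y^T Y is precomputed
once per sweep and C^u - I is non-zero only for the items user u actually
observed. A hedged sketch of the per-user solve under that identity, using
Breeze (the function and argument names are hypothetical):

import breeze.linalg.{DenseMatrix, DenseVector}

def updateUser(
    Y: DenseMatrix[Double],        // item factors: numItems x rank
    YtY: DenseMatrix[Double],      // precomputed Y^T Y: rank x rank
    observed: Seq[(Int, Double)],  // (item index, confidence c_ui) for positives
    lambda: Double,
    rank: Int): DenseVector[Double] = {
  // A = Y^T C^u Y + lambda*I, built from YtY plus a sparse correction
  val A = YtY + DenseMatrix.eye[Double](rank) * lambda
  val b = DenseVector.zeros[Double](rank)
  for ((i, c) <- observed) {
    val yi = Y(i, ::).t            // factor vector of item i
    A += (yi * yi.t) * (c - 1.0)   // Y^T (C^u - I) Y touches only observed items
    b += yi * c                    // Y^T C^u p(u): p(u)_i = 1 for observed items
  }
  A \ b                            // solve A x_u = b
}

Because the correction loop runs only over the positive observations, the
cost per user does not depend on the number of unobserved (negative) items.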