I think takeSample itself runs multiple jobs if the number of samples collected in the first pass is not enough. The comment and code path at https://github.com/apache/spark/blob/412b0e8969215411b97efd3d0984dc6cac5d31e0/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L508 should explain when this happens. You can also confirm this by checking whether the logWarning shows up in your logs.
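To make the "not enough samples in the first pass" case concrete, here is a rough plain-Python sketch of that kind of retry loop. It is not Spark's code and does not use Spark at all: the function name take_sample_sketch, the fraction parameter, and the passes counter are all made up for illustration, and each pass over the data merely stands in for what would be a separate job on a cluster.

```python
import random


def take_sample_sketch(data, num, fraction, seed=0):
    """Hypothetical stand-in for a sample-then-retry loop (not Spark's API).

    Keeps each element with probability `fraction`; if a pass yields fewer
    than `num` elements, the whole pass is repeated. Returns the final
    sample and the number of passes that were needed.
    """
    if num > len(data):
        raise ValueError("num must not exceed the number of elements")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")

    rng = random.Random(seed)
    passes = 0
    while True:
        passes += 1  # on a cluster, each pass would be one more job
        picked = [x for x in data if rng.random() < fraction]
        if len(picked) >= num:
            break
        # This is the point where a real implementation could log a
        # warning about needing to re-sample.

    rng.shuffle(picked)  # randomize order before trimming to `num`
    return picked[:num], passes


if __name__ == "__main__":
    sample, passes = take_sample_sketch(list(range(1000)), num=10, fraction=0.02)
    print(len(sample), "elements after", passes, "pass(es)")
```

With a generous fraction the loop exits after a single pass; the extra passes only appear when the first draw comes up short, which is the situation the logWarning would flag.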
Thanks
Shivaram

On Tue, Aug 30, 2016 at 9:50 AM, Georgios Samaras <georgesamaras...@gmail.com> wrote:
>
> ---------- Forwarded message ----------
> From: Georgios Samaras <georgesamaras...@gmail.com>
> Date: Tue, Aug 30, 2016 at 9:49 AM
> Subject: Re: KMeans calls takeSample() twice?
> To: "Sean Owen [via Apache Spark Developers List]" <ml-node+s1001551n18788...@n3.nabble.com>
>
> I am not sure what you want me to check. Note that I see two takeSample()s
> being invoked every single time I execute KMeans(). In a current job I have,
> I did view the details and updated the StackOverflow question.
>
> On Tue, Aug 30, 2016 at 9:25 AM, Sean Owen [via Apache Spark Developers List] <ml-node+s1001551n18788...@n3.nabble.com> wrote:
>>
>> I'm not sure it's a UI bug; it really does record two different
>> stages, the second of which executes quickly. I am not sure why that
>> would happen off the top of my head. I don't see anything that failed
>> here.
>>
>> Digging into those two stages and what they executed might give a clue
>> to what's really going on there.
>>
>> On Tue, Aug 30, 2016 at 5:18 PM, gsamaras <[hidden email]> wrote:
>> > Yanbo thank you for your reply. So you are saying that this is a bug in
>> > the Spark UI in general, and not in the local Spark UI of our cluster,
>> > where I work, right?
>> >
>> > George