Good catch, Shivaram. However, the very next line states:

// this shouldn't happen often because we use a big multiplier for the initial size

which makes me wonder whether that is really the case. I am experimenting heavily right now and have launched 30 to 40 jobs, and glancing at them, I can see takeSample() being called twice!

George

On Tue, Aug 30, 2016 at 10:20 AM, Shivaram Venkataraman <shiva...@eecs.berkeley.edu> wrote:
> I think takeSample itself runs multiple jobs if the amount of samples
> collected in the first pass is not enough. The comment and code path
> at https://github.com/apache/spark/blob/412b0e8969215411b97efd3d0984dc6cac5d31e0/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L508
> should explain when this happens. Also you can confirm this by
> checking if the logWarning shows up in your logs.
>
> Thanks
> Shivaram
>
> On Tue, Aug 30, 2016 at 9:50 AM, Georgios Samaras
> <georgesamaras...@gmail.com> wrote:
> >
> > ---------- Forwarded message ----------
> > From: Georgios Samaras <georgesamaras...@gmail.com>
> > Date: Tue, Aug 30, 2016 at 9:49 AM
> > Subject: Re: KMeans calls takeSample() twice?
> > To: "Sean Owen [via Apache Spark Developers List]"
> > <ml-node+s1001551n18788...@n3.nabble.com>
> >
> > I am not sure what you want me to check. Note that I see two takeSample()s
> > being invoked every single time I execute KMeans(). In a current job I have,
> > I did view the details and updated the StackOverflow question.
> >
> > On Tue, Aug 30, 2016 at 9:25 AM, Sean Owen [via Apache Spark Developers
> > List] <ml-node+s1001551n18788...@n3.nabble.com> wrote:
> >>
> >> I'm not sure it's a UI bug; it really does record two different
> >> stages, the second of which executes quickly. I am not sure why that
> >> would happen off the top of my head. I don't see anything that failed
> >> here.
> >>
> >> Digging into those two stages and what they executed might give a clue
> >> to what's really going on there.
> >>
> >> On Tue, Aug 30, 2016 at 5:18 PM, gsamaras <[hidden email]> wrote:
> >> > Yanbo, thank you for your reply.
> >> > So you are saying that this is a bug in the Spark UI in general, and
> >> > not in the local Spark UI of our cluster, where I work, right?
> >> >
> >> > George
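The following is a minimal plain-Scala sketch of the "oversample, then retry" pattern Shivaram describes. It has no Spark dependency and is not Spark's implementation. The `multiplier` parameter, the Bernoulli pass, and every name below (`TakeSampleSketch`, `SampleResult`, `bernoulliPass`) are hypothetical stand-ins for however RDD.takeSample actually computes its initial sampling fraction. On a cluster, each extra loop iteration would correspond to an extra job.

```scala
import scala.util.Random

// Hypothetical sketch of an oversample-then-retry sampler; not Spark code.
object TakeSampleSketch {

  /** The drawn sample plus the number of full passes over the data it took. */
  final case class SampleResult[T](sample: Vector[T], passes: Int)

  /** One pass: keep each element independently with probability `fraction`. */
  private def bernoulliPass[T](data: Seq[T], fraction: Double, rng: Random): Vector[T] =
    data.iterator.filter(_ => rng.nextDouble() < fraction).toVector

  /**
   * Draw `num` elements without replacement. The first pass uses an inflated
   * fraction (num * multiplier / data.size); if it still comes up short, the
   * whole pass is repeated.
   */
  def takeSample[T](data: Seq[T], num: Int, multiplier: Double, seed: Long): SampleResult[T] = {
    require(num >= 0, "num must be non-negative")
    require(multiplier > 0.0, "multiplier must be positive, or the retry loop never ends")
    val rng = new Random(seed)
    if (num == 0 || data.isEmpty) SampleResult(Vector.empty, 0)
    else if (num >= data.size) SampleResult(rng.shuffle(data.toVector), 1)
    else {
      val fraction = math.min(1.0, num * multiplier / data.size)
      var sample = bernoulliPass(data, fraction, rng)
      var passes = 1
      while (sample.size < num) {
        // Per Shivaram's note, Spark logs a warning at the equivalent point.
        println(s"Needed to re-sample; starting pass #${passes + 1}")
        sample = bernoulliPass(data, fraction, rng)
        passes += 1
      }
      // Shuffle so the kept elements are not biased toward the input order.
      SampleResult(rng.shuffle(sample).take(num), passes)
    }
  }

  def main(args: Array[String]): Unit = {
    val data = 1 to 100000
    val generous = takeSample(data, num = 10, multiplier = 3.0, seed = 42L)
    println(s"multiplier 3.0: ${generous.sample.size} elements after ${generous.passes} pass(es)")
    val tight = takeSample(data, num = 10, multiplier = 1.0, seed = 42L)
    println(s"multiplier 1.0: ${tight.sample.size} elements after ${tight.passes} pass(es)")
  }
}
```

With a generous multiplier, the first pass almost always collects enough elements, which is consistent with the comment George quotes. Shrinking the multiplier toward 1.0 makes the retry branch, and its warning, noticeably more frequent.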