I added a println at the start of the takeSample function and found that it was printed only once per run of KMeans.
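
For anyone following along, here is a rough model of the resampling behaviour Shivaram describes in the quoted message: take one sampling pass, and repeat only if it came back with too few elements. It is plain Python, not Spark's Scala code. The function name `take_sample_sketch`, the `oversample` factor and the pass counter are all assumptions made up for illustration.

```python
import random


def take_sample_sketch(data, num, oversample=1.2, seed=0):
    """Illustrative model only (not Spark's implementation).

    Returns (sample, passes): `num` items drawn without replacement,
    plus the number of sampling passes that were needed.
    """
    rng = random.Random(seed)
    n = len(data)
    if n == 0 or num <= 0:
        return [], 0
    if num >= n:
        # Asking for everything: one pass, just shuffle a copy.
        out = list(data)
        rng.shuffle(out)
        return out, 1

    # Hypothetical oversampling: aim a little above num / n so that a
    # single pass is usually enough.
    fraction = min(1.0, oversample * num / n)

    passes = 0
    sample = []
    while len(sample) < num:
        # Each pass is a fresh Bernoulli sample over the whole data set;
        # a pass that comes back too small is discarded and retried.
        sample = [x for x in data if rng.random() < fraction]
        passes += 1

    rng.shuffle(sample)
    return sample[:num], passes


if __name__ == "__main__":
    sample, passes = take_sample_sketch(list(range(1000)), 10)
    print(len(sample), "items in", passes, "pass(es)")
```

In this model, `passes` greater than 1 corresponds to the case where the logWarning Shivaram mentions would show up in the logs. A single pass means the retry path was not taken.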

Thanks
Yanbo

On Tue, Aug 30, 2016 at 10:31 AM, Georgios Samaras <georgesamaras...@gmail.com> wrote:

> Good catch Shivaram. However, the very next line states:
>
> // this shouldn't happen often because we use a big multiplier for the initial size
>
> which makes me wondering if that is the case, really, since I am
> experimenting heavily right now and I launched 30~40 jobs, and from a
> glance on them I can see takeSample() being called twice!
>
> George
>
> On Tue, Aug 30, 2016 at 10:20 AM, Shivaram Venkataraman <shiva...@eecs.berkeley.edu> wrote:
>
>> I think takeSample itself runs multiple jobs if the amount of samples
>> collected in the first pass is not enough. The comment and code path
>> at https://github.com/apache/spark/blob/412b0e8969215411b97efd3d0984dc6cac5d31e0/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L508
>> should explain when this happens. Also you can confirm this by
>> checking if the logWarning shows up in your logs.
>>
>> Thanks
>> Shivaram
>>
>> On Tue, Aug 30, 2016 at 9:50 AM, Georgios Samaras <georgesamaras...@gmail.com> wrote:
>> >
>> > ---------- Forwarded message ----------
>> > From: Georgios Samaras <georgesamaras...@gmail.com>
>> > Date: Tue, Aug 30, 2016 at 9:49 AM
>> > Subject: Re: KMeans calls takeSample() twice?
>> > To: "Sean Owen [via Apache Spark Developers List]" <ml-node+s1001551n18788...@n3.nabble.com>
>> >
>> > I am not sure what you want me to check. Note that I see two
>> > takeSample()s being invoked every single time I execute KMeans().
>> > In a current job I have, I did view the details and updated the:
>> >
>> > StackOverflow question.
>> >
>> > On Tue, Aug 30, 2016 at 9:25 AM, Sean Owen [via Apache Spark Developers List] <ml-node+s1001551n18788...@n3.nabble.com> wrote:
>> >>
>> >> I'm not sure it's a UI bug; it really does record two different
>> >> stages, the second of which executes quickly. I am not sure why
>> >> that would happen off the top of my head. I don't see anything
>> >> that failed here.
>> >>
>> >> Digging into those two stages and what they executed might give a
>> >> clue to what's really going on there.
>> >>
>> >> On Tue, Aug 30, 2016 at 5:18 PM, gsamaras <[hidden email]> wrote:
>> >> > Yanbo thank you for your reply. So you are saying that this is a
>> >> > bug in the Spark UI in general, and not in the local Spark UI of
>> >> > our cluster, where I work, right?
>> >> >
>> >> > George
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe e-mail: [hidden email]
>> >>
>> >> ________________________________
>> >> If you reply to this email, your message will be added to the
>> >> discussion below:
>> >>
>> >> http://apache-spark-developers-list.1001551.n3.nabble.com/KMeans-calls-takeSample-twice-tp18761p18788.html