Good catch, Shivaram. However, the very next line states:

// this shouldn't happen often because we use a big multiplier for the initial size

which makes me wonder whether that is really the case: I am
experimenting heavily right now and have launched 30~40 jobs, and from a
glance at them I can see takeSample() being called twice!
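
To make the re-sampling behaviour discussed here concrete, below is a rough
plain-Python sketch of a "sample, and retry if the sample came out too small"
loop. It is not Spark's code: the name take_sample_sketch, the explicit
fraction argument, and the returned pass counter are all assumptions made for
illustration, and the sketch simply rejects num > len(data) instead of
handling it.

```python
import random


def take_sample_sketch(data, num, fraction, seed=0):
    """Hypothetical illustration of a retry-until-large-enough sampler.

    Returns (sample, passes), where passes counts how many full sampling
    passes over `data` were needed. In a distributed setting each pass
    would correspond to a separate job.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    if num > len(data):
        # Simplification of this sketch: without this check the loop
        # below could never collect enough elements.
        raise ValueError("num must not exceed len(data)")

    rng = random.Random(seed)
    passes = 0
    while True:
        passes += 1
        # Each pass uses a fresh seed, so a retry draws a different sample.
        pass_rng = random.Random(rng.randrange(2**31))
        picked = [x for x in data if pass_rng.random() < fraction]
        if len(picked) >= num:
            break  # the pass collected enough elements, stop retrying

    # Shuffle and trim so exactly `num` elements are returned.
    rng.shuffle(picked)
    return picked[:num], passes


if __name__ == "__main__":
    sample, passes = take_sample_sketch(list(range(1000)), 10, 0.01)
    print(len(sample), "elements after", passes, "pass(es)")
```

With a fraction close to num / len(data), roughly half of the passes come up
short, so retries are common; a fraction several times larger makes a second
pass rare, which is what the quoted comment about the big multiplier suggests.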

George


On Tue, Aug 30, 2016 at 10:20 AM, Shivaram Venkataraman <
shiva...@eecs.berkeley.edu> wrote:

> I think takeSample itself runs multiple jobs if the amount of samples
> collected in the first pass is not enough. The comment and code path
> at https://github.com/apache/spark/blob/412b0e8969215411b97efd3d0984dc6cac5d31e0/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L508
> should explain when this happens. Also you can confirm this by
> checking if the logWarning shows up in your logs.
>
> Thanks
> Shivaram
>
> On Tue, Aug 30, 2016 at 9:50 AM, Georgios Samaras
> <georgesamaras...@gmail.com> wrote:
> >
> > ---------- Forwarded message ----------
> > From: Georgios Samaras <georgesamaras...@gmail.com>
> > Date: Tue, Aug 30, 2016 at 9:49 AM
> > Subject: Re: KMeans calls takeSample() twice?
> > To: "Sean Owen [via Apache Spark Developers List]"
> > <ml-node+s1001551n18788...@n3.nabble.com>
> >
> >
> > I am not sure what you want me to check. Note that I see two takeSample()s
> > being invoked every single time I execute KMeans(). In a current job I have,
> > I did view the details and updated the StackOverflow question.
> >
> >
> >
> > On Tue, Aug 30, 2016 at 9:25 AM, Sean Owen [via Apache Spark Developers
> > List] <ml-node+s1001551n18788...@n3.nabble.com> wrote:
> >>
> >> I'm not sure it's a UI bug; it really does record two different
> >> stages, the second of which executes quickly. I am not sure why that
> >> would happen off the top of my head. I don't see anything that failed
> >> here.
> >>
> >> Digging into those two stages and what they executed might give a clue
> >> to what's really going on there.
> >>
> >> On Tue, Aug 30, 2016 at 5:18 PM, gsamaras <[hidden email]> wrote:
> >> > Yanbo, thank you for your reply. So you are saying that this is a bug in
> >> > the Spark UI in general, and not in the local Spark UI of our cluster,
> >> > where I work, right?
> >> >
> >> > George
> >>
> >
> >
> >
>
