I added a println at the start of the takeSample function and found that it
was printed only once for each run of KMeans.
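
One call that still shows up as more than one job would be consistent with the
retry behaviour Shivaram describes below. Here is a minimal Python sketch of
that idea. It is a simplified model, not Spark's actual code: the function
name, the `multiplier` parameter and the pass counter are assumptions made
purely for illustration.

```python
import random


def take_sample(data, num, seed=0, multiplier=1.5):
    """Simplified model of a sample-then-retry loop (not Spark's actual code).

    Returns (sample, passes), where passes counts how many full scans of
    `data` were needed. In Spark, each such scan would be a separate job.
    """
    if num <= 0 or not data:
        return [], 0
    if num >= len(data):
        return list(data), 1  # a single pass is enough to take everything

    rng = random.Random(seed)
    # Hypothetical oversampling rule: aim for somewhat more than `num` items.
    fraction = min(1.0, multiplier * num / len(data))

    passes = 0
    while True:
        passes += 1
        # One pass: keep each element independently with probability `fraction`.
        picked = [x for x in data if rng.random() < fraction]
        if len(picked) >= num:
            break
        # Too few items were collected, so scan again (a real system would
        # typically log a warning at this point).

    rng.shuffle(picked)
    return picked[:num], passes


sample, passes = take_sample(list(range(1000)), 10)
print(len(sample), "items in", passes, "pass(es)")
```

The function is entered once per call, so a println at its start would fire
once, yet `passes` can be greater than 1 when the first scan comes up short.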

Thanks
Yanbo

On Tue, Aug 30, 2016 at 10:31 AM, Georgios Samaras <
georgesamaras...@gmail.com> wrote:

> Good catch Shivaram. However, the very next line states:
>
> // this shouldn't happen often because we use a big multiplier for the initial size
>
> which makes me wonder whether that is really the case, since I am
> experimenting heavily right now and have launched 30~40 jobs, and at a
> glance I can see takeSample() being called twice in them!
>
> George
>
>
> On Tue, Aug 30, 2016 at 10:20 AM, Shivaram Venkataraman <
> shiva...@eecs.berkeley.edu> wrote:
>
>> I think takeSample itself runs multiple jobs if the number of samples
>> collected in the first pass is not enough. The comment and code path at
>> https://github.com/apache/spark/blob/412b0e8969215411b97efd3d0984dc6cac5d31e0/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L508
>> should explain when this happens. You can also confirm this by
>> checking whether the logWarning shows up in your logs.
>>
>> Thanks
>> Shivaram
>>
>> On Tue, Aug 30, 2016 at 9:50 AM, Georgios Samaras
>> <georgesamaras...@gmail.com> wrote:
>> >
>> > ---------- Forwarded message ----------
>> > From: Georgios Samaras <georgesamaras...@gmail.com>
>> > Date: Tue, Aug 30, 2016 at 9:49 AM
>> > Subject: Re: KMeans calls takeSample() twice?
>> > To: "Sean Owen [via Apache Spark Developers List]"
>> > <ml-node+s1001551n18788...@n3.nabble.com>
>> >
>> >
>> > I am not sure what you want me to check. Note that I see two takeSample()s
>> > being invoked every single time I execute KMeans(). For a current job of
>> > mine, I viewed the details and updated the StackOverflow question.
>> >
>> >
>> >
>> > On Tue, Aug 30, 2016 at 9:25 AM, Sean Owen [via Apache Spark Developers
>> > List] <ml-node+s1001551n18788...@n3.nabble.com> wrote:
>> >>
>> >> I'm not sure it's a UI bug; it really does record two different
>> >> stages, the second of which executes quickly. I am not sure why that
>> >> would happen off the top of my head. I don't see anything that failed
>> >> here.
>> >>
>> >> Digging into those two stages and what they executed might give a clue
>> >> to what's really going on there.
>> >>
>> >> On Tue, Aug 30, 2016 at 5:18 PM, gsamaras <[hidden email]> wrote:
>> >> > Yanbo, thank you for your reply. So you are saying that this is a bug in
>> >> > the Spark UI in general, and not in the local Spark UI of our cluster,
>> >> > where I work, right?
>> >> >
>> >> > George
>> >>
>> >
>> >
>> >
>>
>
>
