I think takeSample itself runs multiple jobs if the number of samples
collected in the first pass is not enough. The comment and code path at
https://github.com/apache/spark/blob/412b0e8969215411b97efd3d0984dc6cac5d31e0/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L508
should explain when this happens. You can also confirm this by
checking whether the logWarning shows up in your logs.
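
To illustrate the retry behaviour described above, here is a rough sketch
in plain Python. It is a toy model, not Spark's code: the function name
take_sample_sketch, its parameters, and the log message are all made up
for illustration, and each "pass" only stands in for what would be a
separate Spark job.

```python
import random


def take_sample_sketch(data, num, fraction, seed=42, log=print):
    """Toy model of a sample-then-retry loop (hypothetical, not Spark's code).

    Returns (sample, passes), where `passes` counts the sampling passes;
    each pass stands in for one job.
    """
    if num > len(data):
        raise ValueError("num must not exceed the number of elements")
    if fraction <= 0.0:
        raise ValueError("fraction must be positive")

    rng = random.Random(seed)

    def one_pass():
        # Bernoulli sampling: keep each element independently
        # with probability `fraction`.
        return [x for x in data if rng.random() < fraction]

    samples = one_pass()
    passes = 1
    # If the first pass came back too small, sample again.
    while len(samples) < num:
        # Stand-in for the logWarning mentioned above.
        log(f"re-sampling: got {len(samples)} of {num} after pass #{passes}")
        samples = one_pass()
        passes += 1

    # Shuffle, then trim the oversized sample down to exactly `num`.
    rng.shuffle(samples)
    return samples[:num], passes
```

Calling take_sample_sketch(list(range(100)), 10, 1.0) finishes in a single
pass, while a small fraction such as 0.05 will usually need several passes
and log once for each retry. That is the pattern to look for in the logs.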

Thanks
Shivaram

On Tue, Aug 30, 2016 at 9:50 AM, Georgios Samaras
<georgesamaras...@gmail.com> wrote:
>
> ---------- Forwarded message ----------
> From: Georgios Samaras <georgesamaras...@gmail.com>
> Date: Tue, Aug 30, 2016 at 9:49 AM
> Subject: Re: KMeans calls takeSample() twice?
> To: "Sean Owen [via Apache Spark Developers List]"
> <ml-node+s1001551n18788...@n3.nabble.com>
>
>
> I am not sure what you want me to check. Note that I see two takeSample()
> calls every single time I execute KMeans(). For a job I am currently
> running, I viewed the details and updated the StackOverflow question.
>
>
>
> On Tue, Aug 30, 2016 at 9:25 AM, Sean Owen [via Apache Spark Developers
> List] <ml-node+s1001551n18788...@n3.nabble.com> wrote:
>>
>> I'm not sure it's a UI bug; it really does record two different
>> stages, the second of which executes quickly. I am not sure why that
>> would happen off the top of my head. I don't see anything that failed
>> here.
>>
>> Digging into those two stages and what they executed might give a clue
>> to what's really going on there.
>>
>> On Tue, Aug 30, 2016 at 5:18 PM, gsamaras <[hidden email]> wrote:
>> > Yanbo, thank you for your reply. So you are saying that this is a bug in
>> > the Spark UI in general, and not in the local Spark UI of our cluster,
>> > where I work, right?
>> >
>> > George
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: [hidden email]
>
>
>

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
