Re: KMeans calls takeSample() twice?

2016-08-31 Thread Yanbo Liang
I added a println at the start of the takeSample function and found that it
was printed only once for each run of KMeans.

Thanks
Yanbo
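
One way to reconcile a probe that fires once with a UI that lists two takeSample entries is that a single call may launch more than one job. The sketch below illustrates only that idea and is not Spark code: `probe`, `run_job`, and `take_sample_stub` are hypothetical names, and the two jobs inside the stub are an assumption made for illustration.

```python
import functools


def probe(fn):
    """Wrap fn so that every call is counted and announced."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        print(f"{fn.__name__} called (call #{wrapper.calls})")
        return fn(*args, **kwargs)
    wrapper.calls = 0
    return wrapper


jobs = []  # stands in for the job list a UI would display


def run_job(name):
    """Hypothetical stand-in for submitting one job to a cluster."""
    jobs.append(name)


@probe
def take_sample_stub():
    # Assumption for illustration: one call submits two jobs.
    run_job("job 1 attributed to take_sample_stub")
    run_job("job 2 attributed to take_sample_stub")


take_sample_stub()
print(take_sample_stub.calls, len(jobs))
```

The probe counts calls while the `jobs` list counts what a UI would show, so the two numbers can legitimately differ.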

On Tue, Aug 30, 2016 at 10:31 AM, Georgios Samaras <
georgesamaras...@gmail.com> wrote:

> Good catch Shivaram. However, the very next line states:
>
> // this shouldn't happen often because we use a big multiplier for the
> initial size
>
> which makes me wonder whether that is really the case, since I am
> experimenting heavily right now: I have launched 30~40 jobs, and from a
> glance at them I can see takeSample() being called twice!
>
> George


Re: KMeans calls takeSample() twice?

2016-08-30 Thread Georgios Samaras
Good catch Shivaram. However, the very next line states:

// this shouldn't happen often because we use a big multiplier for the
initial size

which makes me wonder whether that is really the case, since I am
experimenting heavily right now: I have launched 30~40 jobs, and from a
glance at them I can see takeSample() being called twice!

George


On Tue, Aug 30, 2016 at 10:20 AM, Shivaram Venkataraman <
shiva...@eecs.berkeley.edu> wrote:

> I think takeSample itself runs multiple jobs if the amount of samples
> collected in the first pass is not enough. The comment and code path
> at
> https://github.com/apache/spark/blob/412b0e8969215411b97efd3d0984dc6cac5d31e0/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L508
> should explain when this happens. Also you can confirm this by
> checking if the logWarning shows up in your logs.
>
> Thanks
> Shivaram


Re: KMeans calls takeSample() twice?

2016-08-30 Thread Shivaram Venkataraman
I think takeSample itself runs multiple jobs if the number of samples
collected in the first pass is not enough. The comment and code path
at 
https://github.com/apache/spark/blob/412b0e8969215411b97efd3d0984dc6cac5d31e0/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L508
should explain when this happens. Also you can confirm this by
checking if the logWarning shows up in your logs.

Thanks
Shivaram
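
The retry behaviour described above can be modelled in a few lines. This is a simplified sketch and not Spark's implementation: `take_sample`, its `fraction` parameter, and the warning text are hypothetical, and each pass over `data` stands in for one job. The real method in the linked RDD.scala also counts the RDD before sampling, which would show up as a separate job; that step is left out here.

```python
import random


def take_sample(data, num, fraction, seed=0, warn=print):
    """Return (sample, passes): `num` items from `data` and the passes used."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if num > len(data):
        raise ValueError("num must not exceed len(data)")
    rng = random.Random(seed)

    def one_pass():
        # Keep each element independently with probability `fraction`.
        return [x for x in data if rng.random() < fraction]

    samples = one_pass()
    passes = 1
    # Retry whenever a pass comes back with fewer than `num` elements.
    while len(samples) < num:
        warn(f"sample too small ({len(samples)} < {num}), "
             f"re-sampling (pass {passes + 1})")
        samples = one_pass()
        passes += 1
    rng.shuffle(samples)
    return samples[:num], passes


# A deliberately tight fraction, so that extra passes are likely.
warnings = []
sample, passes = take_sample(list(range(1000)), num=100, fraction=0.1,
                             warn=warnings.append)
print(passes, len(warnings))
```

Every pass after the first emits exactly one warning, so a run that needed no retry leaves `warnings` empty. A generous `fraction` makes retries rare; a tight one, as in the usage lines, makes them common.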

On Tue, Aug 30, 2016 at 9:50 AM, Georgios Samaras
<georgesamaras...@gmail.com> wrote:
>
> -- Forwarded message --
> From: Georgios Samaras <georgesamaras...@gmail.com>
> Date: Tue, Aug 30, 2016 at 9:49 AM
> Subject: Re: KMeans calls takeSample() twice?
> To: "Sean Owen [via Apache Spark Developers List]"
> <ml-node+s1001551n18788...@n3.nabble.com>
>
>
> I am not sure what you want me to check. Note that I see two takeSample()s
> being invoked every single time I execute KMeans(). In a current job I have,
> I did view the details and updated the:
>
> StackOverflow question.

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: KMeans calls takeSample() twice?

2016-08-30 Thread gsamaras
I am not sure what you want me to check. Note that I see two takeSample()
calls every single time I execute KMeans(). For a job I currently have
running, I viewed the details and updated the StackOverflow question.




On Tue, Aug 30, 2016 at 9:25 AM, Sean Owen [via Apache Spark Developers
List]  wrote:

> I'm not sure it's a UI bug; it really does record two different
> stages, the second of which executes quickly. I am not sure why that
> would happen off the top of my head. I don't see anything that failed
> here.
>
> Digging into those two stages and what they executed might give a clue
> to what's really going on there.




--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/KMeans-calls-takeSample-twice-tp18761p18789.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

Re: KMeans calls takeSample() twice?

2016-08-30 Thread gsamaras
Yanbo, thank you for your reply. So you are saying that this is a bug in the
Spark UI in general, and not in the local Spark UI of our cluster, where I
work, right?

George

On Mon, Aug 29, 2016 at 11:55 PM, Yanbo Liang-2 [via Apache Spark
Developers List]  wrote:

> I ran KMeans with probes and found that takeSample() was actually called
> only once. It looks like this issue was caused by a display mistake in the
> Spark UI.
>
> Thanks
> Yanbo




--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/KMeans-calls-takeSample-twice-tp18761p18786.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

Re: KMeans calls takeSample() twice?

2016-08-30 Thread Yanbo Liang
I ran KMeans with probes and found that takeSample() was actually called
only once. It looks like this issue was caused by a display mistake in the
Spark UI.

Thanks
Yanbo

On Mon, Aug 29, 2016 at 2:34 PM, gsamaras 
wrote:

> After reading Spark's internal code, I wasn't able to understand why it
> calls takeSample() twice. Can someone please explain?
>
> There is a relevant StackOverflow question.
>
>
>
> --
> View this message in context:
> http://apache-spark-developers-list.1001551.n3.nabble.com/KMeans-calls-takeSample-twice-tp18761.html
> Sent from the Apache Spark Developers List mailing list archive at Nabble.com.