Did you cache the data? Was it fully cached? The k-means
implementation doesn't create many temporary objects. I guess you need
more RAM to avoid triggering GC so frequently. Please monitor the memory
usage using YourKit or VisualVM. -Xiangrui
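
As a rough sanity check on the "more RAM" suggestion, the sketch below divides
the executor heap evenly among the tasks running concurrently on it, using the
50GB executor and the 48 vs. 24 task counts mentioned further down the thread.
The even split is a simplifying assumption: it ignores memory reserved for
cached RDD blocks and JVM overhead, so the real headroom per task is smaller.
The function name `heap_per_task_gb` is illustrative only.

```python
# Rough per-task heap estimate, assuming the executor heap is shared evenly
# by all tasks that run concurrently on that executor (a simplification).


def heap_per_task_gb(executor_heap_gb: float, concurrent_tasks: int) -> float:
    """Return the executor heap divided evenly across concurrent tasks."""
    if concurrent_tasks <= 0:
        raise ValueError("concurrent_tasks must be positive")
    return executor_heap_gb / concurrent_tasks


# Figures from this thread: 50GB executor heap, 48 tasks vs. 24 cores.
for tasks in (48, 24):
    print(f"{tasks} tasks -> {heap_per_task_gb(50, tasks):.2f} GB per task")
```

Matching the task count to the 24 cores roughly doubles the heap available to
each task under this simple model.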

On Wed, Feb 11, 2015 at 1:35 AM, lihu <lihu...@gmail.com> wrote:
> I just want to make the best use of the CPUs and test Spark's performance
> when there are many tasks on a single node.
>
> On Wed, Feb 11, 2015 at 5:29 PM, Sean Owen <so...@cloudera.com> wrote:
>>
>> Good, it's worth double-checking that's what you got. That's barely 1GB
>> per task, though. Why run 48 tasks if you have 24 cores?
>>
>> On Wed, Feb 11, 2015 at 9:03 AM, lihu <lihu...@gmail.com> wrote:
>> > I give 50GB to the executor, so there seems to be no reason the memory
>> > would not be enough.
>> >
>> > On Wed, Feb 11, 2015 at 4:50 PM, Sean Owen <so...@cloudera.com> wrote:
>> >>
>> >> Meaning, you have 128GB per machine but how much memory are you giving
>> >> the executors?
>> >>
>> >> On Wed, Feb 11, 2015 at 8:49 AM, lihu <lihu...@gmail.com> wrote:
>> >> > What do you mean? Yes, I can see in the web UI that some data is stored
>> >> > in memory.
>> >> >
>> >> > On Wed, Feb 11, 2015 at 4:25 PM, Sean Owen <so...@cloudera.com> wrote:
>> >> >>
>> >> >> Are you actually using that memory for executors?
>> >> >>
>> >> >> On Wed, Feb 11, 2015 at 8:17 AM, lihu <lihu...@gmail.com> wrote:
>> >> >> > Hi,
>> >> >> > I run k-means (MLlib) on a cluster with 12 workers. Every worker has
>> >> >> > 128GB of RAM and 24 cores. I run 48 tasks on each machine. The total
>> >> >> > data is just 40GB.
>> >> >> >
>> >> >> > When the dimension of the data set is about 10^7, each task takes
>> >> >> > about 30s, of which about 20s is spent in GC.
>> >> >> >
>> >> >> > When I reduce the dimension to 10^4, the GC time is small.
>> >> >> >
>> >> >> > So why is GC so high when the dimension is larger? Or is this
>> >> >> > caused by MLlib?
>> >> >
>> >> > --
>> >> > Best Wishes!
>> >> >
>> >> > Li Hu(李浒) | Graduate Student
>> >> > Institute for Interdisciplinary Information Sciences(IIIS)
>> >> > Tsinghua University, China
>> >> >
>> >> > Email: lihu...@gmail.com
>> >> > Homepage: http://iiis.tsinghua.edu.cn/zh/lihu/
>> >> >

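On the original question of why GC cost grows with dimension, a back-of-the-
envelope size estimate may help. The sketch below assumes each task keeps one
dense sum vector of length `dim` per cluster center; that is an assumption,
not something stated in the thread, and `k = 10` is a hypothetical value since
the thread never gives k. The helper names are illustrative only.

```python
# Rough size of dense per-task k-means accumulators as a function of dimension.
# Assumption (not stated in the thread): each task keeps one dense sum vector
# of length `dim` per cluster center. K = 10 below is a hypothetical value.

BYTES_PER_DOUBLE = 8  # a Java/Scala Double occupies 8 bytes


def dense_vector_mb(dim: int) -> float:
    """Payload of a dense vector of `dim` doubles, in MB (1 MB = 10**6 bytes).

    Ignores the small fixed JVM array header.
    """
    return dim * BYTES_PER_DOUBLE / 1e6


def accumulator_mb(dim: int, k: int) -> float:
    """Hypothetical per-task cost: one dense sum vector per cluster center."""
    return k * dense_vector_mb(dim)


K = 10  # hypothetical number of clusters; the thread does not give k
for dim in (10**4, 10**7):
    print(f"dim={dim:>10}: one vector {dense_vector_mb(dim):8.2f} MB, "
          f"k={K} sums {accumulator_mb(dim, K):8.2f} MB")
```

At dimension 10^7 a single dense vector of doubles is 80MB, so a handful of
such vectors per task is already a large share of the roughly 1GB per task
(50GB / 48 tasks) discussed above, while at 10^4 the same vectors are under
0.1MB each.
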
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
