Did you cache the data? Was it fully cached? The k-means implementation doesn't create many temporary objects. My guess is that you need more memory per task so that GC is not triggered so frequently. Please monitor the memory usage with YourKit or VisualVM.

-Xiangrui

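A back-of-the-envelope check of the numbers quoted in this thread may help show why memory per task is the suspect. The sketch below only does arithmetic. The 50GB executor, the 48 concurrent tasks, and the dimensions 10^4 and 10^7 come from the thread. The rest is assumed for illustration: that the heap is shared evenly among tasks, that vectors are dense arrays of 8-byte doubles, and the value of `k`, which the thread never states.

```python
# Rough memory arithmetic for the setup described in this thread.
# From the thread: 50 GB executor, 48 concurrent tasks, dimension 10^7 vs 10^4.
# Assumptions (not from the thread): the heap is shared evenly by tasks,
# vectors are dense arrays of 8-byte doubles, and k is hypothetical.

GB = 1024 ** 3
MB = 1024 ** 2
BYTES_PER_DOUBLE = 8


def heap_per_task(executor_bytes, concurrent_tasks):
    """Even share of the executor heap available to one task."""
    return executor_bytes / concurrent_tasks


def dense_vector_bytes(dimension):
    """Payload size of one dense vector of doubles (ignores object headers)."""
    return dimension * BYTES_PER_DOUBLE


def centers_bytes(k, dimension):
    """Size of k dense vectors, e.g. one set of cluster centers."""
    return k * dense_vector_bytes(dimension)


if __name__ == "__main__":
    per_task = heap_per_task(50 * GB, 48)
    print(f"heap per task: {per_task / GB:.2f} GB")

    for dim in (10**4, 10**7):
        vec = dense_vector_bytes(dim)
        print(f"dimension {dim}: one dense vector = {vec / MB:.2f} MB")

    k = 10  # hypothetical; the thread does not give k
    total = centers_bytes(k, 10**7)
    print(f"k={k}, dimension 10^7: {total / MB:.0f} MB, "
          f"{total / per_task:.0%} of one task's share")
```

Under these assumptions each task gets roughly 1GB, which matches Sean's estimate, while a single dense vector at dimension 10^7 takes about 76MB against about 0.08MB at 10^4. If each task holds even a handful of such vectors, most of its share of the heap is gone, which would be consistent with GC dominating only in the high-dimensional run. Profiling with YourKit or VisualVM is still needed to confirm where the memory actually goes.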
On Wed, Feb 11, 2015 at 1:35 AM, lihu <lihu...@gmail.com> wrote:
> I just want to make the best use of the CPU, and to test the performance of
> Spark when there are a lot of tasks on a single node.
>
> On Wed, Feb 11, 2015 at 5:29 PM, Sean Owen <so...@cloudera.com> wrote:
>>
>> Good, worth double-checking that's what you got. That's barely 1GB per
>> task though. Why run 48 if you have 24 cores?
>>
>> On Wed, Feb 11, 2015 at 9:03 AM, lihu <lihu...@gmail.com> wrote:
>> > I give 50GB to the executor, so it seems there is no reason for the
>> > memory to be insufficient.
>> >
>> > On Wed, Feb 11, 2015 at 4:50 PM, Sean Owen <so...@cloudera.com> wrote:
>> >>
>> >> Meaning, you have 128GB per machine but how much memory are you giving
>> >> the executors?
>> >>
>> >> On Wed, Feb 11, 2015 at 8:49 AM, lihu <lihu...@gmail.com> wrote:
>> >> > What do you mean? Yes, I can see in the web UI that some data has
>> >> > been put in memory.
>> >> >
>> >> > On Wed, Feb 11, 2015 at 4:25 PM, Sean Owen <so...@cloudera.com>
>> >> > wrote:
>> >> >>
>> >> >> Are you actually using that memory for executors?
>> >> >>
>> >> >> On Wed, Feb 11, 2015 at 8:17 AM, lihu <lihu...@gmail.com> wrote:
>> >> >> > Hi,
>> >> >> > I run k-means (MLlib) on a cluster with 12 workers. Every
>> >> >> > worker has 128GB of RAM and 24 cores. I run 48 tasks on one
>> >> >> > machine. The total data is just 40GB.
>> >> >> >
>> >> >> > When the dimension of the data set is about 10^7, the duration
>> >> >> > of every task is about 30s, but the cost of GC is about 20s.
>> >> >> >
>> >> >> > When I reduce the dimension to 10^4, the GC cost is small.
>> >> >> >
>> >> >> > So why is the GC cost so high when the dimension is larger? Or
>> >> >> > is this caused by MLlib?
>> >> >
>> >> > --
>> >> > Best Wishes!
>> >> >
>> >> > Li Hu(李浒) | Graduate Student
>> >> > Institute for Interdisciplinary Information Sciences(IIIS)
>> >> > Tsinghua University, China
>> >> >
>> >> > Email: lihu...@gmail.com
>> >> > Homepage: http://iiis.tsinghua.edu.cn/zh/lihu/