Since Spark holds data structures on the heap (and by default tries to keep all data in memory), and it is written in Scala, seeing lots of scala.Tuple2 instances is not unexpected. How do these numbers relate to your data size?

On Oct 27, 2014 2:26 PM, "Sonal Goyal" <sonalgoy...@gmail.com> wrote:
> Hi,
>
> I wanted to understand what kind of memory overheads are expected, if at
> all, while using the Java API. My application seems to have a lot of live
> Tuple2 instances and I am hitting a lot of GC, so I am wondering if I am
> doing something fundamentally wrong. Here is what the top of my heap looks
> like. I actually create reifier.tuple.Tuple objects and pass them to map
> methods, and mostly return Tuple2<Tuple,Tuple>. The heap seems to have far
> too many Tuple2 and $colon$colon.
>
>  num     #instances         #bytes  class name
> ----------------------------------------------
>    1:      85414872     2049956928  scala.collection.immutable.$colon$colon
>    2:      85414852     2049956448  scala.Tuple2
>    3:        304221       14765832  [C
>    4:        302923        7270152  java.lang.String
>    5:         44111        2624624  [Ljava.lang.Object;
>    6:          1210        1495256  [B
>    7:         39839         956136  java.util.ArrayList
>    8:            29         950736  [Lscala.concurrent.forkjoin.ForkJoinTask;
>    9:          8129         827792  java.lang.Class
>   10:         33839         812136  java.lang.Long
>   11:         33400         801600  reifier.tuple.Tuple
>   12:          6116         538208  java.lang.reflect.Method
>   13:         12767         408544  java.util.concurrent.ConcurrentHashMap$Node
>   14:          5994         383616  org.apache.spark.scheduler.ResultTask
>   15:         10298         329536  java.util.HashMap$Node
>   16:         11988         287712  org.apache.spark.rdd.NarrowCoGroupSplitDep
>   17:          5708         228320  reifier.block.Canopy
>   18:             9         215784  [Lscala.collection.Seq;
>   19:         12078         193248  java.lang.Integer
>   20:         12019         192304  java.lang.Object
>   21:          5708         182656  reifier.block.Tree
>   22:          2776         173152  [I
>   23:          6013         144312  scala.collection.mutable.ArrayBuffer
>   24:          5994         143856  [Lorg.apache.spark.rdd.CoGroupSplitDep;
>   25:          5994         143856  org.apache.spark.rdd.CoGroupPartition
>   26:          5994         143856  org.apache.spark.rdd.ShuffledRDDPartition
>   27:          4486         143552  java.util.Hashtable$Entry
>   28:          6284         132800  [Ljava.lang.Class;
>   29:          1819         130968  java.lang.reflect.Field
>   30:           605         101208  [Ljava.util.HashMap$Node;
>
> Best Regards,
> Sonal
> Nube Technologies <http://www.nubetech.co>
>
> <http://in.linkedin.com/in/sonalgoyal>
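As a rough sanity check of those numbers (not a diagnosis of the application): dividing #bytes by #instances in the histogram above shows each Tuple2 and each $colon$colon (an immutable List cons cell) costs 24 bytes, which is consistent with an object header plus two references on a 64-bit JVM with compressed oops. With ~85M of each live, the wrapper objects alone account for roughly 3.8 GiB before the Tuple/String payloads they point to are counted. A minimal sketch of that arithmetic:

```java
public class HeapHistogramMath {
    public static void main(String[] args) {
        // Figures copied from the jmap histogram in the quoted mail.
        long tuple2Instances = 85_414_852L;
        long tuple2Bytes     = 2_049_956_448L;
        long consInstances   = 85_414_872L;
        long consBytes       = 2_049_956_928L;

        // Per-object size: bytes / instances = 24 bytes each.
        System.out.println(tuple2Bytes / tuple2Instances); // 24
        System.out.println(consBytes / consInstances);     // 24

        // Combined wrapper overhead for the pairs plus the list cells
        // holding them, before counting the payload objects: ~3.82 GiB.
        double gib = (tuple2Bytes + consBytes) / (1024.0 * 1024 * 1024);
        System.out.printf("%.2f GiB%n", gib);
    }
}
```

So the Tuple2/cons counts track the number of records held in memory one-to-one; if that total is much larger than the input data suggests it should be, the next thing to check is how many records each stage actually materializes.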