Re: Java api overhead?

Koert Kuipers Wed, 29 Oct 2014 07:16:02 -0700

Since Spark holds its data structures on the heap (and by default tries to
work with all data in memory), and it's written in Scala, seeing lots of
Scala Tuple2 instances is not unexpected. How do these numbers relate to
your data size?
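
One way to relate the numbers to data size is to work out the average
size per instance from the histogram rows quoted below. This is only a
minimal sketch: it assumes the second column is the instance count and the
third is total shallow bytes, and the class and method names are made up
for illustration.

```java
public class HistogramMath {
    /** Average shallow size per instance for one histogram row. */
    public static long bytesPerInstance(long instances, long bytes) {
        return bytes / instances;
    }

    public static void main(String[] args) {
        // Rows 1 and 2 of the quoted histogram (instances, bytes).
        long consInstances = 85_414_872L;   // scala.collection.immutable.$colon$colon
        long consBytes = 2_049_956_928L;
        long tupleInstances = 85_414_852L;  // scala.Tuple2
        long tupleBytes = 2_049_956_448L;

        // Both rows come out at 24 bytes per instance.
        System.out.println("$colon$colon bytes/instance: "
                + bytesPerInstance(consInstances, consBytes));
        System.out.println("Tuple2 bytes/instance: "
                + bytesPerInstance(tupleInstances, tupleBytes));

        // Combined footprint of the two classes: 4099913376 bytes.
        System.out.println("combined bytes: " + (consBytes + tupleBytes));
    }
}
```

Each object is small (24 bytes), so the roughly 4.1 GB comes from the
count of about 85 million, not from the size of any one instance. The two
counts are nearly identical, which may mean the tuples are held in Scala
lists, one list cell per tuple, but that is a guess from the histogram
alone. The number to compare against is how many records or pairs the job
is expected to hold at once.
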
On Oct 27, 2014 2:26 PM, "Sonal Goyal" <sonalgoy...@gmail.com> wrote:


> Hi,
>
> I wanted to understand what kind of memory overhead, if any, is expected
> when using the Java API. My application seems to have a lot of live
> Tuple2 instances and I am hitting a lot of GC, so I am wondering if I am
> doing something fundamentally wrong. Here is what the top of my heap looks
> like. I actually create reifier.tuple.Tuple objects and pass them to map
> methods and mostly return Tuple2<Tuple,Tuple>. The heap seems to have far
> too many Tuple2 and $colon$colon instances.
>
>
> num     #instances         #bytes  class name
> ----------------------------------------------
>    1:      85414872     2049956928  scala.collection.immutable.$colon$colon
>    2:      85414852     2049956448  scala.Tuple2
>    3:        304221       14765832  [C
>    4:        302923        7270152  java.lang.String
>    5:         44111        2624624  [Ljava.lang.Object;
>    6:          1210        1495256  [B
>    7:         39839         956136  java.util.ArrayList
>    8:            29         950736  [Lscala.concurrent.forkjoin.ForkJoinTask;
>    9:          8129         827792  java.lang.Class
>   10:         33839         812136  java.lang.Long
>   11:         33400         801600  reifier.tuple.Tuple
>   12:          6116         538208  java.lang.reflect.Method
>   13:         12767         408544  java.util.concurrent.ConcurrentHashMap$Node
>   14:          5994         383616  org.apache.spark.scheduler.ResultTask
>   15:         10298         329536  java.util.HashMap$Node
>   16:         11988         287712  org.apache.spark.rdd.NarrowCoGroupSplitDep
>   17:          5708         228320  reifier.block.Canopy
>   18:             9         215784  [Lscala.collection.Seq;
>   19:         12078         193248  java.lang.Integer
>   20:         12019         192304  java.lang.Object
>   21:          5708         182656  reifier.block.Tree
>   22:          2776         173152  [I
>   23:          6013         144312  scala.collection.mutable.ArrayBuffer
>   24:          5994         143856  [Lorg.apache.spark.rdd.CoGroupSplitDep;
>   25:          5994         143856  org.apache.spark.rdd.CoGroupPartition
>   26:          5994         143856  org.apache.spark.rdd.ShuffledRDDPartition
>   27:          4486         143552  java.util.Hashtable$Entry
>   28:          6284         132800  [Ljava.lang.Class;
>   29:          1819         130968  java.lang.reflect.Field
>   30:           605         101208  [Ljava.util.HashMap$Node;
>
>
>
> Best Regards,
> Sonal
> Nube Technologies <http://www.nubetech.co>
>
> <http://in.linkedin.com/in/sonalgoyal>
>
>
>
