Hi All,

We recently moved from Spark 0.9 to 1.1 for an application that handles a fair number of very large datasets partitioned across multiple nodes. About half of each of these datasets is stored in off-heap byte arrays and about half on the standard Java heap.
While these datasets are being loaded from our custom HDFS 2.3 RDD, and before we are using even a fraction of the available Java heap and off-heap native memory, loading slows to an absolute crawl. From profiling the Spark executor it appears clear that the Spark SizeEstimator is consuming an extremely high CPU load, along with a fast and furious allocation of Object[] instances. A GC run clears out all of those Object[] instances. We do not believe we saw this sort of behavior in 0.9, and we have noticed rather significant changes in this part of the BlockManager code going from 0.9 to 1.1 and beyond.

Before we spend large amounts of time either switching back to 0.9 or tracing further toward the root cause, I was wondering if anyone out there has enough experience with that part of the code (or has run into the same problem) to help us understand what sort of root causes might lie behind this strange behavior, and, even better, what we could do to resolve them.

Any help would be very much appreciated.

cheers, Erik
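For anyone skimming: from what we can tell, the estimator walks object graphs reflectively and, for large arrays, samples a fixed number of elements and extrapolates, which would explain both the CPU load and the short-lived Object[] garbage. A minimal sketch of that sampling strategy (illustrative class names and thresholds of our own, not Spark's actual source) looks roughly like:

```java
import java.util.Random;
import java.util.function.ToLongFunction;

// Hedged sketch, NOT Spark's implementation: a sampling-based size estimate
// in the spirit of what SizeEstimator appears to do for large arrays.
// The thresholds below are assumptions for illustration only.
public class SamplingSizeEstimate {
    static final int ARRAY_SIZE_FOR_SAMPLING = 200; // assumed cutoff
    static final int ARRAY_SAMPLE_SIZE = 100;       // assumed sample count

    static long estimate(Object[] arr, ToLongFunction<Object> sizeOf) {
        if (arr.length <= ARRAY_SIZE_FOR_SAMPLING) {
            // Small arrays: walk every element.
            long sum = 0;
            for (Object o : arr) sum += sizeOf.applyAsLong(o);
            return sum;
        }
        // Large arrays: sample a fixed number of elements and extrapolate.
        // If this runs on every block update, the sampling (plus the
        // reflective field walks the real estimator performs per element)
        // shows up as CPU load and transient garbage, which matches what
        // we see in our profiles.
        Random rand = new Random(42);
        long sampleSum = 0;
        for (int i = 0; i < ARRAY_SAMPLE_SIZE; i++) {
            sampleSum += sizeOf.applyAsLong(arr[rand.nextInt(arr.length)]);
        }
        return sampleSum * arr.length / ARRAY_SAMPLE_SIZE;
    }

    public static void main(String[] args) {
        Object[] big = new Object[1000];
        java.util.Arrays.fill(big, "x");
        // With a constant 16-byte per-element cost, extrapolation gives
        // 16 * 1000 regardless of which elements are sampled.
        System.out.println(estimate(big, o -> 16L)); // prints 16000
    }
}
```

If that mental model is right, the cost would scale with how often block sizes are re-estimated rather than with how much memory is actually used, which fits our symptom of slowdown well before heap exhaustion.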