Hello, I'm running JavaRDD.count() repeatedly on a small RDD, and the Java heap grows over time until the default limit is reached and an OutOfMemoryError is thrown. I'd expect this program to run in constant space, and the problem carries over to some more complicated tests I need to get working.
My Spark version is 2.1.0, and I'm running it with Nix on Debian Jessie. Is there anything elementary I could do to keep memory bounded? I'm copying the program below, along with an example of the output. Thanks in advance,
Facundo

/* Leak.java */
import java.util.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.io.IOException;
import java.io.Serializable;

import org.apache.spark.api.java.*;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.function.*;
import org.apache.spark.sql.*;

public class Leak {
    public static void main(String[] args) throws IOException {
        SparkConf conf = new SparkConf().setAppName("Leak");
        JavaSparkContext sc = new JavaSparkContext(conf);
        SQLContext sqlc = new SQLContext(sc);
        for (int i = 0; i < 50; i++) {
            // Report heap size and resident set size once per outer iteration.
            System.gc();
            long mem = Runtime.getRuntime().totalMemory();
            System.out.println("java total memory: " + mem);
            for (String s : Files.readAllLines(Paths.get("/proc/self/status"), StandardCharsets.UTF_8)) {
                if (0 <= s.indexOf("VmRSS"))
                    System.out.println(s);
            }
            // Run many trivial jobs on a tiny RDD.
            for (int j = 0; j < 2999; j++) {
                JavaRDD<Double> rdd = sc.parallelize(Arrays.asList(1.0, 2.0, 3.0));
                rdd.count();
            }
        }
        sc.stop();
    }
}

# example output
$ spark-submit --master local[1] --class Leak leak/build/libs/leak.jar
17/03/08 11:26:37 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/03/08 11:26:37 WARN Utils: Your hostname, fd-tweag resolves to a loopback address: 127.0.0.1; using 192.168.1.42 instead (on interface wlan0)
17/03/08 11:26:37 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
java total memory: 211288064
VmRSS: 200488 kB
java total memory: 456654848
VmRSS: 656472 kB
java total memory: 562036736
VmRSS: 677156 kB
java total memory: 562561024
VmRSS: 689424 kB
java total memory: 562561024
VmRSS: 701760 kB
java total memory: 562561024
VmRSS: 732540 kB
java total memory: 562561024
VmRSS: 748468 kB
java total memory: 562036736
VmRSS: 770680 kB
java total memory: 705691648
VmRSS: 789632 kB
java total memory: 706740224
VmRSS: 802720 kB
java total memory: 704118784
VmRSS: 832740 kB
java total memory: 705691648
VmRSS: 850808 kB
java total memory: 704118784
VmRSS: 875232 kB
java total memory: 705691648
VmRSS: 898716 kB
java total memory: 701497344
VmRSS: 919388 kB
java total memory: 905445376
VmRSS: 942628 kB
java total memory: 904921088
VmRSS: 989176 kB
java total memory: 901251072
VmRSS: 999540 kB
java total memory: 902823936
VmRSS: 1027212 kB
java total memory: 903348224
VmRSS: 1057668 kB
java total memory: 902299648
VmRSS: 1070976 kB
java total memory: 904396800
VmRSS: 1094640 kB
java total memory: 897056768
VmRSS: 1114612 kB
java total memory: 903872512
VmRSS: 1142324 kB
java total memory: 1050148864
VmRSS: 1147836 kB
java total memory: 1061158912
VmRSS: 1183668 kB
java total memory: 1052246016
VmRSS: 1211496 kB
java total memory: 1058013184
VmRSS: 1230696 kB
java total memory: 1059061760
VmRSS: 1259428 kB
java total memory: 1060634624
VmRSS: 1284252 kB
java total memory: 1055916032
VmRSS: 1319460 kB
java total memory: 1052246016
VmRSS: 1323044 kB
java total memory: 1052246016
VmRSS: 1323572 kB
java total memory: 1052246016
VmRSS: 1323836 kB
java total memory: 1052246016
VmRSS: 1323836 kB
java total memory: 1052246016
VmRSS: 1324096 kB
java total memory: 1052246016
VmRSS: 1324096 kB
java total memory: 1052246016
VmRSS: 1324096 kB
...
---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
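P.S. One workaround I'm planning to try, in case it's relevant: since every rdd.count() submits a separate Spark job, and (as I understand the Spark 2.1 configuration docs) the driver-side UI listener retains metadata for up to 1000 recent jobs and stages by default, disabling the UI or shrinking its retained history might bound at least part of the growth. This is only a guess on my part, not something I've verified yet:

# Untested idea: cap or disable the driver's UI job/stage history.
# spark.ui.enabled, spark.ui.retainedJobs and spark.ui.retainedStages are
# documented Spark configs; the effect on this particular leak is unverified.

# Disabling the UI entirely:
spark-submit --master local[1] --class Leak \
  --conf spark.ui.enabled=false \
  leak/build/libs/leak.jar

# Or keeping the UI but retaining far less history:
spark-submit --master local[1] --class Leak \
  --conf spark.ui.retainedJobs=50 \
  --conf spark.ui.retainedStages=50 \
  leak/build/libs/leak.jar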