Hello,

I'm running JavaRDD.count() repeatedly on a small RDD, and it seems to
increase the size of the Java heap over time until the default limit
is reached and an OutOfMemoryError is thrown. I'd expect this program
to run in constant space, and the problem carries over to some more
complicated tests I need to get working.

My Spark version is 2.1.0, and I'm running it with Nix on Debian Jessie.

Is there anything simple I could do to keep memory bounded?
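
In case it matters, the only knobs I've found documented so far are
capping the driver heap and making Spark's ContextCleaner trigger
garbage collection more often than its 30-minute default. This is just
a sketch of what I could try (the 1g and 1min values are arbitrary
guesses on my part, not recommendations):

```shell
# Cap the driver heap and shorten the ContextCleaner's periodic-GC
# interval (spark.cleaner.periodicGC.interval, default 30min in 2.x).
spark-submit --master local[1] \
  --driver-memory 1g \
  --conf spark.cleaner.periodicGC.interval=1min \
  --class Leak leak/build/libs/leak.jar
```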

I'm copying the program below and an example of the output.

Thanks in advance,
Facundo

/* Leak.java */
import java.util.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.io.IOException;
import java.io.Serializable;
import org.apache.spark.api.java.*;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.function.*;
import org.apache.spark.sql.*;

public class Leak {

  public static void main(String[] args) throws IOException {

    SparkConf conf = new SparkConf().setAppName("Leak");
    JavaSparkContext sc = new JavaSparkContext(conf);
    SQLContext sqlc = new SQLContext(sc);

    // Report heap size and resident set size, then churn through many
    // short-lived RDDs; I'd expect memory to stay bounded, but it grows.
    for (int i = 0; i < 50; i++) {
      System.gc();
      long mem = Runtime.getRuntime().totalMemory();
      System.out.println("java total memory: " + mem);
      // Print the VmRSS line from /proc/self/status (Linux only).
      for (String s : Files.readAllLines(Paths.get("/proc/self/status"),
                                         StandardCharsets.UTF_8)) {
        if (s.contains("VmRSS"))
          System.out.println(s);
      }
      // Create and count a tiny RDD many times per measurement.
      for (int j = 0; j < 2999; j++) {
        JavaRDD<Double> rdd = sc.parallelize(Arrays.asList(1.0, 2.0, 3.0));
        rdd.count();
      }
    }
    sc.stop();
  }
}
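
For reference, the measurement part of the program on its own is plain
Java with no Spark involved, so it can be sanity-checked separately
(the class name MemProbe is just mine; VmRSS is Linux-only):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class MemProbe {

    // Current total size of the JVM heap in bytes, after a GC hint.
    static long heapBytes() {
        System.gc();
        return Runtime.getRuntime().totalMemory();
    }

    // The resident-set-size line from /proc/self/status, or null when
    // that file is unavailable (i.e. on non-Linux platforms).
    static String vmRss() {
        try {
            for (String s : Files.readAllLines(Paths.get("/proc/self/status"),
                                               StandardCharsets.UTF_8)) {
                if (s.startsWith("VmRSS"))
                    return s;
            }
        } catch (IOException e) {
            // /proc is not available on this platform.
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println("java total memory: " + heapBytes());
        String rss = vmRss();
        if (rss != null)
            System.out.println(rss);
    }
}
```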

# example output
$ spark-submit --master local[1] --class Leak leak/build/libs/leak.jar
17/03/08 11:26:37 WARN NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where
applicable
17/03/08 11:26:37 WARN Utils: Your hostname, fd-tweag resolves to a
loopback address: 127.0.0.1; using 192.168.1.42 instead (on interface
wlan0)
17/03/08 11:26:37 WARN Utils: Set SPARK_LOCAL_IP if you need to bind
to another address
java total memory: 211288064
VmRSS:  200488 kB
java total memory: 456654848
VmRSS:  656472 kB
java total memory: 562036736
VmRSS:  677156 kB
java total memory: 562561024
VmRSS:  689424 kB
java total memory: 562561024
VmRSS:  701760 kB
java total memory: 562561024
VmRSS:  732540 kB
java total memory: 562561024
VmRSS:  748468 kB
java total memory: 562036736
VmRSS:  770680 kB
java total memory: 705691648
VmRSS:  789632 kB
java total memory: 706740224
VmRSS:  802720 kB
java total memory: 704118784
VmRSS:  832740 kB
java total memory: 705691648
VmRSS:  850808 kB
java total memory: 704118784
VmRSS:  875232 kB
java total memory: 705691648
VmRSS:  898716 kB
java total memory: 701497344
VmRSS:  919388 kB
java total memory: 905445376
VmRSS:  942628 kB
java total memory: 904921088
VmRSS:  989176 kB
java total memory: 901251072
VmRSS:  999540 kB
java total memory: 902823936
VmRSS: 1027212 kB
java total memory: 903348224
VmRSS: 1057668 kB
java total memory: 902299648
VmRSS: 1070976 kB
java total memory: 904396800
VmRSS: 1094640 kB
java total memory: 897056768
VmRSS: 1114612 kB
java total memory: 903872512
VmRSS: 1142324 kB
java total memory: 1050148864
VmRSS: 1147836 kB
java total memory: 1061158912
VmRSS: 1183668 kB
java total memory: 1052246016
VmRSS: 1211496 kB
java total memory: 1058013184
VmRSS: 1230696 kB
java total memory: 1059061760
VmRSS: 1259428 kB
java total memory: 1060634624
VmRSS: 1284252 kB
java total memory: 1055916032
VmRSS: 1319460 kB
java total memory: 1052246016
VmRSS: 1323044 kB
java total memory: 1052246016
VmRSS: 1323572 kB
java total memory: 1052246016
VmRSS: 1323836 kB
java total memory: 1052246016
VmRSS: 1323836 kB
java total memory: 1052246016
VmRSS: 1324096 kB
java total memory: 1052246016
VmRSS: 1324096 kB
java total memory: 1052246016
VmRSS: 1324096 kB
...
