Dear All,
I need to iterate some job / RDD quite a lot of times, but I am stuck on the
problem that Spark only accepts around 350 chained map calls before it meets
one action; beyond that the job fails with java.lang.StackOverflowError.
Besides, inserting dozens of actions to break the chain will obviously
increase the run time. Is there any proper way ...
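One possibility I have been looking at is RDD checkpointing, which as far as I understand truncates the lineage by saving the RDD to reliable storage, so the chain of map calls never grows unbounded. Below is a minimal, self-contained sketch of the idea (the class name, the checkpoint directory path, and the interval of 100 are only example values I made up):

import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

public class CheckpointSketch {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf().setAppName("checkpoint-sketch");
    JavaSparkContext jsc = new JavaSparkContext(conf);
    // checkpoint files need reliable storage; this path is only an example
    jsc.setCheckpointDir("hdfs:///tmp/spark-checkpoints");

    JavaRDD<Integer> r = jsc.parallelize(Arrays.asList(0, 0, 0, 0), 1).cache();

    for (int i = 0; i < 400; ++i) { // more than 350 chained maps
      r = r.map(new Function<Integer, Integer>() {
        @Override
        public Integer call(Integer v) {
          double x = Math.random() * 2 - 1;
          double y = Math.random() * 2 - 1;
          return (x * x + y * y < 1) ? 1 : 0;
        }
      });
      if (i % 100 == 99) {
        // mark the RDD for checkpointing, then run a cheap action so the
        // checkpoint file is actually written and the lineage is truncated
        r.cache();
        r.checkpoint();
        r.count();
      }
    }
    System.out.println(r.collect());
    jsc.stop();
  }
}

Would that be the recommended pattern here?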
As tested, here is the piece of code that reproduces the problem:
......
int count = 0;
JavaRDD<Integer> dataSet = jsc.parallelize(list, 1).cache(); // with only 1 partition
int m = 350;
JavaRDD<Integer> r = dataSet.cache();
JavaRDD<Integer> t = null;

for (int j = 0; j < m; ++j) { // outer loop: swap the re-parallelized RDD t back into r
  if (null != t) {
    r = t;
  }

  // inner loop to call map 350 times; if m is much more than 350 (for
  // instance, around 400), then the job throws:
  // "15/12/21 19:36:17 ERROR yarn.ApplicationMaster: User class threw
  //  exception: java.lang.StackOverflowError"
  for (int i = 0; i < m; ++i) {
    r = r.map(new Function<Integer, Integer>() {
      @Override
      public Integer call(Integer integer) {
        double x = Math.random() * 2 - 1;
        double y = Math.random() * 2 - 1;
        return (x * x + y * y < 1) ? 1 : 0;
      }
    });
  }

  // collect this RDD and re-parallelize it into another RDD; however,
  // dozens of actions such as collect are very costly
  List<Integer> lt = r.collect();
  t = jsc.parallelize(lt, 1).cache();
}
......
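For this particular job there may also be a simpler rewrite: since each step maps one element to the next independently, the inner loop can be fused into a single map whose closure iterates m times. Then the lineage grows by only one stage per outer pass instead of 350, and the collect / parallelize round trip through the driver is not needed just to keep the chain short. A sketch of the fused inner loop (steps is just a final copy of m, which Java requires for capture by the anonymous class):

final int steps = m; // locals captured by an anonymous class must be (effectively) final
r = r.map(new Function<Integer, Integer>() {
  @Override
  public Integer call(Integer v) {
    // iterate inside the closure instead of chaining 'steps' separate map calls
    for (int i = 0; i < steps; ++i) {
      double x = Math.random() * 2 - 1;
      double y = Math.random() * 2 - 1;
      v = (x * x + y * y < 1) ? 1 : 0;
    }
    return v;
  }
});
List<Integer> lt = r.collect(); // a single action at the very end

Does either of these sound like the proper approach, or is there something better?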
Thanks very much in advance!
Zhiliang
