I’ve seen this when I specified “too many” WHERE clauses in the SQL query. I
was able to adjust my query to use a single ‘IN’ clause rather than many ‘=’
clauses, but I realize that may not be an option in all cases.
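For illustration, here's a rough sketch of what that rewrite looks like when building the predicate string. The column name `ts` and the helper methods are hypothetical, not from my actual query:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class InClauseExample {

    // Many equality predicates chained with OR -- each one adds a node
    // to the parsed query plan, which can get deep for thousands of values.
    static String manyEquals(String col, List<Long> values) {
        return values.stream()
                .map(v -> col + " = " + v)
                .collect(Collectors.joining(" OR "));
    }

    // A single IN predicate covering the same values in one clause.
    static String singleIn(String col, List<Long> values) {
        return values.stream()
                .map(String::valueOf)
                .collect(Collectors.joining(", ", col + " IN (", ")"));
    }

    public static void main(String[] args) {
        List<Long> ts = Arrays.asList(1L, 2L, 3L);
        System.out.println(manyEquals("ts", ts)); // ts = 1 OR ts = 2 OR ts = 3
        System.out.println(singleIn("ts", ts));   // ts IN (1, 2, 3)
    }
}
```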
Jeff
On 5/4/16, 2:04 PM, "BenD" wrote:
>I am getting a java.lang.StackOverflowError somewhere in my program. I am not
>able to pinpoint which part causes it because the stack trace seems to be
>incomplete (see end of message). The error doesn't happen all the time, and
>I think it is based on the number of files that I load. I am running on AWS
>EMR with Spark 1.6.0 with an m1.xlarge as driver and 3 r3.8xlarge (244GB RAM
>+ 32 cores each) and an r3.2xlarge (61GB RAM + 8 cores) as executor machines
>with the following configuration:
>
>spark.driver.cores                    2
>spark.yarn.executor.memoryOverhead    5000
>spark.dynamicAllocation.enabled       true
>spark.executor.cores                  2
>spark.driver.memory                   14g
>spark.executor.memory                 12g
>
>While I can't post the full code or data, I will give a basic outline. I am
>loading many JSON files from S3 into a JavaRDD<String>, which is then mapped
>to a JavaPairRDD<Long, String> where the Long is the timestamp of the file.
>I then union the RDDs into a single RDD, which is then turned into a
>DataFrame. After I have this DataFrame, I run an SQL query on it and then
>dump the result to S3.
>
>A cut down version of the code would look similar to this:
>
>List<JavaPairRDD<Long, String>> linesList = validFiles.map(x -> {
>    try {
>        Long date = dateMapper.call(x);
>        return context.textFile(x.asPath())
>                .mapToPair(y -> new Tuple2<>(date, y));
>    } catch (Exception e) {
>        throw new RuntimeException(e);
>    }
>}).collect(Collectors.toList());
>
>JavaPairRDD<Long, String> unionedRDD = linesList.get(0);
>if (linesList.size() > 1) {
>    unionedRDD = context.union(unionedRDD, linesList.subList(1, linesList.size()));
>}
>
>HiveContext sqlContext = new HiveContext(context);
>DataFrame table = sqlContext.read().json(unionedRDD.values());
>table.registerTempTable("table");
>sqlContext.cacheTable("table");
>dumpToS3(sqlContext.sql("query"));
>
>
>This runs fine sometimes, but other times I get the
>java.lang.StackOverflowError. I know the error happens on a run where 7800
>files are loaded. Based on the error message mentioning mapped values, I
>assume the problem occurs in the mapToPair function, but I don't know why it
>happens. Does anyone have some insight into this problem?
>
>This is the whole print out of the error as seen in the container log:
>java.lang.StackOverflowError
>    at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
>    at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
>    at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
>    at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
>    at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
>    at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
>    at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
>    at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
>    at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
>    at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
>    at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
>    at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
>    at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
>    at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
>    at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
>    at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
>    at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
>    at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
>    at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
>    at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
>    at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
>    at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
>    at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
>    at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
>    at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
>    at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
>    at