Re: StackOverflowError in scala.collection

2016-05-26 Thread Jeff Jones
I’ve seen this when I specified “too many” WHERE clauses in the SQL query. I
was able to adjust my query to use a single IN clause rather than many ‘=’
clauses, but I realize that may not be an option in all cases.
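
A minimal sketch of what I mean (the table and column names here are invented):

// Thousands of '=' predicates chained with OR build a very deep expression
// tree, which seems to be what blows the stack during analysis:
String manyEquals = "SELECT * FROM logs WHERE id = 1 OR id = 2 OR id = 3"; // ...and so on
// A single IN clause keeps the tree shallow:
DataFrame result = sqlContext.sql("SELECT * FROM logs WHERE id IN (1, 2, 3)");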

Jeff


StackOverflowError in scala.collection

2016-05-04 Thread BenD
I am getting a java.lang.StackOverflowError somewhere in my program. I am not
able to pinpoint which part causes it because the stack trace seems to be
incomplete (see the end of this message). The error doesn't happen every time,
and I think it is related to the number of files that I load. I am running on
AWS EMR with Spark 1.6.0, with an m1.xlarge as the driver and three r3.8xlarge
(244 GB RAM + 32 cores each) plus an r3.2xlarge (61 GB RAM + 8 cores) as
executor machines, with the following configuration:

spark.driver.cores                   2
spark.yarn.executor.memoryOverhead   5000
spark.dynamicAllocation.enabled      true
spark.executor.cores                 2
spark.driver.memory                  14g
spark.executor.memory                12g
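
(The same settings as a SparkConf sketch, just for readability; note that
driver-side values like spark.driver.memory normally have to be supplied at
submit time rather than set in code:)

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

SparkConf sparkConf = new SparkConf()
  .set("spark.driver.cores", "2")
  .set("spark.yarn.executor.memoryOverhead", "5000")
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.executor.cores", "2")
  .set("spark.driver.memory", "14g")
  .set("spark.executor.memory", "12g");
JavaSparkContext context = new JavaSparkContext(sparkConf);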

While I can't post the full code or data, I will give a basic outline. I am
loading many JSON files from S3 into a JavaRDD<String>, which is then mapped
to a JavaPairRDD<Long, String>, where the Long is the timestamp of the file.
I then union the RDDs into a single RDD, which is turned into a
DataFrame. Once I have this DataFrame, I run a SQL query on it and then
dump the result to S3.

A cut-down version of the code looks similar to this:

List<JavaPairRDD<Long, String>> linesList = validFiles.map(x -> {
  try {
    // One RDD per file: pair every line with the file's timestamp
    Long date = dateMapper.call(x);
    return context.textFile(x.asPath()).mapToPair(y -> new Tuple2<>(date, y));
  } catch (Exception e) {
    throw new RuntimeException(e);
  }
}).collect(Collectors.toList());

// Union the per-file RDDs into a single RDD
JavaPairRDD<Long, String> unionedRDD = linesList.get(0);
if (linesList.size() > 1) {
  unionedRDD = context.union(unionedRDD, linesList.subList(1, linesList.size()));
}

HiveContext sqlContext = new HiveContext(context);
// Parse the JSON strings (the RDD's values) into a DataFrame
DataFrame table = sqlContext.read().json(unionedRDD.values());
table.registerTempTable("table");
sqlContext.cacheTable("table");
dumpToS3(sqlContext.sql("query"));


This runs fine sometimes, but other times I get the
java.lang.StackOverflowError. I know the error happens on a run where 7800
files are loaded. Based on the error message mentioning mapped values, I
assume the problem occurs in the mapToPair function, but I don't know why it
happens. Does anyone have some insight into this problem?

This is the whole printout of the error as seen in the container log:
java.lang.StackOverflowError
at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
at