[ https://issues.apache.org/jira/browse/KYLIN-3824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16775179#comment-16775179 ]

ASF GitHub Bot commented on KYLIN-3824:
---------------------------------------

Sidonet commented on pull request #478: KYLIN-3824
URL: https://github.com/apache/kylin/pull/478
 
 
   For environments with low RAM and a huge Cube, remove the intermediate List to avoid java.lang.OutOfMemoryError, and iterate via an Iterator instead.
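The pattern the PR describes — consuming a partition through its Iterator rather than first copying everything into a List — can be sketched in plain Java (a minimal, hypothetical illustration; the class, method name, and sample rows are invented and are not the actual Kylin code):

```java
import java.util.Iterator;
import java.util.List;

public class LazyIterationSketch {
    // Materializing every element into a List (as Guava's
    // Lists.newArrayList(iterator) does) holds all rows on the heap at
    // once. Iterating directly processes one row at a time, so peak
    // memory stays bounded by a single row regardless of partition size.
    static long countFieldsLazily(Iterator<String> rows) {
        long total = 0;
        while (rows.hasNext()) {
            String row = rows.next();        // only one row in memory here
            total += row.split(",").length;  // process, then discard
        }
        return total;
    }

    public static void main(String[] args) {
        Iterator<String> rows = List.of("a,b,c", "d,e").iterator();
        System.out.println(countFieldsLazily(rows)); // prints 5
    }
}
```

The trade-off: a lazy pass can only visit each element once, so any logic that needed random access to the List has to be restructured into a single streaming pass.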
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Spark - Extract Fact Table Distinct Columns step causes 
> java.lang.OutOfMemoryError: Java heap space
> ---------------------------------------------------------------------------------------------------
>
>                 Key: KYLIN-3824
>                 URL: https://issues.apache.org/jira/browse/KYLIN-3824
>             Project: Kylin
>          Issue Type: Bug
>          Components: Spark Engine
>    Affects Versions: v2.6.1
>         Environment: CentOS 7
> 3 workers and 1 master.
> 4 cpu, 16GB RAM each
>            Reporter: Alexander
>            Priority: Major
>         Attachments: KYLIN-3824.master.001.patch
>
>
> Trying to build a huge cube on a weak environment.
> Environment:
> Cluster with 3 nodes.
> Max AM container size - 5GB.
>  
> The kylin_intermediate table consists of ~500 files, ranging in size from 4 KB up to 300 MB.
>  
> When a Spark executor takes a file larger than ~70 MB at step mapPartitionsToPair (194), it throws the following exception:
> 2019-02-21 20:29:40 ERROR SparkUncaughtExceptionHandler:91 - [Container in shutdown] Uncaught exception in thread Thread[Executor task launch worker for task 1,5,main]
> java.lang.OutOfMemoryError: Java heap space
>  at java.util.Arrays.copyOfRange(Arrays.java:3664)
>  at java.lang.String.<init>(String.java:207)
>  at java.lang.String.substring(String.java:1969)
>  at java.lang.String.split(String.java:2353)
>  at java.lang.String.split(String.java:2422)
>  at org.apache.kylin.engine.spark.SparkUtil$1.call(SparkUtil.java:164)
>  at org.apache.kylin.engine.spark.SparkUtil$1.call(SparkUtil.java:160)
>  at org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1.apply(JavaPairRDD.scala:1040)
>  at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
>  at scala.collection.convert.Wrappers$IteratorWrapper.next(Wrappers.scala:31)
>  at com.google.common.collect.Lists.newArrayList(Lists.java:145)
>  at org.apache.kylin.engine.spark.SparkFactDistinct$FlatOutputFucntion.call(SparkFactDistinct.java:313)
>  at org.apache.kylin.engine.spark.SparkFactDistinct$FlatOutputFucntion.call(SparkFactDistinct.java:239)
>  at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:186)
>  at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:186)
>  at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801)
>  at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801)
>  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
>  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
>  at org.apache.spark.scheduler.Task.run(Task.scala:109)
>  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  at java.lang.Thread.run(Thread.java:745)
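The com.google.common.collect.Lists.newArrayList(Lists.java:145) frame in the trace above is where the whole partition gets copied into an ArrayList before any row is processed. An alternative that transforms rows one at a time as the consumer pulls them can be sketched as a wrapping iterator (a hypothetical illustration under the assumption that the transform needs only one row at a time; MappingIterator and the row-splitting function are invented, not Kylin's actual code):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Applies fn lazily to each element as it is pulled, so only one
// element is resident on the heap at a time -- in contrast to
// Lists.newArrayList(iterator), which copies the whole input first.
public class MappingIterator<T, R> implements Iterator<R> {
    private final Iterator<T> source;
    private final Function<T, R> fn;

    public MappingIterator(Iterator<T> source, Function<T, R> fn) {
        this.source = source;
        this.fn = fn;
    }

    @Override public boolean hasNext() { return source.hasNext(); }
    @Override public R next() { return fn.apply(source.next()); }

    public static void main(String[] args) {
        Iterator<String> rows = List.of("a,b", "c,d,e").iterator();
        // Each row is split into fields only when the consumer asks for it.
        Iterator<String[]> fields = new MappingIterator<>(rows, r -> r.split(","));
        List<Integer> widths = new ArrayList<>();
        fields.forEachRemaining(f -> widths.add(f.length));
        System.out.println(widths); // prints [2, 3]
    }
}
```

Because the wrapper never buffers, heap usage no longer grows with input-file size, which is the failure mode reported for files above ~70 MB.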



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
