[ https://issues.apache.org/jira/browse/SPARK-26320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-26320.
----------------------------------
    Resolution: Incomplete

Leaving this resolved due to no feedback.

> udf with multiple arrays as input
> ---------------------------------
>
>                 Key: SPARK-26320
>                 URL: https://issues.apache.org/jira/browse/SPARK-26320
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.1, 2.3.1
>            Reporter: Rio Wibowo
>            Priority: Major
>
> Spark hits a GC out-of-memory error ("GC overhead limit exceeded") when a UDF that takes several arrays as input is applied many times.
>
> Description:
> # 3 different input arrays
> # Every array has size 10
>
> UDF:
> {code}
> val getResult = udf(
>   (index: Integer, array1: Seq[Integer], array2: Seq[Integer], array3: Seq[Double]) =>
>     doSomeThing(index, array1, array2, array3)
> )
> {code}
> DataFrame schema:
> {code}
> root
>  |-- frequency_1: integer (nullable = true)
>  |-- code_1: integer (nullable = true)
>  |-- power_1: integer (nullable = false)
>  |-- frequency_2: integer (nullable = true)
>  |-- code_2: integer (nullable = true)
>  |-- power_2: integer (nullable = false)
>  |-- frequency_3: integer (nullable = true)
>  |-- code_3: integer (nullable = true)
>  |-- power_3: integer (nullable = false)
>  |-- frequency_4: integer (nullable = true)
>  |-- code_4: integer (nullable = true)
>  |-- power_4: integer (nullable = false)
>  |-- frequency_5: integer (nullable = true)
>  |-- code_5: integer (nullable = true)
>  |-- power_5: integer (nullable = false)
>  |-- frequency_6: integer (nullable = true)
>  |-- code_6: integer (nullable = true)
>  |-- power_6: integer (nullable = false)
>  |-- frequency_7: integer (nullable = true)
>  |-- code_7: integer (nullable = true)
>  |-- power_7: double (nullable = true)
>  |-- frequency_8: integer (nullable = true)
>  |-- code_8: integer (nullable = true)
>  |-- power_8: double (nullable = true)
>  |-- frequency_9: integer (nullable = true)
>  |-- code_9: integer (nullable = true)
>  |-- power_9: double (nullable = true)
>  |-- frequency_10: integer (nullable = true)
>  |-- code_10: integer (nullable = true)
>  |-- power_10: double (nullable = true)
> {code}
> Call the UDF 10 times via withColumn to enrich the DataFrame:
> {code:java}
> df.withColumn("out1", getResult(0,
>     array(col("frequency_1"), col("frequency_2"), ..., col("frequency_10")),
>     array(col("code_1"), col("code_2"), ..., col("code_10")),
>     array(col("power_1"), col("power_2"), ..., col("power_10"))))
>   .withColumn("out2", getResult(0,
>     array(col("frequency_1"), col("frequency_2"), ..., col("frequency_10")),
>     array(col("code_1"), col("code_2"), ..., col("code_10")),
>     array(col("power_1"), col("power_2"), ..., col("power_10"))))
>   ...
>   .withColumn("out9", getResult(9, ...
>   ...
>   .withColumn("out10", getResult(10,
>     array(col("frequency_1"), col("frequency_2"), ..., col("frequency_10")),
>     array(col("code_1"), col("code_2"), ..., col("code_10")),
>     array(col("power_1"), col("power_2"), ..., col("power_10"))))
> {code}
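For reference, below is a minimal, self-contained sketch of the pattern described above that can be pasted into spark-shell. It is an approximation for illustration, not the reporter's exact code: doSomeThing is stubbed to return 0, the 30 input columns are filled with dummy values, and the index argument is wrapped in lit(...) because UDF arguments must be Columns.

{code}
// Sketch of the reported pattern; assumes a SparkSession named `spark` (e.g. spark-shell).
import org.apache.spark.sql.functions.{array, col, lit, udf}

// Stand-in for the reporter's doSomeThing; the real logic is not part of the report.
val doSomeThing = (index: Int, a1: Seq[Int], a2: Seq[Int], a3: Seq[Double]) => 0
val getResult = udf(doSomeThing)

// One-row DataFrame with the 30 columns from the schema above, filled with dummy values.
val colNames = (1 to 10).flatMap(i => Seq(s"frequency_$i", s"code_$i", s"power_$i"))
val df = spark.range(1).select(colNames.map(c => lit(1).cast("int").as(c)): _*)

// Apply the UDF 10 times, each call receiving three 10-element arrays.
val enriched = (1 to 10).foldLeft(df) { (acc, i) =>
  acc.withColumn(s"out$i", getResult(
    lit(i - 1),                                             // index argument
    array((1 to 10).map(j => col(s"frequency_$j")): _*),
    array((1 to 10).map(j => col(s"code_$j")): _*),
    array((1 to 10).map(j => col(s"power_$j").cast("double")): _*)))
}
enriched.show()
{code}

Note that every withColumn call builds three fresh array(...) expressions over the 30 columns, so the chained plan grows with each of the 10 calls.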
> Error Log:
> {code:java}
> 12:56:08.461 [dispatcher-event-loop-3] ERROR o.a.s.scheduler.TaskSchedulerImpl - Lost executor driver on localhost: Executor heartbeat timed out after 150014 ms
> [info] com.xxx.xx.xx.xx.xx.xx.xx.xx.CSVExporterSpec *** ABORTED *** (9 minutes, 24 seconds)
> [info]   java.lang.OutOfMemoryError: GC overhead limit exceeded
> [info]   ...
> [error] Uncaught exception when running com.xxx.xx.xx.xx.xx.xx.xx.xx.CSVExporterSpec: java.lang.OutOfMemoryError: GC overhead limit exceeded
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>   at java.util.AbstractList.iterator(AbstractList.java:288)
>   at org.apache.cassandra.gms.Gossiper.addLocalApplicationStates(Gossiper.java:1513)
>   at org.apache.cassandra.gms.Gossiper.addLocalApplicationState(Gossiper.java:1505)
>   at org.apache.cassandra.service.LoadBroadcaster$1.run(LoadBroadcaster.java:92)
>   at org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run(DebuggableScheduledThreadPoolExecutor.java:118)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81)
>   at org.apache.cassandra.concurrent.NamedThreadFactory$$Lambda$4/821749187.run(Unknown Source)
>   at java.lang.Thread.run(Thread.java:745)
> 12:56:13.552 [ScheduledTasks:1] ERROR o.a.c.utils.JVMStabilityInspector - JVM state determined to be unstable. Exiting forcefully due to: java.lang.OutOfMemoryError: GC overhead limit exceeded
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>   at java.util.AbstractList.iterator(AbstractList.java:288)
>   at org.apache.cassandra.gms.Gossiper.addLocalApplicationStates(Gossiper.java:1513)
>   at org.apache.cassandra.gms.Gossiper.addLocalApplicationState(Gossiper.java:1505)
>   at org.apache.cassandra.service.LoadBroadcaster$1.run(LoadBroadcaster.java:92)
>   at org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run(DebuggableScheduledThreadPoolExecutor.java:118)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81)
>   at org.apache.cassandra.concurrent.NamedThreadFactory$$Lambda$4/821749187.run(Unknown Source)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
>
>


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org