Hi guys,

There is another memory issue. I am not sure whether it is related to Tungsten this time, because I have it disabled (spark.sql.tungsten.enabled=false). It happens when too many tasks (300) are running concurrently; I need to limit the number of tasks to avoid it. The executor has 6G. I am using Spark 1.5.1.
Best Regards,

Jerry

org.apache.spark.SparkException: Task failed while writing rows.
        at org.apache.spark.sql.execution.datasources.DynamicPartitionWriterContainer.writeRows(WriterContainer.scala:393)
        at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150)
        at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:88)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Unable to acquire 67108864 bytes of memory
        at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPage(UnsafeExternalSorter.java:351)
        at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.<init>(UnsafeExternalSorter.java:138)
        at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.create(UnsafeExternalSorter.java:106)
        at org.apache.spark.sql.execution.UnsafeKVExternalSorter.<init>(UnsafeKVExternalSorter.java:74)
        at org.apache.spark.sql.execution.UnsafeKVExternalSorter.<init>(UnsafeKVExternalSorter.java:56)
        at org.apache.spark.sql.execution.datasources.DynamicPartitionWriterContainer.writeRows(WriterContainer.scala:339)

On Tue, Oct 20, 2015 at 9:10 PM, Reynold Xin <r...@databricks.com> wrote:

> With Jerry's permission, sending this back to the dev list to close the
> loop.
>
> ---------- Forwarded message ----------
> From: Jerry Lam <chiling...@gmail.com>
> Date: Tue, Oct 20, 2015 at 3:54 PM
> Subject: Re: If you use Spark 1.5 and disabled Tungsten mode ...
> To: Reynold Xin <r...@databricks.com>
>
> Yup, coarse-grained mode works just fine. :)
> The difference is that, by default, coarse-grained mode uses 1 core per
> task. If I constrain the total to 20 cores, there can be only 20 tasks
> running at the same time. However, with fine-grained mode I cannot set
> the total number of cores, so there could be 200+ tasks running at the
> same time (it is dynamic). So perhaps the calculation of how much memory
> to acquire fails when the number of cores cannot be known ahead of time,
> because you cannot assume a fixed number of tasks running in an executor?
> Just my guess...
>
> On Tue, Oct 20, 2015 at 6:24 PM, Reynold Xin <r...@databricks.com> wrote:
>
>> Can you try coarse-grained mode and see if it is the same?
>>
>> On Tue, Oct 20, 2015 at 3:20 PM, Jerry Lam <chiling...@gmail.com> wrote:
>>
>>> Hi Reynold,
>>>
>>> Yes, I'm using 1.5.1. I see them quite often. Sometimes it recovers,
>>> but sometimes it does not. For one particular job, it failed every time
>>> with the acquire-memory issue. I'm using Spark on Mesos in fine-grained
>>> mode. Does that make a difference?
>>>
>>> Best Regards,
>>>
>>> Jerry
>>>
>>> On Tue, Oct 20, 2015 at 5:27 PM, Reynold Xin <r...@databricks.com>
>>> wrote:
>>>
>>>> Jerry - I think that's been fixed in 1.5.1. Do you still see it?
>>>>
>>>> On Tue, Oct 20, 2015 at 2:11 PM, Jerry Lam <chiling...@gmail.com>
>>>> wrote:
>>>>
>>>>> I disabled it because of the "Could not acquire 65536 bytes of
>>>>> memory" error, which fails the job. So for now, I'm not touching it.
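As a rough sanity check on the numbers in this thread (illustrative only: the 64 MB figure is the 67108864-byte page from the stack trace, the task counts come from the discussion above, and the SparkConf values are assumptions, not a verified fix), a short Scala sketch:

    import org.apache.spark.SparkConf

    // Back-of-envelope check: the failed allocation in the trace is one
    // 67108864-byte (64 MB) sorter page per task.
    val pageBytes = 67108864L        // 64 MB, from the stack trace
    val mb        = 1L << 20

    // ~300 concurrent tasks under fine-grained Mesos (from the thread):
    println(300 * pageBytes / mb)    // 19200 MB -- far beyond a 6 GB executor

    // 20 concurrent tasks with a 20-core cap (1 core per task by default):
    println(20 * pageBytes / mb)     // 1280 MB -- fits

    // Hypothetical settings for the coarse-grained workaround discussed above:
    val conf = new SparkConf()
      .set("spark.mesos.coarse", "true")     // coarse-grained Mesos mode
      .set("spark.cores.max", "20")          // hard cap => at most 20 tasks at once
      .set("spark.executor.memory", "6g")    // executor size from the thread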
>>>>>
>>>>> On Tue, Oct 20, 2015 at 4:48 PM, charmee <charm...@gmail.com> wrote:
>>>>>
>>>>>> We had disabled Tungsten after we found a few performance issues,
>>>>>> but we had to enable it again because with a large number of GROUP
>>>>>> BY fields, the shuffle keeps failing when Tungsten is disabled.
>>>>>>
>>>>>> Here is an excerpt from one of our engineers with his analysis.
>>>>>>
>>>>>> With Tungsten enabled (default in Spark 1.5), ~90 files of 0.5G each:
>>>>>>
>>>>>> Ingest (after applying broadcast lookups): 54 min
>>>>>> Aggregation (~30 fields in GROUP BY, another 40 in aggregation): 18 min
>>>>>>
>>>>>> With Tungsten disabled:
>>>>>>
>>>>>> Ingest: 30 min
>>>>>> Aggregation: erroring out
>>>>>>
>>>>>> On smaller tests we found that joins are slow with Tungsten enabled.
>>>>>> With GROUP BY, disabling Tungsten does not work in the first place.
>>>>>>
>>>>>> Hope this helps.
>>>>>>
>>>>>> -Charmee
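For anyone who wants to reproduce the enabled/disabled comparison above, a minimal sketch against the Spark 1.5 API. The property name is the one quoted earlier in the thread; the app name, table, columns, and toy data are hypothetical stand-ins for the real workload:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    val sc = new SparkContext(
      new SparkConf().setAppName("tungsten-compare").setMaster("local[*]"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Toggle Tungsten for this run (flip to "true" for the enabled case).
    sqlContext.setConf("spark.sql.tungsten.enabled", "false")

    // Tiny stand-in table; the real job read ~90 files of 0.5G each.
    val events = sc.parallelize(
        Seq(("a", "x", 1.0), ("a", "y", 2.0), ("b", "x", 3.0)))
      .toDF("k1", "k2", "v1")
    events.registerTempTable("events")

    // Same shape as the aggregation described above (~30 grouping fields
    // and ~40 aggregates in the real job), at toy scale.
    sqlContext.sql(
      "SELECT k1, k2, SUM(v1) AS total_v1 FROM events GROUP BY k1, k2").show()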