Re: If you use Spark 1.5 and disabled Tungsten mode ...
Hi Reynold,

I had version 2.6.1 in my project, which was provided by the fine folks from spring-boot-dependencies. I have now overridden it to 2.7.8 :)

Sjoerd

2015-11-01 8:22 GMT+01:00 Reynold Xin:

> Thanks for reporting it, Sjoerd. You might have a different version of
> Janino brought in from somewhere else.
>
> This should fix your problem: https://github.com/apache/spark/pull/9372
>
> Can you give it a try?
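For anyone hitting the same conflict, pinning the Janino version from a BOM like spring-boot-dependencies is typically done in the project's own dependencyManagement. This is a sketch assuming Maven; Janino 2.7.x ships as two artifacts (janino and commons-compiler), and the class in the stack trace (org.codehaus.commons.compiler.CompileException) lives in the latter, so pinning both is the safer move:

```xml
<!-- Sketch: override the BOM-managed Janino version to the one Spark expects. -->
<!-- Place in the project's pom.xml; takes precedence over an imported BOM's entries. -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.codehaus.janino</groupId>
      <artifactId>janino</artifactId>
      <version>2.7.8</version>
    </dependency>
    <dependency>
      <groupId>org.codehaus.janino</groupId>
      <artifactId>commons-compiler</artifactId>
      <version>2.7.8</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```

Running `mvn dependency:tree -Dincludes=org.codehaus.janino` afterwards is a quick way to confirm which version actually ends up on the classpath.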
Re: If you use Spark 1.5 and disabled Tungsten mode ...
Thanks for reporting it, Sjoerd. You might have a different version of Janino brought in from somewhere else.

This should fix your problem: https://github.com/apache/spark/pull/9372

Can you give it a try?

On Tue, Oct 27, 2015 at 9:12 PM, Sjoerd Mulder wrote:

> No, the job doesn't actually fail, but since our tests are generating all
> these stacktraces I have disabled Tungsten mode just to be sure (and to
> avoid a gazillion stacktraces in production).
>
> 2015-10-27 20:59 GMT+01:00 Josh Rosen:
>
>> Hi Sjoerd,
>>
>> Did your job actually *fail* or did it just generate many spurious
>> exceptions? While the stacktrace that you posted does indicate a bug, I
>> don't think that it should have stopped query execution, because Spark
>> should have fallen back to an interpreted code path (note the "Failed
>> to generate ordering, fallback to interpreted" in the error message).
>>
>> On Tue, Oct 27, 2015 at 12:56 PM Sjoerd Mulder wrote:
>>
>>> I have disabled it because it started generating ERRORs when upgrading
>>> from Spark 1.4 to 1.5.1:
>>>
>>> 2015-10-27T20:50:11.574+0100 ERROR TungstenSort.newOrdering() - Failed to generate ordering, fallback to interpreted
>>> java.util.concurrent.ExecutionException: java.lang.Exception: failed to compile: org.codehaus.commons.compiler.CompileException: Line 15, Column 9: Invalid character input "@" (character code 64)
>>>
>>> public SpecificOrdering generate(org.apache.spark.sql.catalyst.expressions.Expression[] expr) {
>>>   return new SpecificOrdering(expr);
>>> }
>>>
>>> class SpecificOrdering extends org.apache.spark.sql.catalyst.expressions.codegen.BaseOrdering {
>>>
>>>   private org.apache.spark.sql.catalyst.expressions.Expression[] expressions;
>>>
>>>   public SpecificOrdering(org.apache.spark.sql.catalyst.expressions.Expression[] expr) {
>>>     expressions = expr;
>>>   }
>>>
>>>   @Override
>>>   public int compare(InternalRow a, InternalRow b) {
>>>     InternalRow i = null; // Holds current row being evaluated.
>>>
>>>     i = a;
>>>     boolean isNullA2;
>>>     long primitiveA3;
>>>     {
>>>       /* input[2, LongType] */
>>>       boolean isNull0 = i.isNullAt(2);
>>>       long primitive1 = isNull0 ? -1L : (i.getLong(2));
>>>       isNullA2 = isNull0;
>>>       primitiveA3 = primitive1;
>>>     }
>>>     i = b;
>>>     boolean isNullB4;
>>>     long primitiveB5;
>>>     {
>>>       /* input[2, LongType] */
>>>       boolean isNull0 = i.isNullAt(2);
>>>       long primitive1 = isNull0 ? -1L : (i.getLong(2));
>>>       isNullB4 = isNull0;
>>>       primitiveB5 = primitive1;
>>>     }
>>>     if (isNullA2 && isNullB4) {
>>>       // Nothing
>>>     } else if (isNullA2) {
>>>       return 1;
>>>     } else if (isNullB4) {
>>>       return -1;
>>>     } else {
>>>       int comp = (primitiveA3 > primitiveB5 ? 1 : primitiveA3 < primitiveB5 ? -1 : 0);
>>>       if (comp != 0) {
>>>         return -comp;
>>>       }
>>>     }
>>>
>>>     return 0;
>>>   }
>>> }
>>>
>>> at org.spark-project.guava.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:306)
>>> at org.spark-project.guava.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:293)
>>> at org.spark-project.guava.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
>>> at org.spark-project.guava.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:135)
>>> at org.spark-project.guava.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2410)
>>> at org.spark-project.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2380)
>>> at org.spark-project.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
>>> at org.spark-project.guava.cache.LocalCache$Segment.get(LocalCache.java:2257)
>>> at org.spark-project.guava.cache.LocalCache.get(LocalCache.java:4000)
>>> at org.spark-project.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004)
>>> at org.spark-project.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
>>> at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.compile(CodeGenerator.scala:362)
>>> at org.apache.spark.sql.catalyst.expressions.codegen.GenerateOrdering$.create(GenerateOrdering.scala:139)
>>> at org.apache.spark.sql.catalyst.expressions.codegen.GenerateOrdering$.create(GenerateOrdering.scala:37)
>>> at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:425)
>>> at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:422)
>>> at org.apache.spark.sql.execution.SparkPlan.newOrdering(SparkPlan.scala:294)
>>> at org.apache.spark.sql.execution.TungstenSort.org$apache$spark$sql$execution$TungstenSort$$preparePartition$1(sort.scala:131)
>>> at
Re: If you use Spark 1.5 and disabled Tungsten mode ...
Hi guys,

There is another memory issue. I'm not sure whether this one is related to Tungsten, because I have it disabled (spark.sql.tungsten.enabled=false). It happens when too many tasks are running (300); I need to limit the number of tasks to avoid this. The executor has 6G. Spark 1.5.1 is being used.

Best Regards,

Jerry

org.apache.spark.SparkException: Task failed while writing rows.
  at org.apache.spark.sql.execution.datasources.DynamicPartitionWriterContainer.writeRows(WriterContainer.scala:393)
  at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150)
  at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150)
  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
  at org.apache.spark.scheduler.Task.run(Task.scala:88)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Unable to acquire 67108864 bytes of memory
  at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPage(UnsafeExternalSorter.java:351)
  at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.<init>(UnsafeExternalSorter.java:138)
  at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.create(UnsafeExternalSorter.java:106)
  at org.apache.spark.sql.execution.UnsafeKVExternalSorter.<init>(UnsafeKVExternalSorter.java:74)
  at org.apache.spark.sql.execution.UnsafeKVExternalSorter.<init>(UnsafeKVExternalSorter.java:56)
  at org.apache.spark.sql.execution.datasources.DynamicPartitionWriterContainer.writeRows(WriterContainer.scala:339)

On Tue, Oct 20, 2015 at 9:10 PM, Reynold Xin <r...@databricks.com> wrote:

> With Jerry's permission, sending this back to the dev list to close the
> loop.
> ...
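A rough back-of-envelope for why 300 concurrent tasks overwhelm a 6G executor here: in Spark 1.5's legacy memory manager, the shuffle/execution pool is approximately executor memory x spark.shuffle.memoryFraction (default 0.2) x spark.shuffle.safetyFraction (default 0.8), and each UnsafeExternalSorter needs at least one page of the size seen in the exception (67108864 bytes = 64 MiB). A minimal sketch, assuming those defaults were in effect for this job:

```python
# Back-of-envelope: how many tasks can each hold one 64 MiB sort page
# in a 6 GiB executor under Spark 1.5's legacy (static) memory manager?
# Assumes default spark.shuffle.memoryFraction=0.2 and
# spark.shuffle.safetyFraction=0.8 (assumptions about this job's config).

executor_memory = 6 * 1024**3        # 6 GiB executor heap
shuffle_fraction = 0.2               # spark.shuffle.memoryFraction default
safety_fraction = 0.8                # spark.shuffle.safetyFraction default
page_size = 64 * 1024**2             # 67108864 bytes, from the IOException

shuffle_pool = int(executor_memory * shuffle_fraction * safety_fraction)
max_tasks_with_a_page = shuffle_pool // page_size

print(shuffle_pool)                  # pool is under 1 GiB
print(max_tasks_with_a_page)         # far fewer than the 300 concurrent tasks
```

Under these assumptions only about 15 tasks can each acquire a page, so with 300 tasks running concurrently most allocations are bound to fail, which matches Jerry's observation that limiting the task count avoids the error.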
Re: If you use Spark 1.5 and disabled Tungsten mode ...
Is this still Mesos fine-grained mode?

On Wed, Oct 21, 2015 at 1:16 PM, Jerry Lam <chiling...@gmail.com> wrote:

> Hi guys,
>
> There is another memory issue. Not sure if this is related to Tungsten
> this time because I have it disabled (spark.sql.tungsten.enabled=false).
> It happens when too many tasks are running (300). I need to limit the
> number of tasks to avoid this. The executor has 6G. Spark 1.5.1 is being
> used.
>
> Best Regards,
>
> Jerry
>
> org.apache.spark.SparkException: Task failed while writing rows.
> ...
> Caused by: java.io.IOException: Unable to acquire 67108864 bytes of memory
> ...
Re: If you use Spark 1.5 and disabled Tungsten mode ...
Yes. The crazy thing about Mesos running in fine-grained mode is that there is no way (correct me if I'm wrong) to set the number of cores per executor. If one of my Mesos slaves has 32 cores, fine-grained mode can allocate all 32 cores on that executor for the job, and if 32 tasks are running on that executor at the same time, that is when the acquire-memory issue appears. Of course, the 32 cores are dynamically allocated, so Mesos can take them back or hand them out again depending on cluster utilization.

On Wed, Oct 21, 2015 at 5:13 PM, Reynold Xin <r...@databricks.com> wrote:

> Is this still Mesos fine grained mode?
>
> On Wed, Oct 21, 2015 at 1:16 PM, Jerry Lam <chiling...@gmail.com> wrote:
>
>> Hi guys,
>>
>> There is another memory issue. Not sure if this is related to Tungsten
>> this time because I have it disabled (spark.sql.tungsten.enabled=false).
>> ...
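Capping concurrency the way Jerry describes amounts to switching to coarse-grained Mesos mode and bounding the total core count, so at 1 core per task at most that many tasks run at once. A sketch of the relevant spark-defaults.conf entries (the values are illustrative for this 6G-executor setup, not recommendations):

```
# Use coarse-grained Mesos mode (Spark 1.5 defaults to fine-grained)
spark.mesos.coarse      true
# Hard cap on total cores: at most 20 concurrent tasks at 1 core per task
spark.cores.max         20
# Default cores per task, shown for clarity
spark.task.cpus         1
spark.executor.memory   6g
```

The same properties can be passed per job via `spark-submit --conf key=value`.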
Re: If you use Spark 1.5 and disabled Tungsten mode ...
I disabled it because of the "Could not acquire 65536 bytes of memory" error. It happens to fail the job, so for now I'm not touching it.

On Tue, Oct 20, 2015 at 4:48 PM, charmee wrote:

> We had disabled tungsten after we found a few performance issues, but had
> to enable it back because we found that, when we had a large number of
> group by fields, the shuffle keeps failing if tungsten is disabled.
>
> Here is an excerpt from one of our engineers with his analysis.
> ...
>
> Hope this helps.
>
> -Charmee
Re: If you use Spark 1.5 and disabled Tungsten mode ...
Jerry - I think that's been fixed in 1.5.1. Do you still see it?

On Tue, Oct 20, 2015 at 2:11 PM, Jerry Lam wrote:

> I disabled it because of the "Could not acquire 65536 bytes of memory"
> error. It happens to fail the job, so for now I'm not touching it.
>
> On Tue, Oct 20, 2015 at 4:48 PM, charmee wrote:
>
>> We had disabled tungsten after we found a few performance issues, but
>> had to enable it back because we found that, when we had a large number
>> of group by fields, the shuffle keeps failing if tungsten is disabled.
>> ...
Re: If you use Spark 1.5 and disabled Tungsten mode ...
We had disabled tungsten after we found a few performance issues, but had to enable it back because we found that, when we had a large number of GROUP BY fields, the shuffle keeps failing if tungsten is disabled.

Here is an excerpt from one of our engineers with his analysis.

With Tungsten enabled (default in Spark 1.5), ~90 files of 0.5G each:

Ingest (after applying broadcast lookups): 54 min
Aggregation (~30 fields in group by and another 40 in aggregation): 18 min

With Tungsten disabled:

Ingest: 30 min
Aggregation: erroring out

On smaller tests we found that joins are slow with tungsten enabled. With GROUP BY, disabling tungsten is not working in the first place.

Hope this helps.

-Charmee

--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/If-you-use-Spark-1-5-and-disabled-Tungsten-mode-tp14604p14711.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org
Re: If you use Spark 1.5 and disabled Tungsten mode ...
Hi Reynold,

Yes, I'm using 1.5.1. I see them quite often. Sometimes it recovers, but sometimes it does not. For one particular job, it failed all the time with the acquire-memory issue. I'm using Spark on Mesos with fine-grained mode. Does it make a difference?

Best Regards,

Jerry

On Tue, Oct 20, 2015 at 5:27 PM, Reynold Xin wrote:

> Jerry - I think that's been fixed in 1.5.1. Do you still see it?
>
> On Tue, Oct 20, 2015 at 2:11 PM, Jerry Lam wrote:
>
>> I disabled it because of the "Could not acquire 65536 bytes of memory"
>> error. It happens to fail the job, so for now I'm not touching it.
>>
>> On Tue, Oct 20, 2015 at 4:48 PM, charmee wrote:
>> ...
Fwd: If you use Spark 1.5 and disabled Tungsten mode ...
With Jerry's permission, sending this back to the dev list to close the loop.

-- Forwarded message --
From: Jerry Lam <chiling...@gmail.com>
Date: Tue, Oct 20, 2015 at 3:54 PM
Subject: Re: If you use Spark 1.5 and disabled Tungsten mode ...
To: Reynold Xin <r...@databricks.com>

Yup, coarse-grained mode works just fine. :)

The difference is that, by default, coarse-grained mode uses 1 core per task. If I constrain it to 20 cores in total, there can be only 20 tasks running at the same time. However, with fine-grained mode, I cannot set the total number of cores, so there could be 200+ tasks running at the same time (it is dynamic). So it might be that the calculation of how much memory to acquire fails when the number of cores cannot be known ahead of time, because you cannot assume only X tasks are running in an executor? Just my guess...

On Tue, Oct 20, 2015 at 6:24 PM, Reynold Xin <r...@databricks.com> wrote:

> Can you try coarse-grained mode and see if it is the same?
>
> On Tue, Oct 20, 2015 at 3:20 PM, Jerry Lam <chiling...@gmail.com> wrote:
>
>> Hi Reynold,
>>
>> Yes, I'm using 1.5.1. I see them quite often. Sometimes it recovers but
>> sometimes it does not. For one particular job, it failed all the time
>> with the acquire-memory issue. I'm using Spark on Mesos with
>> fine-grained mode. Does it make a difference?
>> ...
Re: If you use Spark 1.5 and disabled Tungsten mode ...
To clarify, we're asking about the *spark.sql.tungsten.enabled* flag, which was introduced in Spark 1.5 and enables Project Tungsten optimizations in Spark SQL. This option is set to *true* by default in Spark 1.5+ and exists primarily to allow users to disable the new code paths if they encounter bugs or performance regressions.

If anyone sets spark.sql.tungsten.enabled=*false* in their SparkConf in order to *disable* these optimizations, we'd like to hear from you, so we can figure out why you disabled them and see whether we can make improvements that would allow your workload to run with Tungsten enabled.

Thanks,
Josh

On Thu, Oct 15, 2015 at 9:33 AM, mkhaitman wrote:

> Are you referring to spark.shuffle.manager=tungsten-sort? If so, we saw
> the default value as still being the regular sort, and since it was only
> first introduced in 1.5, we were actually waiting a bit to see if anyone
> ENABLED it as opposed to DISABLING it, since it's disabled by default! :)
>
> I recall enabling it during testing within our dev environment, but we
> didn't have a workload and environment comparable to our production
> cluster, so we were going to play it safe and wait until 1.6 in case
> there were any major changes / regressions that weren't seen during 1.5
> testing!
>
> Mark.
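For anyone checking what "disabled" means concretely, the flag Josh describes can be set either cluster-wide in spark-defaults.conf or per application on the SparkConf; a minimal sketch:

```
# spark-defaults.conf: revert Spark SQL to the pre-Tungsten code paths
# (the default is true in Spark 1.5+)
spark.sql.tungsten.enabled   false
```

Equivalently, per job: `spark-submit --conf spark.sql.tungsten.enabled=false ...`. Note this is a different knob from the `spark.shuffle.manager=tungsten-sort` shuffle setting discussed below in the thread.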
Re: If you use Spark 1.5 and disabled Tungsten mode ...
My apologies for mixing up what was being referred to in that case! :)

Mark.

--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/If-you-use-Spark-1-5-and-disabled-Tungsten-mode-tp14604p14629.html
Re: If you use Spark 1.5 and disabled Tungsten mode ...
Are you referring to spark.shuffle.manager=tungsten-sort? If so, we saw the default value as still being the regular sort, and since it was only first introduced in 1.5, we were actually waiting a bit to see if anyone ENABLED it as opposed to DISABLING it, since it's disabled by default! :)

I recall enabling it during testing within our dev environment, but we didn't have a workload and environment comparable to our production cluster, so we were going to play it safe and wait until 1.6 in case there were any major changes / regressions that weren't seen during 1.5 testing!

Mark.

--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/If-you-use-Spark-1-5-and-disabled-Tungsten-mode-tp14604p14627.html