[ https://issues.apache.org/jira/browse/SPARK-7603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14542153#comment-14542153 ]
Ihor Bobak commented on SPARK-7603:
-----------------------------------

I've just downloaded 1.2.2 and configured everything in exactly the same way: the problem is NOT reproducible. Therefore, you most probably added some kind of optimization in the newer version. If you need more files from me (e.g. Hive tables, etc.), feel free to ask - I will send you everything. If you want, I can even give you a backup of the VM I am working with.

> Crash of thrift server when doing SQL without "limit"
> -----------------------------------------------------
>
>                 Key: SPARK-7603
>                 URL: https://issues.apache.org/jira/browse/SPARK-7603
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.3.1
>         Environment: Hortonworks Sandbox 2.1 with Spark 1.3.1
>            Reporter: Ihor Bobak
>
> I have 2 tables in Hive: one with 120 thousand records, the other one 5 times smaller.
> I'm running a standalone cluster on a single VM, and the thrift server with the command:
>
>   ./start-thriftserver.sh --conf spark.executor.memory=2048m --conf spark.driver.memory=1024m
>
> My spark-defaults.conf contains:
>
>   spark.master            spark://sandbox.hortonworks.com:7077
>   spark.eventLog.enabled  true
>   spark.eventLog.dir      hdfs://sandbox.hortonworks.com:8020/user/pdi/spark/logs
>
> So, when I am running the SQL
>
>   select <some fields from header>, <some fields from details>
>   from vw_salesorderdetail as d
>   left join vw_salesorderheader as h on h.SalesOrderID = d.SalesOrderID
>   limit 2000000000;
>
> everything is fine, no matter that the limit is unrealistically large (again: the result set returned is just 120000 records).
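By the way, in case it helps: a workaround I am going to try is to give the driver more memory and let the thrift server stream results instead of buffering them all at once. This is only a sketch - I have NOT verified that the incrementalCollect option exists in 1.3.1, so treat that flag as an assumption:

```shell
# Hedged workaround sketch (unverified on 1.3.1):
#  - raise driver memory, since the OOM happens on the driver side;
#  - if your Spark build supports it, ask the thrift server to fetch
#    result partitions incrementally rather than collecting the whole
#    result set on the driver at once.
./sbin/start-thriftserver.sh \
  --conf spark.executor.memory=2048m \
  --conf spark.driver.memory=4096m \
  --conf spark.sql.thriftServer.incrementalCollect=true
```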
> But if I run the same query without the limit clause, execution hangs - see here: http://postimg.org/image/fujdjd16f/42945a78/ - and the thrift server logs are full of exceptions:
>
> 15/05/13 17:59:27 INFO TaskSetManager: Starting task 158.0 in stage 48.0 (TID 953, sandbox.hortonworks.com, PROCESS_LOCAL, 1473 bytes)
> 15/05/13 18:00:01 INFO TaskSetManager: Finished task 150.0 in stage 48.0 (TID 945) in 36166 ms on sandbox.hortonworks.com (152/200)
> 15/05/13 18:00:02 ERROR Utils: Uncaught exception in thread Spark Context Cleaner
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>         at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:147)
>         at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply(ContextCleaner.scala:144)
>         at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply(ContextCleaner.scala:144)
>         at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1618)
>         at org.apache.spark.ContextCleaner.org$apache$spark$ContextCleaner$$keepCleaning(ContextCleaner.scala:143)
>         at org.apache.spark.ContextCleaner$$anon$3.run(ContextCleaner.scala:65)
> Exception in thread "Spark Context Cleaner" 15/05/13 18:00:02 ERROR Utils: Uncaught exception in thread task-result-getter-1
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>         at java.lang.String.<init>(String.java:315)
>         at com.esotericsoftware.kryo.io.Input.readAscii(Input.java:562)
>         at com.esotericsoftware.kryo.io.Input.readString(Input.java:436)
>         at com.esotericsoftware.kryo.serializers.DefaultSerializers$StringSerializer.read(DefaultSerializers.java:157)
>         at com.esotericsoftware.kryo.serializers.DefaultSerializers$StringSerializer.read(DefaultSerializers.java:146)
>         at com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:706)
>         at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:611)
>         at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
>         at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
>         at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:338)
>         at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:293)
>         at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:651)
>         at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:605)
>         at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
>         at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:651)
>         at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:605)
>         at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
>         at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
>         at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:338)
>         at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:293)
>         at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
>         at org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:173)
>         at org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:79)
>         at org.apache.spark.scheduler.TaskSetManager.handleSuccessfulTask(TaskSetManager.scala:621)
>         at org.apache.spark.scheduler.TaskSchedulerImpl.handleSuccessfulTask(TaskSchedulerImpl.scala:379)
>         at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:82)
>         at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:51)
>         at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:51)
>         at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1618)
>         at org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:50)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> Exception in thread "task-result-getter-1" 15/05/13 18:00:04 INFO TaskSetManager: Starting task 159.0 in stage 48.0 (TID 954, sandbox.hortonworks.com, PROCESS_LOCAL, 1473 bytes)
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>         at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:147)
>         at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply(ContextCleaner.scala:144)
>         at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply(ContextCleaner.scala:144)
>         at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1618)
>         at org.apache.spark.ContextCleaner.org$apache$spark$ContextCleaner$$keepCleaning(ContextCleaner.scala:143)
>         at org.apache.spark.ContextCleaner$$anon$3.run(ContextCleaner.scala:65)
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>         at java.lang.String.<init>(String.java:315)
>         at com.esotericsoftware.kryo.io.Input.readAscii(Input.java:562)
>         at com.esotericsoftware.kryo.io.Input.readString(Input.java:436)
>         at com.esotericsoftware.kryo.serializers.DefaultSerializers$StringSerializer.read(DefaultSerializers.java:157)
>         at com.esotericsoftware.kryo.serializers.DefaultSerializers$StringSerializer.read(DefaultSerializers.java:146)
>         at com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:706)
>         at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:611)
>         at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
>         at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
>         at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:338)
>         at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:293)
>         at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:651)
>         at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:605)
>         at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
>         at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:651)
>         at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:605)
>         at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
>         at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
>         at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:338)
>         at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:293)
>         at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
>         at org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:173)
>         at org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:79)
>         at org.apache.spark.scheduler.TaskSetManager.handleSuccessfulTask(TaskSetManager.scala:621)
>         at org.apache.spark.scheduler.TaskSchedulerImpl.handleSuccessfulTask(TaskSchedulerImpl.scala:379)
>         at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:82)
>         at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:51)
>         at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:51)
>         at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1618)
>         at org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:50)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 15/05/13 18:00:05 INFO TaskSetManager: Finished task 154.0 in stage 48.0 (TID 949) in 40665 ms on sandbox.hortonworks.com (153/200)
> 15/05/13 18:00:20 ERROR Utils: Uncaught exception in thread task-result-getter-3
> java.lang.OutOfMemoryError: GC overhead limit exceeded
> Exception in thread "task-result-getter-3" java.lang.OutOfMemoryError: GC overhead limit exceeded
> 15/05/13 18:00:28 ERROR Utils: Uncaught exception in thread task-result-getter-2
> java.lang.OutOfMemoryError: GC overhead limit exceeded
> Exception in thread "task-result-getter-2" java.lang.OutOfMemoryError: GC overhead limit exceeded
> 15/05/13 18:00:29 INFO TaskSetManager: Starting task 160.0 in stage 48.0 (TID 955, sandbox.hortonworks.com, PROCESS_LOCAL, 1473 bytes)
> 15/05/13 18:00:31 ERROR ActorSystemImpl: exception on LARS’ timer thread
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>         at akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:409)
>         at akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:375)
>         at java.lang.Thread.run(Thread.java:744)
> 15/05/13 18:00:31 INFO ActorSystemImpl: starting new LARS thread
> 15/05/13 18:00:31 ERROR ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-akka.remote.default-remote-dispatcher-6] shutting down ActorSystem [sparkDriver]
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>         at java.lang.Class.getDeclaredMethods0(Native Method)
>         at java.lang.Class.privateGetDeclaredMethods(Class.java:2531)
>         at java.lang.Class.getDeclaredMethod(Class.java:2002)
>         at java.io.ObjectStreamClass.getPrivateMethod(ObjectStreamClass.java:1431)
>         at java.io.ObjectStreamClass.access$1700(ObjectStreamClass.java:72)
>         at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:494)
>         at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:468)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:468)
>         at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:365)
>         at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:602)
>         at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622)
>         at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
>         at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
>         at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>         at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>         at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>         at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>         at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>         at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>         at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>         at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>         at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>         at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>         at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>         at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>         at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>         at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>         at akka.serialization.JavaSerializer$$anonfun$1.apply(Serializer.scala:136)
>         at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
>         at akka.serialization.JavaSerializer.fromBinary(Serializer.scala:136)
>         at akka.serialization.Serialization$$anonfun$deserialize$1.apply(Serialization.scala:104)
> 15/05/13 18:00:31 ERROR ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-scheduler-1] shutting down ActorSystem [sparkDriver]
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>         at akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:409)
>         at akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:375)
>         at java.lang.Thread.run(Thread.java:744)
> 15/05/13 18:00:31 ERROR ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-akka.remote.default-remote-dispatcher-5] shutting down ActorSystem [sparkDriver]
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>         at java.lang.Class.getDeclaredMethods0(Native Method)
>         at java.lang.Class.privateGetDeclaredMethods(Class.java:2531)
>         at java.lang.Class.getDeclaredMethod(Class.java:2002)
>         at java.io.ObjectStreamClass.getPrivateMethod(ObjectStreamClass.java:1431)
>         at java.io.ObjectStreamClass.access$1700(ObjectStreamClass.java:72)
>         at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:494)
>         at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:468)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:468)
>         at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:365)
>         at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:602)
>         at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622)
>         at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
>         at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
>         at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>         at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>         at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>         at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>         at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>         at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>         at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>         at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>         at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>         at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>         at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>         at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>         at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>         at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>         at akka.serialization.JavaSerializer$$anonfun$1.apply(Serializer.scala:136)
>         at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
>
> Feel free to contact me - I will send you full logs.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)