Try using Spark 1.4.0 with SQL code generation turned on; this should make
a huge difference.
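For the 1.4.x-era CLI, that flag can be passed at launch. A minimal sketch (`spark.sql.codegen` is the pre-1.5 name of the code-generation switch; the query is just this thread's example):

```shell
# Enable runtime code generation for Spark SQL (Spark 1.4.x-era flag).
# spark.sql.codegen compiles query expressions to JVM bytecode instead of
# interpreting them row by row.
spark-sql --conf spark.sql.codegen=true \
  -e "SELECT DISTINCT isr, event_dt, age, age_cod, sex, year, quarter
      FROM aers.aers_demo_view"
```

The same switch can also be toggled from inside the CLI with `SET spark.sql.codegen=true;`.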

On Sat, Jun 13, 2015 at 5:08 PM, Sanjay Subramanian <
sanjaysubraman...@yahoo.com> wrote:

> hey guys
>
> I tried the following settings as well. No luck
>
> --total-executor-cores 24 --executor-memory 4G
>
>
> BTW, on the same cluster, Impala absolutely outperforms it: the same query
> runs in 9 seconds, with no memory issues at all.
>
> In fact, I am pretty disappointed with Spark-SQL.
> I have worked with Hive since the 0.9.x releases and taken projects to
> production successfully, and Hive very rarely fails.
>
> Whether the Spark folks like it or not, my expectations of Spark-SQL are
> pretty high if I am to change the way we do things at my workplace.
> Until then, we will remain hugely dependent on Impala and Hive (with SSDs
> speeding up the shuffle stage, even MR jobs are not that slow now).
>
> I want to clarify, for those of you who may be asking, why I am not using
> Spark with Scala and am insisting on spark-sql:
>
> - I have already pipelined data from enterprise tables into Hive
> - I am using CDH 5.3.3 (Cloudera's free "starving developers" version)
> - I have close to 300 tables defined as Hive external tables
> - Data is on HDFS
> - On average we have 150 columns per table
> - On an everyday basis, we do crazy amounts of ad-hoc joining of new and
> old tables to get datasets ready for supervised ML
> - I thought that I could simply point Spark at the Hive metastore and run
> queries as I do today; in fact, the existing queries should work as-is
> unless I use some esoteric Hive/Impala function
>
> Anyway, if there are settings I can use to get spark-sql to run even in
> standalone mode, that would be a huge help.
>
> On the pre-production cluster I have Spark on YARN, but I could never get
> it to run fairly complex queries, and I have had no answers from this group
> or the CDH groups.
>
> So my assumption is that this is possibly not yet solved; otherwise I have
> always gotten very quick answers and responses :-) to my questions on all
> the CDH, Spark, and Hive groups.
>
> best regards
>
> sanjay
>
>
>
>   ------------------------------
>  *From:* Josh Rosen <rosenvi...@gmail.com>
> *To:* Sanjay Subramanian <sanjaysubraman...@yahoo.com>
> *Cc:* "user@spark.apache.org" <user@spark.apache.org>
> *Sent:* Friday, June 12, 2015 7:15 AM
> *Subject:* Re: spark-sql from CLI --->EXCEPTION:
> java.lang.OutOfMemoryError: Java heap space
>
> It sounds like this might be caused by a memory configuration problem.  In
> addition to looking at the executor memory, I'd also bump up the driver
> memory, since it appears that your shell is running out of memory when
> collecting a large query result.
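A hedged sketch of that advice (flag names as in Spark 1.x's spark-submit/spark-sql; the 4G values are assumptions to tune against the 32GB nodes described below):

```shell
# Raise the driver heap alongside executor memory: the spark-sql CLI
# collects the full result set into the driver JVM, so --driver-memory is
# the setting that matters for the heap-space error shown here.
# In client mode this must be a launch flag, because the driver JVM is
# already running before any SparkConf setting could be read.
spark-sql --driver-memory 4G \
          --executor-memory 4G \
          --total-executor-cores 24 \
          -e "SELECT DISTINCT isr, event_dt, age, age_cod, sex, year, quarter
              FROM aers.aers_demo_view"
```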
>
> Sent from my phone
>
>
>
> On Jun 11, 2015, at 8:43 AM, Sanjay Subramanian <
> sanjaysubraman...@yahoo.com.INVALID> wrote:
>
> hey guys
>
> I use Hive and Impala intensively every day and want to transition to
> spark-sql in CLI mode.
>
> Currently, in my sandbox, I am using Spark (standalone mode) from the CDH
> distribution ("starving developer" version 5.3.3):
> 3-datanode Hadoop cluster
> 32GB RAM per node
> 8 cores per node
>
> spark
> 1.2.0+cdh5.3.3+371
>
>
> I am testing some stuff on one view and getting memory errors.
> The likely reason is that the default memory per executor shown on the
> 18080 UI is 512M.
>
> These options, when used to start the spark-sql CLI, do not seem to have
> any effect:
> --total-executor-cores 12 --executor-memory 4G
>
>
>
> /opt/cloudera/parcels/CDH/lib/spark/bin/spark-sql -e  "select distinct
> isr,event_dt,age,age_cod,sex,year,quarter from aers.aers_demo_view"
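One way to check whether such flags actually take effect (worth cross-checking against the 18080 UI) is the Hive-compatible `SET` syntax, which prints a property's current value from inside the CLI. A small sketch:

```shell
# Print the effective executor memory setting as the CLI sees it.
# If this still reports 512m, the launch flags are not reaching the cluster.
spark-sql --executor-memory 4G -e "SET spark.executor.memory;"
```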
>
> aers.aers_demo_view (7 million+ records)
> ========================================
> isr       bigint   case id
> event_dt  bigint   event date
> age       double   age of patient
> age_cod   string   days, months, or years
> sex       string   M or F
> year      int
> quarter   int
>
>
> VIEW DEFINITION
> ================
> CREATE VIEW `aers.aers_demo_view` AS SELECT `isr` AS `isr`, `event_dt` AS
> `event_dt`, `age` AS `age`, `age_cod` AS `age_cod`, `gndr_cod` AS `sex`,
> `year` AS `year`, `quarter` AS `quarter` FROM (SELECT
>    `aers_demo_v1`.`isr`,
>    `aers_demo_v1`.`event_dt`,
>    `aers_demo_v1`.`age`,
>    `aers_demo_v1`.`age_cod`,
>    `aers_demo_v1`.`gndr_cod`,
>    `aers_demo_v1`.`year`,
>    `aers_demo_v1`.`quarter`
> FROM
>   `aers`.`aers_demo_v1`
> UNION ALL
> SELECT
>    `aers_demo_v2`.`isr`,
>    `aers_demo_v2`.`event_dt`,
>    `aers_demo_v2`.`age`,
>    `aers_demo_v2`.`age_cod`,
>    `aers_demo_v2`.`gndr_cod`,
>    `aers_demo_v2`.`year`,
>    `aers_demo_v2`.`quarter`
> FROM
>   `aers`.`aers_demo_v2`
> UNION ALL
> SELECT
>    `aers_demo_v3`.`isr`,
>    `aers_demo_v3`.`event_dt`,
>    `aers_demo_v3`.`age`,
>    `aers_demo_v3`.`age_cod`,
>    `aers_demo_v3`.`gndr_cod`,
>    `aers_demo_v3`.`year`,
>    `aers_demo_v3`.`quarter`
> FROM
>   `aers`.`aers_demo_v3`
> UNION ALL
> SELECT
>    `aers_demo_v4`.`isr`,
>    `aers_demo_v4`.`event_dt`,
>    `aers_demo_v4`.`age`,
>    `aers_demo_v4`.`age_cod`,
>    `aers_demo_v4`.`gndr_cod`,
>    `aers_demo_v4`.`year`,
>    `aers_demo_v4`.`quarter`
> FROM
>   `aers`.`aers_demo_v4`
> UNION ALL
> SELECT
>    `aers_demo_v5`.`primaryid` AS `ISR`,
>    `aers_demo_v5`.`event_dt`,
>    `aers_demo_v5`.`age`,
>    `aers_demo_v5`.`age_cod`,
>    `aers_demo_v5`.`gndr_cod`,
>    `aers_demo_v5`.`year`,
>    `aers_demo_v5`.`quarter`
> FROM
>   `aers`.`aers_demo_v5`
> UNION ALL
> SELECT
>    `aers_demo_v6`.`primaryid` AS `ISR`,
>    `aers_demo_v6`.`event_dt`,
>    `aers_demo_v6`.`age`,
>    `aers_demo_v6`.`age_cod`,
>    `aers_demo_v6`.`sex` AS `GNDR_COD`,
>    `aers_demo_v6`.`year`,
>    `aers_demo_v6`.`quarter`
> FROM
>   `aers`.`aers_demo_v6`) `aers_demo_view`
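Since the errors below occur while the driver collects the query result, one workaround (a sketch, assuming CTAS is available through the HiveContext; the target table name is hypothetical) is to materialize the distinct rows into a table instead of streaming ~7M rows back through the CLI:

```shell
# Write the result to a Hive table rather than collecting every distinct
# row into the driver process for printing to stdout.
spark-sql -e "CREATE TABLE aers.aers_demo_distinct AS
              SELECT DISTINCT isr, event_dt, age, age_cod, sex, year, quarter
              FROM aers.aers_demo_view"
```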
>
>
>
>
>
>
>
> 15/06/11 08:36:36 WARN DefaultChannelPipeline: An exception was thrown by
> a user handler while handling an exception event ([id: 0x01b99855, /
> 10.0.0.19:58117 => /10.0.0.19:52016] EXCEPTION:
> java.lang.OutOfMemoryError: Java heap space)
> java.lang.OutOfMemoryError: Java heap space
>         at
> org.jboss.netty.buffer.HeapChannelBuffer.<init>(HeapChannelBuffer.java:42)
>         at
> org.jboss.netty.buffer.BigEndianHeapChannelBuffer.<init>(BigEndianHeapChannelBuffer.java:34)
>         at
> org.jboss.netty.buffer.ChannelBuffers.buffer(ChannelBuffers.java:134)
>         at
> org.jboss.netty.buffer.HeapChannelBufferFactory.getBuffer(HeapChannelBufferFactory.java:68)
>         at
> org.jboss.netty.buffer.AbstractChannelBufferFactory.getBuffer(AbstractChannelBufferFactory.java:48)
>         at
> org.jboss.netty.handler.codec.frame.FrameDecoder.newCumulationBuffer(FrameDecoder.java:507)
>         at
> org.jboss.netty.handler.codec.frame.FrameDecoder.updateCumulation(FrameDecoder.java:345)
>         at
> org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:312)
>         at
> org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
>         at
> org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
>         at
> org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
>         at
> org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109)
>         at
> org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
>         at
> org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
>         at
> org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
>         at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> 15/06/11 08:36:40 ERROR Utils: Uncaught exception in thread
> task-result-getter-0
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>         at java.lang.Long.valueOf(Long.java:577)
>         at
> com.esotericsoftware.kryo.serializers.DefaultSerializers$LongSerializer.read(DefaultSerializers.java:113)
>         at
> com.esotericsoftware.kryo.serializers.DefaultSerializers$LongSerializer.read(DefaultSerializers.java:103)
>         at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
>         at
> com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:338)
>         at
> com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:293)
>         at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:651)
>         at
> com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:605)
>         at
> com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
>         at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
>         at
> com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:338)
>         at
> com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:293)
>         at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
>         at
> org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:171)
>         at
> org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:79)
>         at
> org.apache.spark.scheduler.TaskSetManager.handleSuccessfulTask(TaskSetManager.scala:558)
>         at
> org.apache.spark.scheduler.TaskSchedulerImpl.handleSuccessfulTask(TaskSchedulerImpl.scala:352)
>         at
> org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:80)
>         at
> org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:49)
>         at
> org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:49)
>         at
> org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1468)
>         at
> org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:48)
>         at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> 15/06/11 08:36:38 ERROR ActorSystemImpl: exception on LARS’ timer thread
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>         at
> akka.dispatch.AbstractNodeQueue.<init>(AbstractNodeQueue.java:19)
>         at
> akka.actor.LightArrayRevolverScheduler$TaskQueue.<init>(Scheduler.scala:431)
>         at
> akka.actor.LightArrayRevolverScheduler$$anon$12.nextTick(Scheduler.scala:397)
>         at
> akka.actor.LightArrayRevolverScheduler$$anon$12.run(Scheduler.scala:363)
>         at java.lang.Thread.run(Thread.java:745)
> Exception in thread "task-result-getter-0" java.lang.OutOfMemoryError: GC
> overhead limit exceeded
>         at java.lang.Long.valueOf(Long.java:577)
>         at
> com.esotericsoftware.kryo.serializers.DefaultSerializers$LongSerializer.read(DefaultSerializers.java:113)
>         at
> com.esotericsoftware.kryo.serializers.DefaultSerializers$LongSerializer.read(DefaultSerializers.java:103)
>         at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
>         at
> com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:338)
>         at
> com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:293)
>         at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:651)
>         at
> com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:605)
>         at
> com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
>         at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
>         at
> com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:338)
>         at
> com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:293)
>         at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
>         at
> org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:171)
>         at
> org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:79)
>         at
> org.apache.spark.scheduler.TaskSetManager.handleSuccessfulTask(TaskSetManager.scala:558)
>         at
> org.apache.spark.scheduler.TaskSchedulerImpl.handleSuccessfulTask(TaskSchedulerImpl.scala:352)
>         at
> org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:80)
>         at
> org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:49)
>         at
> org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:49)
>         at
> org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1468)
>         at
> org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:48)
>         at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> 15/06/11 08:36:41 ERROR ActorSystemImpl: Uncaught fatal error from thread
> [sparkDriver-scheduler-1] shutting down ActorSystem [sparkDriver]
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>         at
> akka.dispatch.AbstractNodeQueue.<init>(AbstractNodeQueue.java:19)
>         at
> akka.actor.LightArrayRevolverScheduler$TaskQueue.<init>(Scheduler.scala:431)
>         at
> akka.actor.LightArrayRevolverScheduler$$anon$12.nextTick(Scheduler.scala:397)
>         at
> akka.actor.LightArrayRevolverScheduler$$anon$12.run(Scheduler.scala:363)
>         at java.lang.Thread.run(Thread.java:745)
> 15/06/11 08:36:46 ERROR ActorSystemImpl: Uncaught fatal error from thread
> [sparkDriver-akka.actor.default-dispatcher-4] shutting down ActorSystem
> [sparkDriver]
> java.lang.OutOfMemoryError: GC overhead limit exceeded
> 15/06/11 08:36:46 ERROR SparkSQLDriver: Failed in [select distinct
> isr,event_dt,age,age_cod,sex,year,quarter from aers.aers_demo_view]
> org.apache.spark.SparkException: Job cancelled because SparkContext was
> shut down
>         at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:702)
>         at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:701)
>         at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
>         at
> org.apache.spark.scheduler.DAGScheduler.cleanUpAfterSchedulerStop(DAGScheduler.scala:701)
>         at
> org.apache.spark.scheduler.DAGSchedulerEventProcessActor.postStop(DAGScheduler.scala:1428)
>         at
> akka.actor.dungeon.FaultHandling$class.akka$actor$dungeon$FaultHandling$$finishTerminate(FaultHandling.scala:201)
>         at
> akka.actor.dungeon.FaultHandling$class.terminate(FaultHandling.scala:163)
>         at akka.actor.ActorCell.terminate(ActorCell.scala:338)
>         at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:431)
>         at akka.actor.ActorCell.systemInvoke(ActorCell.scala:447)
>         at
> akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:262)
>         at akka.dispatch.Mailbox.run(Mailbox.scala:218)
>         at
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>         at
> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>         at
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>         at
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>         at
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> 15/06/11 08:36:51 WARN DefaultChannelPipeline: An exception was thrown by
> a user handler while handling an exception event ([id: 0x79935a9b, /
> 10.0.0.35:54028 => /10.0.0.19:52016] EXCEPTION:
> java.lang.OutOfMemoryError: Java heap space)
> java.lang.OutOfMemoryError: Java heap space
> 15/06/11 08:36:52 ERROR ActorSystemImpl: Uncaught fatal error from thread
> [sparkDriver-akka.actor.default-dispatcher-5] shutting down ActorSystem
> [sparkDriver]
> java.lang.OutOfMemoryError: Java heap space
> 15/06/11 08:36:53 WARN DefaultChannelPipeline: An exception was thrown by
> a user handler while handling an exception event ([id: 0xcb8c4b5d, /
> 10.0.0.18:46744 => /10.0.0.19:52016] EXCEPTION:
> java.lang.OutOfMemoryError: Java heap space)
> java.lang.OutOfMemoryError: Java heap space
> 15/06/11 08:36:56 WARN NioEventLoop: Unexpected exception in the selector
> loop.
> java.lang.OutOfMemoryError: GC overhead limit exceeded
> 15/06/11 08:36:57 ERROR ActorSystemImpl: Uncaught fatal error from thread
> [sparkDriver-akka.actor.default-dispatcher-18] shutting down ActorSystem
> [sparkDriver]
> java.lang.OutOfMemoryError: GC overhead limit exceeded
> 15/06/11 08:36:58 ERROR Utils: Uncaught exception in thread
> task-result-getter-3
> java.lang.OutOfMemoryError: GC overhead limit exceeded
> Exception in thread "task-result-getter-3" java.lang.OutOfMemoryError: GC
> overhead limit exceeded
> 15/06/11 08:37:01 ERROR ActorSystemImpl: Uncaught fatal error from thread
> [sparkDriver-akka.actor.default-dispatcher-4] shutting down ActorSystem
> [sparkDriver]
> java.lang.OutOfMemoryError: Java heap space
> Time taken: 70.982 seconds
> 15/06/11 08:37:06 WARN QueuedThreadPool: 4 threads could not be stopped
> 15/06/11 08:37:11 ERROR MapOutputTrackerMaster: Error communicating with
> MapOutputTracker
> akka.pattern.AskTimeoutException:
> Recipient[Actor[akka://sparkDriver/user/MapOutputTracker#-2109395547]]
> had already been terminated.
>         at
> akka.pattern.AskableActorRef$.ask$extension(AskSupport.scala:134)
>         at
> org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:111)
>         at
> org.apache.spark.MapOutputTracker.sendTracker(MapOutputTracker.scala:122)
>         at
> org.apache.spark.MapOutputTrackerMaster.stop(MapOutputTracker.scala:330)
>         at org.apache.spark.SparkEnv.stop(SparkEnv.scala:83)
>         at org.apache.spark.SparkContext.stop(SparkContext.scala:1210)
>         at
> org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.stop(SparkSQLEnv.scala:66)
>         at
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$$anon$1.run(SparkSQLCLIDriver.scala:107)
> Exception in thread "Thread-3" org.apache.spark.SparkException: Error
> communicating with MapOutputTracker
>         at
> org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:116)
>         at
> org.apache.spark.MapOutputTracker.sendTracker(MapOutputTracker.scala:122)
>         at
> org.apache.spark.MapOutputTrackerMaster.stop(MapOutputTracker.scala:330)
>         at org.apache.spark.SparkEnv.stop(SparkEnv.scala:83)
>         at org.apache.spark.SparkContext.stop(SparkContext.scala:1210)
>         at
> org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.stop(SparkSQLEnv.scala:66)
>         at
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$$anon$1.run(SparkSQLCLIDriver.scala:107)
> Caused by: akka.pattern.AskTimeoutException:
> Recipient[Actor[akka://sparkDriver/user/MapOutputTracker#-2109395547]]
> had already been terminated.
>         at
> akka.pattern.AskableActorRef$.ask$extension(AskSupport.scala:134)
>         at
> org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:111)
