ok solved. Looks like breathing the spark-summit SFO air for 3 days helped a lot!
Piping the 7 million records to local disk still runs out of memory (presumably because the CLI collects the full result set on the driver before printing it), so I piped the results into another Hive table instead. I can live with that :-)
/opt/cloudera/parcels/CDH/lib/spark/bin/spark-sql -e "use aers; create table 
unique_aers_demo as select distinct isr,event_dt,age,age_cod,sex,year,quarter 
from aers.aers_demo_view " --driver-memory 4G --total-executor-cores 12 
--executor-memory 4G
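
For anyone else on the same setup: those settings can also be made the default for every spark-sql session via conf/spark-defaults.conf. A minimal sketch, assuming the standard CDH parcel layout and a standalone cluster (where --total-executor-cores corresponds to the spark.cores.max property):

  # conf/spark-defaults.conf (path assumed: /opt/cloudera/parcels/CDH/lib/spark/conf/)
  spark.driver.memory    4g
  spark.executor.memory  4g
  spark.cores.max        12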

thanks

      From: Sanjay Subramanian <sanjaysubraman...@yahoo.com.INVALID>
 To: "user@spark.apache.org" <user@spark.apache.org>
 Sent: Thursday, June 11, 2015 8:43 AM
 Subject: spark-sql from CLI ---> EXCEPTION: java.lang.OutOfMemoryError: Java heap space
hey guys
Using Hive and Impala daily and intensively. Want to transition to spark-sql in CLI mode.
Currently in my sandbox I am using Spark (standalone mode) from the CDH distribution ("starving developer" version 5.3.3):
3-datanode hadoop cluster
32GB RAM per node
8 cores per node



spark: 1.2.0+cdh5.3.3+371



I am testing some stuff on one view and getting memory errors. A possible reason is that the default memory per executor shown on 18080 is 512M.

These options do not seem to have any effect when used to start the spark-sql CLI: --total-executor-cores 12 --executor-memory 4G



/opt/cloudera/parcels/CDH/lib/spark/bin/spark-sql -e "select distinct isr,event_dt,age,age_cod,sex,year,quarter from aers.aers_demo_view"

aers.aers_demo_view (7 million+ records)
========================================
isr       bigint   case id
event_dt  bigint   Event date
age       double   age of patient
age_cod   string   days, months, years
sex       string   M or F
year      int
quarter   int

VIEW DEFINITION
===============
CREATE VIEW `aers.aers_demo_view` AS
SELECT `isr` AS `isr`, `event_dt` AS `event_dt`, `age` AS `age`, `age_cod` AS `age_cod`,
       `gndr_cod` AS `sex`, `year` AS `year`, `quarter` AS `quarter`
FROM (
  SELECT `aers_demo_v1`.`isr`, `aers_demo_v1`.`event_dt`, `aers_demo_v1`.`age`,
         `aers_demo_v1`.`age_cod`, `aers_demo_v1`.`gndr_cod`, `aers_demo_v1`.`year`,
         `aers_demo_v1`.`quarter`
  FROM `aers`.`aers_demo_v1`
  UNION ALL
  SELECT `aers_demo_v2`.`isr`, `aers_demo_v2`.`event_dt`, `aers_demo_v2`.`age`,
         `aers_demo_v2`.`age_cod`, `aers_demo_v2`.`gndr_cod`, `aers_demo_v2`.`year`,
         `aers_demo_v2`.`quarter`
  FROM `aers`.`aers_demo_v2`
  UNION ALL
  SELECT `aers_demo_v3`.`isr`, `aers_demo_v3`.`event_dt`, `aers_demo_v3`.`age`,
         `aers_demo_v3`.`age_cod`, `aers_demo_v3`.`gndr_cod`, `aers_demo_v3`.`year`,
         `aers_demo_v3`.`quarter`
  FROM `aers`.`aers_demo_v3`
  UNION ALL
  SELECT `aers_demo_v4`.`isr`, `aers_demo_v4`.`event_dt`, `aers_demo_v4`.`age`,
         `aers_demo_v4`.`age_cod`, `aers_demo_v4`.`gndr_cod`, `aers_demo_v4`.`year`,
         `aers_demo_v4`.`quarter`
  FROM `aers`.`aers_demo_v4`
  UNION ALL
  SELECT `aers_demo_v5`.`primaryid` AS `ISR`, `aers_demo_v5`.`event_dt`, `aers_demo_v5`.`age`,
         `aers_demo_v5`.`age_cod`, `aers_demo_v5`.`gndr_cod`, `aers_demo_v5`.`year`,
         `aers_demo_v5`.`quarter`
  FROM `aers`.`aers_demo_v5`
  UNION ALL
  SELECT `aers_demo_v6`.`primaryid` AS `ISR`, `aers_demo_v6`.`event_dt`, `aers_demo_v6`.`age`,
         `aers_demo_v6`.`age_cod`, `aers_demo_v6`.`sex` AS `GNDR_COD`, `aers_demo_v6`.`year`,
         `aers_demo_v6`.`quarter`
  FROM `aers`.`aers_demo_v6`
) `aers_demo_view`






15/06/11 08:36:36 WARN DefaultChannelPipeline: An exception was thrown by a user handler while handling an exception event ([id: 0x01b99855, /10.0.0.19:58117 => /10.0.0.19:52016] EXCEPTION: java.lang.OutOfMemoryError: Java heap space)
java.lang.OutOfMemoryError: Java heap space
        at org.jboss.netty.buffer.HeapChannelBuffer.<init>(HeapChannelBuffer.java:42)
        at org.jboss.netty.buffer.BigEndianHeapChannelBuffer.<init>(BigEndianHeapChannelBuffer.java:34)
        at org.jboss.netty.buffer.ChannelBuffers.buffer(ChannelBuffers.java:134)
        at org.jboss.netty.buffer.HeapChannelBufferFactory.getBuffer(HeapChannelBufferFactory.java:68)
        at org.jboss.netty.buffer.AbstractChannelBufferFactory.getBuffer(AbstractChannelBufferFactory.java:48)
        at org.jboss.netty.handler.codec.frame.FrameDecoder.newCumulationBuffer(FrameDecoder.java:507)
        at org.jboss.netty.handler.codec.frame.FrameDecoder.updateCumulation(FrameDecoder.java:345)
        at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:312)
        at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
        at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
        at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
        at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109)
        at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
        at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
        at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
15/06/11 08:36:40 ERROR Utils: Uncaught exception in thread task-result-getter-0
java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.lang.Long.valueOf(Long.java:577)
        at com.esotericsoftware.kryo.serializers.DefaultSerializers$LongSerializer.read(DefaultSerializers.java:113)
        at com.esotericsoftware.kryo.serializers.DefaultSerializers$LongSerializer.read(DefaultSerializers.java:103)
        at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
        at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:338)
        at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:293)
        at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:651)
        at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:605)
        at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
        at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
        at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:338)
        at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:293)
        at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
        at org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:171)
        at org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:79)
        at org.apache.spark.scheduler.TaskSetManager.handleSuccessfulTask(TaskSetManager.scala:558)
        at org.apache.spark.scheduler.TaskSchedulerImpl.handleSuccessfulTask(TaskSchedulerImpl.scala:352)
        at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:80)
        at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:49)
        at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:49)
        at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1468)
        at org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:48)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
15/06/11 08:36:38 ERROR ActorSystemImpl: exception on LARS’ timer thread
java.lang.OutOfMemoryError: GC overhead limit exceeded
        at akka.dispatch.AbstractNodeQueue.<init>(AbstractNodeQueue.java:19)
        at akka.actor.LightArrayRevolverScheduler$TaskQueue.<init>(Scheduler.scala:431)
        at akka.actor.LightArrayRevolverScheduler$$anon$12.nextTick(Scheduler.scala:397)
        at akka.actor.LightArrayRevolverScheduler$$anon$12.run(Scheduler.scala:363)
        at java.lang.Thread.run(Thread.java:745)
Exception in thread "task-result-getter-0" java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.lang.Long.valueOf(Long.java:577)
        at com.esotericsoftware.kryo.serializers.DefaultSerializers$LongSerializer.read(DefaultSerializers.java:113)
        at com.esotericsoftware.kryo.serializers.DefaultSerializers$LongSerializer.read(DefaultSerializers.java:103)
        at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
        at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:338)
        at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:293)
        at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:651)
        at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:605)
        at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
        at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
        at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:338)
        at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:293)
        at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
        at org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:171)
        at org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:79)
        at org.apache.spark.scheduler.TaskSetManager.handleSuccessfulTask(TaskSetManager.scala:558)
        at org.apache.spark.scheduler.TaskSchedulerImpl.handleSuccessfulTask(TaskSchedulerImpl.scala:352)
        at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:80)
        at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:49)
        at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:49)
        at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1468)
        at org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:48)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
15/06/11 08:36:41 ERROR ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-scheduler-1] shutting down ActorSystem [sparkDriver]
java.lang.OutOfMemoryError: GC overhead limit exceeded
        at akka.dispatch.AbstractNodeQueue.<init>(AbstractNodeQueue.java:19)
        at akka.actor.LightArrayRevolverScheduler$TaskQueue.<init>(Scheduler.scala:431)
        at akka.actor.LightArrayRevolverScheduler$$anon$12.nextTick(Scheduler.scala:397)
        at akka.actor.LightArrayRevolverScheduler$$anon$12.run(Scheduler.scala:363)
        at java.lang.Thread.run(Thread.java:745)
15/06/11 08:36:46 ERROR ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-akka.actor.default-dispatcher-4] shutting down ActorSystem [sparkDriver]
java.lang.OutOfMemoryError: GC overhead limit exceeded
15/06/11 08:36:46 ERROR SparkSQLDriver: Failed in [select distinct isr,event_dt,age,age_cod,sex,year,quarter from aers.aers_demo_view]
org.apache.spark.SparkException: Job cancelled because SparkContext was shut down
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:702)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:701)
        at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
        at org.apache.spark.scheduler.DAGScheduler.cleanUpAfterSchedulerStop(DAGScheduler.scala:701)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessActor.postStop(DAGScheduler.scala:1428)
        at akka.actor.dungeon.FaultHandling$class.akka$actor$dungeon$FaultHandling$$finishTerminate(FaultHandling.scala:201)
        at akka.actor.dungeon.FaultHandling$class.terminate(FaultHandling.scala:163)
        at akka.actor.ActorCell.terminate(ActorCell.scala:338)
        at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:431)
        at akka.actor.ActorCell.systemInvoke(ActorCell.scala:447)
        at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:262)
        at akka.dispatch.Mailbox.run(Mailbox.scala:218)
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
15/06/11 08:36:51 WARN DefaultChannelPipeline: An exception was thrown by a user handler while handling an exception event ([id: 0x79935a9b, /10.0.0.35:54028 => /10.0.0.19:52016] EXCEPTION: java.lang.OutOfMemoryError: Java heap space)
java.lang.OutOfMemoryError: Java heap space
15/06/11 08:36:52 ERROR ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-akka.actor.default-dispatcher-5] shutting down ActorSystem [sparkDriver]
java.lang.OutOfMemoryError: Java heap space
15/06/11 08:36:53 WARN DefaultChannelPipeline: An exception was thrown by a user handler while handling an exception event ([id: 0xcb8c4b5d, /10.0.0.18:46744 => /10.0.0.19:52016] EXCEPTION: java.lang.OutOfMemoryError: Java heap space)
java.lang.OutOfMemoryError: Java heap space
15/06/11 08:36:56 WARN NioEventLoop: Unexpected exception in the selector loop.
java.lang.OutOfMemoryError: GC overhead limit exceeded
15/06/11 08:36:57 ERROR ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-akka.actor.default-dispatcher-18] shutting down ActorSystem [sparkDriver]
java.lang.OutOfMemoryError: GC overhead limit exceeded
15/06/11 08:36:58 ERROR Utils: Uncaught exception in thread task-result-getter-3
java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "task-result-getter-3" java.lang.OutOfMemoryError: GC overhead limit exceeded
15/06/11 08:37:01 ERROR ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-akka.actor.default-dispatcher-4] shutting down ActorSystem [sparkDriver]
java.lang.OutOfMemoryError: Java heap space
Time taken: 70.982 seconds
15/06/11 08:37:06 WARN QueuedThreadPool: 4 threads could not be stopped
15/06/11 08:37:11 ERROR MapOutputTrackerMaster: Error communicating with MapOutputTracker
akka.pattern.AskTimeoutException: Recipient[Actor[akka://sparkDriver/user/MapOutputTracker#-2109395547]] had already been terminated.
        at akka.pattern.AskableActorRef$.ask$extension(AskSupport.scala:134)
        at org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:111)
        at org.apache.spark.MapOutputTracker.sendTracker(MapOutputTracker.scala:122)
        at org.apache.spark.MapOutputTrackerMaster.stop(MapOutputTracker.scala:330)
        at org.apache.spark.SparkEnv.stop(SparkEnv.scala:83)
        at org.apache.spark.SparkContext.stop(SparkContext.scala:1210)
        at org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.stop(SparkSQLEnv.scala:66)
        at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$$anon$1.run(SparkSQLCLIDriver.scala:107)
Exception in thread "Thread-3" org.apache.spark.SparkException: Error communicating with MapOutputTracker
        at org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:116)
        at org.apache.spark.MapOutputTracker.sendTracker(MapOutputTracker.scala:122)
        at org.apache.spark.MapOutputTrackerMaster.stop(MapOutputTracker.scala:330)
        at org.apache.spark.SparkEnv.stop(SparkEnv.scala:83)
        at org.apache.spark.SparkContext.stop(SparkContext.scala:1210)
        at org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.stop(SparkSQLEnv.scala:66)
        at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$$anon$1.run(SparkSQLCLIDriver.scala:107)
Caused by: akka.pattern.AskTimeoutException: Recipient[Actor[akka://sparkDriver/user/MapOutputTracker#-2109395547]] had already been terminated.
        at akka.pattern.AskableActorRef$.ask$extension(AskSupport.scala:134)
        at org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:111)