Re: spark-sql from CLI ---EXCEPTION: java.lang.OutOfMemoryError: Java heap space
ok, solved. Looks like breathing the Spark Summit SFO air for three days helped a lot! Piping the 7 million records to local disk still runs out of memory, so I piped the results into another Hive table instead. I can live with that :-)

```shell
/opt/cloudera/parcels/CDH/lib/spark/bin/spark-sql -e "use aers; create table unique_aers_demo as select distinct isr,event_dt,age,age_cod,sex,year,quarter from aers.aers_demo_view" --driver-memory 4G --total-executor-cores 12 --executor-memory 4G
```

thanks

From: Sanjay Subramanian sanjaysubraman...@yahoo.com.INVALID
To: user@spark.apache.org
Sent: Thursday, June 11, 2015 8:43 AM
Subject: spark-sql from CLI ---EXCEPTION: java.lang.OutOfMemoryError: Java heap space

hey guys

Using Hive and Impala daily and intensively. Want to transition to spark-sql in CLI mode. Currently in my sandbox I am using Spark (standalone mode) from the CDH distribution (starving-developer version 5.3.3):

- 3-datanode Hadoop cluster
- 32 GB RAM per node
- 8 cores per node
- spark 1.2.0+cdh5.3.3+371

I am testing some stuff on one view and getting memory errors. A possible reason is that the default memory per executor shown on port 18080 is 512M. These options, when used to start the spark-sql CLI, do not seem to have any effect:

--total-executor-cores 12 --executor-memory 4G

```shell
/opt/cloudera/parcels/CDH/lib/spark/bin/spark-sql -e "select distinct isr,event_dt,age,age_cod,sex,year,quarter from aers.aers_demo_view"
```

aers.aers_demo_view (7 million+ records):

isr       bigint   case id
event_dt  bigint   event date
age       double   age of patient
age_cod   string   days, months, years
sex       string   M or F
year      int
quarter   int

VIEW DEFINITION

```sql
CREATE VIEW `aers.aers_demo_view` AS
SELECT `isr` AS `isr`, `event_dt` AS `event_dt`, `age` AS `age`, `age_cod` AS `age_cod`,
       `gndr_cod` AS `sex`, `year` AS `year`, `quarter` AS `quarter`
FROM (
  SELECT `aers_demo_v1`.`isr`, `aers_demo_v1`.`event_dt`, `aers_demo_v1`.`age`, `aers_demo_v1`.`age_cod`, `aers_demo_v1`.`gndr_cod`, `aers_demo_v1`.`year`, `aers_demo_v1`.`quarter` FROM `aers`.`aers_demo_v1`
  UNION ALL
  SELECT `aers_demo_v2`.`isr`, `aers_demo_v2`.`event_dt`, `aers_demo_v2`.`age`, `aers_demo_v2`.`age_cod`, `aers_demo_v2`.`gndr_cod`, `aers_demo_v2`.`year`, `aers_demo_v2`.`quarter` FROM `aers`.`aers_demo_v2`
  UNION ALL
  SELECT `aers_demo_v3`.`isr`, `aers_demo_v3`.`event_dt`, `aers_demo_v3`.`age`, `aers_demo_v3`.`age_cod`, `aers_demo_v3`.`gndr_cod`, `aers_demo_v3`.`year`, `aers_demo_v3`.`quarter` FROM `aers`.`aers_demo_v3`
  UNION ALL
  SELECT `aers_demo_v4`.`isr`, `aers_demo_v4`.`event_dt`, `aers_demo_v4`.`age`, `aers_demo_v4`.`age_cod`, `aers_demo_v4`.`gndr_cod`, `aers_demo_v4`.`year`, `aers_demo_v4`.`quarter` FROM `aers`.`aers_demo_v4`
  UNION ALL
  SELECT `aers_demo_v5`.`primaryid` AS `ISR`, `aers_demo_v5`.`event_dt`, `aers_demo_v5`.`age`, `aers_demo_v5`.`age_cod`, `aers_demo_v5`.`gndr_cod`, `aers_demo_v5`.`year`, `aers_demo_v5`.`quarter` FROM `aers`.`aers_demo_v5`
  UNION ALL
  SELECT `aers_demo_v6`.`primaryid` AS `ISR`, `aers_demo_v6`.`event_dt`, `aers_demo_v6`.`age`, `aers_demo_v6`.`age_cod`, `aers_demo_v6`.`sex` AS `GNDR_COD`, `aers_demo_v6`.`year`, `aers_demo_v6`.`quarter` FROM `aers`.`aers_demo_v6`
) `aers_demo_view`
```

```
15/06/11 08:36:36 WARN DefaultChannelPipeline: An exception was thrown by a user handler while handling an exception event ([id: 0x01b99855, /10.0.0.19:58117 => /10.0.0.19:52016] EXCEPTION: java.lang.OutOfMemoryError: Java heap space)
java.lang.OutOfMemoryError: Java heap space
    at org.jboss.netty.buffer.HeapChannelBuffer.<init>(HeapChannelBuffer.java:42)
    at org.jboss.netty.buffer.BigEndianHeapChannelBuffer.<init>(BigEndianHeapChannelBuffer.java:34)
    at org.jboss.netty.buffer.ChannelBuffers.buffer(ChannelBuffers.java:134)
    at org.jboss.netty.buffer.HeapChannelBufferFactory.getBuffer(HeapChannelBufferFactory.java:68)
    at org.jboss.netty.buffer.AbstractChannelBufferFactory.getBuffer(AbstractChannelBufferFactory.java:48)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.newCumulationBuffer(FrameDecoder.java:507)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.updateCumulation(FrameDecoder.java:345)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:312)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
    at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
    at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
15/06/11 08:36:40 ERROR Utils: Uncaught exception in thread task-result-getter-0
java.lang.OutOfMemoryError: GC
```
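A dry-run sketch of the create-table invocation above, with the SQL quoted as a single shell argument and the memory/core options written before `-e` (spark-submit documents launcher options before application arguments; whether spark-sql also accepts them after `-e` may vary by version, so the ordering here is an assumption to verify on your cluster). The command is echoed rather than executed:

```shell
# Dry-run sketch (echoed, not executed). Option placement relative to -e is
# an assumption; paths, table, and query are taken from the thread.
SPARK_SQL=/opt/cloudera/parcels/CDH/lib/spark/bin/spark-sql
QUERY='use aers; create table unique_aers_demo as select distinct isr,event_dt,age,age_cod,sex,year,quarter from aers.aers_demo_view'
CMD="$SPARK_SQL --driver-memory 4G --executor-memory 4G --total-executor-cores 12 -e \"$QUERY\""
echo "$CMD"
```

Writing the result into a Hive table this way keeps the 7M-row result distributed instead of collecting it through the driver to local disk, which is why it sidesteps the heap-space error.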
Re: spark-sql from CLI ---EXCEPTION: java.lang.OutOfMemoryError: Java heap space
Hi Josh,

It was great meeting you in person at Spark Summit SFO yesterday. Thanks for discussing potential solutions to the problem.

I verified that 2 Hive gateway nodes had not been configured correctly; my bad. I added hive-site.xml to the Spark conf directories for those 2 additional Hive gateway nodes, and I increased the driver-memory parameter to 1gb. That solved the memory issue.

So the good news is I can get spark-sql running in standalone mode (on CDH 5.3.3 with Spark 1.2 on YARN). The not-so-good news is that the following params have no effect:

--master yarn --deployment-mode client

So the spark-sql query runs with only ONE executor :-( I am planning on bugging you for 5-10 minutes at the Spark office hours :-) and hopefully we can solve this.

Thanks, best regards
Sanjay

Sent from my iPhone

On Jun 13, 2015, at 5:38 PM, Josh Rosen rosenvi...@gmail.com wrote:

Try using Spark 1.4.0 with SQL code generation turned on; this should make a huge difference.
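On the one-executor-on-YARN symptom, a sketch (not from the thread): on YARN the executor count is requested explicitly with `--num-executors` (a YARN-only flag), while `--total-executor-cores` applies only to standalone and Mesos masters, and the spark-submit flag is spelled `--deploy-mode`, not `--deployment-mode`. The values below are illustrative for a 3-node, 8-core, 32 GB cluster, not a tested recommendation; the command is echoed as a dry run:

```shell
# Dry-run sketch: requesting multiple executors on YARN.
# --num-executors is YARN-specific; --total-executor-cores is ignored on YARN.
SPARK_SQL=/opt/cloudera/parcels/CDH/lib/spark/bin/spark-sql
CMD="$SPARK_SQL --master yarn --deploy-mode client"
CMD="$CMD --num-executors 6 --executor-cores 2"
CMD="$CMD --executor-memory 4G --driver-memory 4G"
echo "$CMD"
```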
Re: spark-sql from CLI ---EXCEPTION: java.lang.OutOfMemoryError: Java heap space
hey guys

I tried the following settings as well. No luck:

--total-executor-cores 24 --executor-memory 4G

BTW, on the same cluster, Impala absolutely kills it: the same query runs in 9 seconds, with no memory issues, no issues at all. In fact I am pretty disappointed with Spark-SQL. I have worked with Hive since the 0.9.x releases and taken projects to production successfully, and Hive actually very rarely craps out. Whether the Spark folks like what I say or not, yes, my expectations of Spark-SQL are pretty high if I were to change the way we are doing things at my workplace. Until that time, we are going to be hugely dependent on Impala and Hive (with SSDs speeding up the shuffle stage, even MR jobs are not that slow now).

I want to clarify, for those of you who may be asking, why I am not using Spark with Scala and am insisting on using spark-sql:

- I have already pipelined data from enterprise tables to Hive
- I am using CDH 5.3.3 (Cloudera starving-developers version)
- I have close to 300 tables defined as Hive external tables
- Data is on HDFS
- On average we have 150 columns per table
- On an everyday basis we do crazy amounts of ad-hoc joining of new and old tables to get datasets ready for supervised ML
- I thought that quite simply I could point Spark at the Hive metastore and do queries as I do now; in fact the existing queries would work as-is unless I am using some esoteric Hive/Impala function

Anyway, if there are some settings I can use to get spark-sql to run even in standalone mode, that will be a huge help. On the pre-production cluster I have Spark on YARN but could never get it to run fairly complex queries, and I have no answers from this group or the CDH groups. So my assumption is that it's possibly not solved; otherwise I have always got very quick answers and responses :-) to my questions on all the CDH, Spark, and Hive groups.

best regards
sanjay
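On the standalone-mode settings question: one option worth trying is to put the equivalents in `spark-defaults.conf` in the Spark conf directory, so the CLI picks them up regardless of how command-line flags are parsed. This is a hypothetical fragment, not from the thread; the property names are the standard Spark 1.x ones (`spark.cores.max` is the standalone analogue of `--total-executor-cores`), the path is the usual CDH location, and the master host is a placeholder:

```
# /etc/spark/conf/spark-defaults.conf (typical CDH path -- verify locally)
spark.master            spark://<standalone-master-host>:7077
spark.driver.memory     4g
spark.executor.memory   4g
spark.cores.max         12
```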
Re: spark-sql from CLI ---EXCEPTION: java.lang.OutOfMemoryError: Java heap space
Try using Spark 1.4.0 with SQL code generation turned on; this should make a huge difference.

On Sat, Jun 13, 2015 at 5:08 PM, Sanjay Subramanian sanjaysubraman...@yahoo.com wrote:
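A sketch of the code-generation suggestion, assuming the `spark.sql.codegen` property from the Spark 1.x SQL documentation (verify the exact name and default against your Spark version); it can be toggled per session inside the spark-sql CLI:

```sql
-- Hypothetical session in the spark-sql CLI on Spark 1.4:
-- enable runtime code generation, then rerun the query.
SET spark.sql.codegen=true;
SELECT DISTINCT isr, event_dt, age, age_cod, sex, year, quarter
FROM aers.aers_demo_view;
```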
Re: spark-sql from CLI ---EXCEPTION: java.lang.OutOfMemoryError: Java heap space
From: Josh Rosen rosenvi...@gmail.com
To: Sanjay Subramanian sanjaysubraman...@yahoo.com
Cc: user@spark.apache.org
Sent: Friday, June 12, 2015 7:15 AM

It sounds like this might be caused by a memory configuration problem. In addition to looking at the executor memory, I'd also bump up the driver memory, since it appears that your shell is running out of memory when collecting a large query result.

Sent from my phone