Re: spark-sql from CLI ---EXCEPTION: java.lang.OutOfMemoryError: Java heap space

2015-06-17 Thread Sanjay Subramanian
ok solved. Looks like breathing the Spark Summit SFO air for 3 days helped a 
lot!
Piping the 7 million records to local disk still runs out of memory. So I piped 
the results into another Hive table instead. I can live with that :-)

/opt/cloudera/parcels/CDH/lib/spark/bin/spark-sql --driver-memory 4G \
  --total-executor-cores 12 --executor-memory 4G \
  -e "use aers; create table unique_aers_demo as select distinct
      isr,event_dt,age,age_cod,sex,year,quarter from aers.aers_demo_view"

thanks

  From: Sanjay Subramanian sanjaysubraman...@yahoo.com.INVALID
 To: user@spark.apache.org user@spark.apache.org 
 Sent: Thursday, June 11, 2015 8:43 AM
 Subject: spark-sql from CLI ---EXCEPTION: java.lang.OutOfMemoryError: Java 
heap space
   
hey guys

Using Hive and Impala daily, intensively. Want to transition to spark-sql in 
CLI mode.

Currently in my sandbox I am using Spark (standalone mode) in the CDH 
distribution (starving developer version 5.3.3):

3 datanode hadoop cluster
32GB RAM per node
8 cores per node



| spark | 1.2.0+cdh5.3.3+371 |



I am testing some stuff on one view and getting memory errors. A possible 
reason: the default memory per executor shown on port 18080 is 512M.

These options, when used to start the spark-sql CLI, do not seem to have any 
effect:

--total-executor-cores 12 --executor-memory 4G
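A sketch of the equivalent spark-defaults.conf entries (property names assumed
from standard Spark conventions; spark.cores.max is the standalone-mode
analogue of --total-executor-cores), written to a scratch file here just to
show the shape:

```shell
# Build a scratch spark-defaults.conf fragment; in a real deployment these
# lines would go in the cluster's Spark conf directory instead.
conf=$(mktemp)
cat > "$conf" <<'EOF'
spark.executor.memory   4g
spark.cores.max         12
spark.driver.memory     4g
EOF
grep -c '^spark\.' "$conf"   # -> 3
```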



/opt/cloudera/parcels/CDH/lib/spark/bin/spark-sql -e "select distinct 
isr,event_dt,age,age_cod,sex,year,quarter from aers.aers_demo_view"

aers.aers_demo_view (7 million+ records)
========================================
isr       bigint   case id
event_dt  bigint   Event date
age       double   age of patient
age_cod   string   days, months, years
sex       string   M or F
year      int
quarter   int

VIEW DEFINITION
===============
CREATE VIEW `aers.aers_demo_view` AS
SELECT `isr` AS `isr`, `event_dt` AS `event_dt`, `age` AS `age`,
       `age_cod` AS `age_cod`, `gndr_cod` AS `sex`,
       `year` AS `year`, `quarter` AS `quarter`
FROM (
  SELECT `aers_demo_v1`.`isr`, `aers_demo_v1`.`event_dt`, `aers_demo_v1`.`age`,
         `aers_demo_v1`.`age_cod`, `aers_demo_v1`.`gndr_cod`,
         `aers_demo_v1`.`year`, `aers_demo_v1`.`quarter`
  FROM `aers`.`aers_demo_v1`
  UNION ALL
  SELECT `aers_demo_v2`.`isr`, `aers_demo_v2`.`event_dt`, `aers_demo_v2`.`age`,
         `aers_demo_v2`.`age_cod`, `aers_demo_v2`.`gndr_cod`,
         `aers_demo_v2`.`year`, `aers_demo_v2`.`quarter`
  FROM `aers`.`aers_demo_v2`
  UNION ALL
  SELECT `aers_demo_v3`.`isr`, `aers_demo_v3`.`event_dt`, `aers_demo_v3`.`age`,
         `aers_demo_v3`.`age_cod`, `aers_demo_v3`.`gndr_cod`,
         `aers_demo_v3`.`year`, `aers_demo_v3`.`quarter`
  FROM `aers`.`aers_demo_v3`
  UNION ALL
  SELECT `aers_demo_v4`.`isr`, `aers_demo_v4`.`event_dt`, `aers_demo_v4`.`age`,
         `aers_demo_v4`.`age_cod`, `aers_demo_v4`.`gndr_cod`,
         `aers_demo_v4`.`year`, `aers_demo_v4`.`quarter`
  FROM `aers`.`aers_demo_v4`
  UNION ALL
  SELECT `aers_demo_v5`.`primaryid` AS `ISR`, `aers_demo_v5`.`event_dt`,
         `aers_demo_v5`.`age`, `aers_demo_v5`.`age_cod`,
         `aers_demo_v5`.`gndr_cod`, `aers_demo_v5`.`year`,
         `aers_demo_v5`.`quarter`
  FROM `aers`.`aers_demo_v5`
  UNION ALL
  SELECT `aers_demo_v6`.`primaryid` AS `ISR`, `aers_demo_v6`.`event_dt`,
         `aers_demo_v6`.`age`, `aers_demo_v6`.`age_cod`,
         `aers_demo_v6`.`sex` AS `GNDR_COD`, `aers_demo_v6`.`year`,
         `aers_demo_v6`.`quarter`
  FROM `aers`.`aers_demo_v6`
) `aers_demo_view`






15/06/11 08:36:36 WARN DefaultChannelPipeline: An exception was thrown by a 
user handler while handling an exception event ([id: 0x01b99855, 
/10.0.0.19:58117 => /10.0.0.19:52016] EXCEPTION: java.lang.OutOfMemoryError: 
Java heap space)
java.lang.OutOfMemoryError: Java heap space
        at org.jboss.netty.buffer.HeapChannelBuffer.<init>(HeapChannelBuffer.java:42)
        at org.jboss.netty.buffer.BigEndianHeapChannelBuffer.<init>(BigEndianHeapChannelBuffer.java:34)
        at org.jboss.netty.buffer.ChannelBuffers.buffer(ChannelBuffers.java:134)
        at org.jboss.netty.buffer.HeapChannelBufferFactory.getBuffer(HeapChannelBufferFactory.java:68)
        at org.jboss.netty.buffer.AbstractChannelBufferFactory.getBuffer(AbstractChannelBufferFactory.java:48)
        at org.jboss.netty.handler.codec.frame.FrameDecoder.newCumulationBuffer(FrameDecoder.java:507)
        at org.jboss.netty.handler.codec.frame.FrameDecoder.updateCumulation(FrameDecoder.java:345)
        at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:312)
        at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
        at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
        at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
        at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109)
        at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
        at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
        at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
15/06/11 08:36:40 ERROR Utils: Uncaught exception in thread task-result-getter-0
java.lang.OutOfMemoryError: GC 

Re: spark-sql from CLI ---EXCEPTION: java.lang.OutOfMemoryError: Java heap space

2015-06-16 Thread Sanjay Subramanian
Hi Josh

It was great meeting you in person at the Spark Summit SFO yesterday.
Thanks for discussing potential solutions to the problem.
I verified that 2 Hive gateway nodes had not been configured correctly. My bad.
I added hive-site.xml to the Spark conf directories for these 2 additional Hive 
gateway nodes.

Plus I increased the driver-memory parameter to 1 GB. That solved the memory 
issue.

So the good news is I can get spark-sql running in standalone mode (on CDH 
5.3.3 with Spark 1.2 on YARN)

The not-so-good news is that the following params have no effect:

--master yarn   --deployment-mode client

So the spark-sql query runs with only ONE executor :-(
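Worth hedging here: --total-executor-cores is a standalone-mode option, while
executor count on YARN in Spark 1.x is driven by --num-executors and
--executor-cores, and the client-mode flag is spelled --deploy-mode. A sketch
of such an invocation, guarded so it only fires on a node that actually has
the CDH parcel; the count(*) query is purely illustrative:

```shell
SPARK_SQL=/opt/cloudera/parcels/CDH/lib/spark/bin/spark-sql

# YARN-side sizing flags (Spark 1.x): executor count and per-executor cores
# are explicit, unlike standalone mode's --total-executor-cores.
ARGS="--master yarn --deploy-mode client --num-executors 6 --executor-cores 2 --executor-memory 4G --driver-memory 4G"

echo "would run: spark-sql $ARGS -e '<query>'"

# Only attempt the real call where the CDH parcel exists.
if [ -x "$SPARK_SQL" ]; then
  "$SPARK_SQL" $ARGS -e "select count(*) from aers.aers_demo_view"
fi
```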

I am planning on bugging you for 5-10 minutes at the Spark office hours :-) and 
hopefully we can solve this. 

Thanks 
Best regards 
Sanjay 

Sent from my iPhone


Re: spark-sql from CLI ---EXCEPTION: java.lang.OutOfMemoryError: Java heap space

2015-06-13 Thread Sanjay Subramanian
hey guys

I tried the following settings as well. No luck.

--total-executor-cores 24 --executor-memory 4G

BTW on the same cluster, Impala absolutely kills it. Same query: 9 seconds. No 
memory issues. No issues.

In fact I am pretty disappointed with Spark-SQL. I have worked with Hive since 
the 0.9.x stages and taken projects to production successfully, and Hive 
actually very rarely craps out.

Whether the Spark folks like what I say or not, yes my expectations of 
Spark-SQL are pretty high if I were to change the way we are doing things at my 
workplace. Until that time, we are going to be hugely dependent on Impala and 
Hive (with SSD speeding up the shuffle stage, even MR jobs are not that slow 
now).

I want to clarify, for those of you who may be asking, why I am not using Spark 
with Scala and am insisting on using spark-sql:

- I have already pipelined data from enterprise tables to Hive
- I am using CDH 5.3.3 (Cloudera starving developers version)
- I have close to 300 tables defined as Hive external tables
- Data is on HDFS
- On average we have 150 columns per table
- On an everyday basis we do crazy amounts of ad-hoc joining of new and old 
tables to get datasets ready for supervised ML
- I thought that quite simply I could point Spark at the Hive metastore and do 
queries as I do now - in fact the existing queries would work as is unless I am 
using some esoteric Hive/Impala function

Anyway, if there are some settings I can use to get spark-sql to run even in 
standalone mode, that will be a huge help.

On the pre-production cluster I have Spark on YARN but could never get it to 
run fairly complex queries, and I have no answers from this group or the CDH 
groups.

So my assumption is that it's possibly not solved; otherwise I would have got 
very quick answers and responses :-) as I always do to my questions on all the 
CDH groups, Spark, Hive.

best regards

sanjay
 

Re: spark-sql from CLI ---EXCEPTION: java.lang.OutOfMemoryError: Java heap space

2015-06-13 Thread Josh Rosen
Try using Spark 1.4.0 with SQL code generation turned on; this should make
a huge difference.
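A sketch of what turning that on could look like from the CLI — the property
name spark.sql.codegen is assumed from the Spark 1.x configuration docs (the
flag was removed in later releases), the install path is hypothetical, and the
call is guarded so it only runs where that binary exists:

```shell
SPARK_SQL=/opt/spark-1.4.0/bin/spark-sql   # hypothetical 1.4.0 install path
QUERY="select distinct isr, event_dt, age, age_cod, sex, year, quarter from aers.aers_demo_view"

echo "would run: spark-sql --conf spark.sql.codegen=true -e \"$QUERY\""

# Guarded: only invoke where a Spark 1.4.0 build is actually present.
if [ -x "$SPARK_SQL" ]; then
  "$SPARK_SQL" --conf spark.sql.codegen=true -e "$QUERY"
fi
```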


Re: spark-sql from CLI ---EXCEPTION: java.lang.OutOfMemoryError: Java heap space

2015-06-12 Thread Josh Rosen
It sounds like this might be caused by a memory configuration problem.  In 
addition to looking at the executor memory, I'd also bump up the driver memory, 
since it appears that your shell is running out of memory when collecting a 
large query result.

Sent from my phone
