Re: python : Out of memory: Kill process

2015-03-30 Thread Eduardo Cusa
Hi, I changed my process flow.

Now I am processing one file per hour instead of processing everything at the end of
the day.

This decreased the memory consumption.
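
A minimal sketch of that hourly flow (the bucket layout, the "HourlyLoad" app name and the
process_hour() helper are illustrative assumptions, not the exact production code):

from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

conf = SparkConf().setAppName("HourlyLoad")
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)

def process_hour(date, hour):
    # Load one hour of gzipped JSON logs at a time instead of the whole day,
    # so far less data has to be resident in memory per run.
    path = "s3n://MY_BUCKET/%s/%02d/*.log.gz" % (date, hour)
    log = sqlContext.jsonFile(path)
    log.registerTempTable("log_test")
    # ... run the per-hour query and write the partial result out ...

for hour in range(24):
    process_hour("2015-03-30", hour)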

Regards
Eduardo

On Thu, Mar 26, 2015 at 3:16 PM, Davies Liu dav...@databricks.com wrote:

 Could you narrow it down to the step that causes the OOM? Something like:

 log2= self.sqlContext.jsonFile(path)
 log2.count()
 ...
 out.count()
 ...
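
 Concretely, that narrowing could look like the sketch below (the intermediate names are
 illustrative; the idea is to force each stage with count() and watch which call blows up):

log2 = sqlContext.jsonFile(path)
print "rows loaded:", log2.count()          # does reading the JSON alone fit?

log2.registerTempTable("log_test")
query = ("SELECT user, tax FROM log_test "
         "WHERE provider = '%s' AND country <> ''" % provider)
out = sqlContext.sql(query).map(lambda row: (row.user, row.tax))
print "rows after filter:", out.count()     # does the filtered/mapped RDD fit?

grouped = out.groupByKey(100)
print "groups:", grouped.count()            # does the shuffle fit, before any collect()?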

 On Thu, Mar 26, 2015 at 10:34 AM, Eduardo Cusa
 eduardo.c...@usmediaconsulting.com wrote:
  The last try was without log2.cache() and I'm still getting out of memory.

  I'm using the following conf; maybe it helps:
 
 
 
conf = (SparkConf()
        .setAppName("LoadS3")
        .set("spark.executor.memory", "13g")
        .set("spark.driver.memory", "13g")
        .set("spark.driver.maxResultSize", "2g")
        .set("spark.default.parallelism", "200")
        .set("spark.kryoserializer.buffer.mb", "512"))
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)
 
 
 
 
 
  On Thu, Mar 26, 2015 at 2:29 PM, Davies Liu dav...@databricks.com
 wrote:
 
  Could you try to remove the line `log2.cache()` ?
 
  On Thu, Mar 26, 2015 at 10:02 AM, Eduardo Cusa
  eduardo.c...@usmediaconsulting.com wrote:
   I'm running on EC2:
  
   1 Master : 4 CPU 15 GB RAM  (2 GB swap)
  
   2 Slaves  4 CPU 15 GB RAM
  
  
   the uncompressed dataset size is 15 GB
  
  
  
  
   On Thu, Mar 26, 2015 at 10:41 AM, Eduardo Cusa
   eduardo.c...@usmediaconsulting.com wrote:
  
   Hi Davies, I upgraded to 1.3.0 and I'm still getting Out of Memory.

   I ran the same code as before; do I need to make any changes?
  
  
  
  
  
  
   On Wed, Mar 25, 2015 at 4:00 PM, Davies Liu dav...@databricks.com
   wrote:
  
   With batchSize = 1, I think it will become even worse.
  
   I'd suggest going with 1.3 to have a taste of the new DataFrame API.
  
   On Wed, Mar 25, 2015 at 11:49 AM, Eduardo Cusa
   eduardo.c...@usmediaconsulting.com wrote:
Hi Davies, I'm running 1.1.0.

Now I'm following this thread, which recommends using the batchsize parameter = 1:
   
   
   
   
   
 http://apache-spark-user-list.1001560.n3.nabble.com/pySpark-memory-usage-td3022.html
   
If this does not work, I will install 1.2.1 or 1.3.
   
Regards
   
   
   
   
   
   
On Wed, Mar 25, 2015 at 3:39 PM, Davies Liu 
 dav...@databricks.com
wrote:
   
What's the version of Spark you are running?
   
There is a bug in the SQL Python API [1]; it's fixed in 1.2.1 and 1.3:
   
[1] https://issues.apache.org/jira/browse/SPARK-6055
   
On Wed, Mar 25, 2015 at 10:33 AM, Eduardo Cusa
eduardo.c...@usmediaconsulting.com wrote:
 Hi guys, I'm running the following function with spark-submit and the OS is killing my process:


   def getRdd(self, date, provider):
       path = 's3n://' + AWS_BUCKET + '/' + date + '/*.log.gz'
       log2 = self.sqlContext.jsonFile(path)
       log2.registerTempTable('log_test')
       log2.cache()
   
You only visit the table once, cache does not help here.
   
       out = self.sqlContext.sql(
           "SELECT user, tax FROM log_test WHERE provider = '"
           + provider + "' AND country <> ''"
       ).map(lambda row: (row.user, row.tax))
       print out
       return map(lambda (x, y): (x, list(y)),
                  sorted(out.groupByKey(2000).collect()))
   
100 partitions (or less) will be enough for 2G dataset.
   


 The input dataset has 57 zip files (2 GB)

 The same process with a smaller dataset completed successfully

 Any ideas for debugging are welcome.

 Regards
 Eduardo


   
   
  
  
  
 
 



Re: python : Out of memory: Kill process

2015-03-26 Thread Eduardo Cusa
The last try was without log2.cache() and I'm still getting out of memory.

I'm using the following conf; maybe it helps:



  conf = (SparkConf()
          .setAppName("LoadS3")
          .set("spark.executor.memory", "13g")
          .set("spark.driver.memory", "13g")
          .set("spark.driver.maxResultSize", "2g")
          .set("spark.default.parallelism", "200")
          .set("spark.kryoserializer.buffer.mb", "512"))
  sc = SparkContext(conf=conf)
  sqlContext = SQLContext(sc)





On Thu, Mar 26, 2015 at 2:29 PM, Davies Liu dav...@databricks.com wrote:

 Could you try to remove the line `log2.cache()` ?

 On Thu, Mar 26, 2015 at 10:02 AM, Eduardo Cusa
 eduardo.c...@usmediaconsulting.com wrote:
  I'm running on EC2:
 
  1 Master : 4 CPU 15 GB RAM  (2 GB swap)
 
  2 Slaves  4 CPU 15 GB RAM
 
 
  the uncompressed dataset size is 15 GB
 
 
 
 
  On Thu, Mar 26, 2015 at 10:41 AM, Eduardo Cusa
  eduardo.c...@usmediaconsulting.com wrote:
 
  Hi Davies, I upgraded to 1.3.0 and I'm still getting Out of Memory.

  I ran the same code as before; do I need to make any changes?
 
 
 
 
 
 
  On Wed, Mar 25, 2015 at 4:00 PM, Davies Liu dav...@databricks.com
 wrote:
 
  With batchSize = 1, I think it will become even worse.
 
  I'd suggest going with 1.3 to have a taste of the new DataFrame API.
 
  On Wed, Mar 25, 2015 at 11:49 AM, Eduardo Cusa
  eduardo.c...@usmediaconsulting.com wrote:
   Hi Davies, I'm running 1.1.0.

   Now I'm following this thread, which recommends using the batchsize parameter = 1:
  
  
  
  
 http://apache-spark-user-list.1001560.n3.nabble.com/pySpark-memory-usage-td3022.html
  
   If this does not work, I will install 1.2.1 or 1.3.
  
   Regards
  
  
  
  
  
  
   On Wed, Mar 25, 2015 at 3:39 PM, Davies Liu dav...@databricks.com
   wrote:
  
   What's the version of Spark you are running?
  
   There is a bug in the SQL Python API [1]; it's fixed in 1.2.1 and 1.3:
  
   [1] https://issues.apache.org/jira/browse/SPARK-6055
  
   On Wed, Mar 25, 2015 at 10:33 AM, Eduardo Cusa
   eduardo.c...@usmediaconsulting.com wrote:
Hi guys, I'm running the following function with spark-submit and the OS is killing my process:
   
   
  def getRdd(self, date, provider):
      path = 's3n://' + AWS_BUCKET + '/' + date + '/*.log.gz'
      log2 = self.sqlContext.jsonFile(path)
      log2.registerTempTable('log_test')
      log2.cache()
  
   You only visit the table once, cache does not help here.
  
      out = self.sqlContext.sql(
          "SELECT user, tax FROM log_test WHERE provider = '"
          + provider + "' AND country <> ''"
      ).map(lambda row: (row.user, row.tax))
      print out
      return map(lambda (x, y): (x, list(y)),
                 sorted(out.groupByKey(2000).collect()))
  
   100 partitions (or less) will be enough for 2G dataset.
  
   
   
The input dataset has 57 zip files (2 GB)
   
The same process with a smaller dataset completed successfully
   
Any ideas for debugging are welcome.
   
Regards
Eduardo
   
   
  
  
 
 
 



Re: python : Out of memory: Kill process

2015-03-26 Thread Eduardo Cusa
I'm running on EC2:

1 Master : 4 CPU 15 GB RAM  (2 GB swap)

2 Slaves  4 CPU 15 GB RAM


the uncompressed dataset size is 15 GB




On Thu, Mar 26, 2015 at 10:41 AM, Eduardo Cusa 
eduardo.c...@usmediaconsulting.com wrote:

 Hi Davies, I upgraded to 1.3.0 and I'm still getting Out of Memory.

 I ran the same code as before; do I need to make any changes?






 On Wed, Mar 25, 2015 at 4:00 PM, Davies Liu dav...@databricks.com wrote:

 With batchSize = 1, I think it will become even worse.

 I'd suggest going with 1.3 to have a taste of the new DataFrame API.

 On Wed, Mar 25, 2015 at 11:49 AM, Eduardo Cusa
 eduardo.c...@usmediaconsulting.com wrote:
  Hi Davies, I'm running 1.1.0.

  Now I'm following this thread, which recommends using the batchsize parameter = 1:
 
 
 
 http://apache-spark-user-list.1001560.n3.nabble.com/pySpark-memory-usage-td3022.html
 
  If this does not work, I will install 1.2.1 or 1.3.
 
  Regards
 
 
 
 
 
 
  On Wed, Mar 25, 2015 at 3:39 PM, Davies Liu dav...@databricks.com
 wrote:
 
  What's the version of Spark you are running?
 
  There is a bug in the SQL Python API [1]; it's fixed in 1.2.1 and 1.3:
 
  [1] https://issues.apache.org/jira/browse/SPARK-6055
 
  On Wed, Mar 25, 2015 at 10:33 AM, Eduardo Cusa
  eduardo.c...@usmediaconsulting.com wrote:
   Hi guys, I'm running the following function with spark-submit and the OS is killing my process:
  
  
  def getRdd(self, date, provider):
      path = 's3n://' + AWS_BUCKET + '/' + date + '/*.log.gz'
      log2 = self.sqlContext.jsonFile(path)
      log2.registerTempTable('log_test')
      log2.cache()
 
  You only visit the table once, cache does not help here.
 
      out = self.sqlContext.sql(
          "SELECT user, tax FROM log_test WHERE provider = '"
          + provider + "' AND country <> ''"
      ).map(lambda row: (row.user, row.tax))
      print out
      return map(lambda (x, y): (x, list(y)),
                 sorted(out.groupByKey(2000).collect()))
 
  100 partitions (or less) will be enough for 2G dataset.
 
  
  
   The input dataset has 57 zip files (2 GB)
  
   The same process with a smaller dataset completed successfully
  
   Any ideas for debugging are welcome.
  
   Regards
   Eduardo
  
  
 
 





Re: python : Out of memory: Kill process

2015-03-26 Thread Eduardo Cusa
Hi Davies, I upgraded to 1.3.0 and I'm still getting Out of Memory.

I ran the same code as before; do I need to make any changes?






On Wed, Mar 25, 2015 at 4:00 PM, Davies Liu dav...@databricks.com wrote:

 With batchSize = 1, I think it will become even worse.

 I'd suggest going with 1.3 to have a taste of the new DataFrame API.

 On Wed, Mar 25, 2015 at 11:49 AM, Eduardo Cusa
 eduardo.c...@usmediaconsulting.com wrote:
  Hi Davies, I'm running 1.1.0.

  Now I'm following this thread, which recommends using the batchsize parameter = 1:
 
 
 
 http://apache-spark-user-list.1001560.n3.nabble.com/pySpark-memory-usage-td3022.html
 
  If this does not work, I will install 1.2.1 or 1.3.
 
  Regards
 
 
 
 
 
 
  On Wed, Mar 25, 2015 at 3:39 PM, Davies Liu dav...@databricks.com
 wrote:
 
  What's the version of Spark you are running?
 
  There is a bug in the SQL Python API [1]; it's fixed in 1.2.1 and 1.3:
 
  [1] https://issues.apache.org/jira/browse/SPARK-6055
 
  On Wed, Mar 25, 2015 at 10:33 AM, Eduardo Cusa
  eduardo.c...@usmediaconsulting.com wrote:
   Hi guys, I'm running the following function with spark-submit and the OS is killing my process:
  
  
  def getRdd(self, date, provider):
      path = 's3n://' + AWS_BUCKET + '/' + date + '/*.log.gz'
      log2 = self.sqlContext.jsonFile(path)
      log2.registerTempTable('log_test')
      log2.cache()
 
  You only visit the table once, cache does not help here.
 
      out = self.sqlContext.sql(
          "SELECT user, tax FROM log_test WHERE provider = '"
          + provider + "' AND country <> ''"
      ).map(lambda row: (row.user, row.tax))
      print out
      return map(lambda (x, y): (x, list(y)),
                 sorted(out.groupByKey(2000).collect()))
 
  100 partitions (or less) will be enough for 2G dataset.
 
  
  
   The input dataset has 57 zip files (2 GB)
  
   The same process with a smaller dataset completed successfully
  
   Any ideas for debugging are welcome.
  
   Regards
   Eduardo
  
  
 
 



python : Out of memory: Kill process

2015-03-25 Thread Eduardo Cusa
Hi guys, I'm running the following function with spark-submit and the OS is killing my process:


  def getRdd(self, date, provider):
      path = 's3n://' + AWS_BUCKET + '/' + date + '/*.log.gz'
      log2 = self.sqlContext.jsonFile(path)
      log2.registerTempTable('log_test')
      log2.cache()
      out = self.sqlContext.sql(
          "SELECT user, tax FROM log_test WHERE provider = '"
          + provider + "' AND country <> ''"
      ).map(lambda row: (row.user, row.tax))
      print out
      return map(lambda (x, y): (x, list(y)),
                 sorted(out.groupByKey(2000).collect()))



The input dataset has 57 zip files (2 GB)

The same process with a smaller dataset completed successfully

Any ideas for debugging are welcome.

Regards
Eduardo


Re: python : Out of memory: Kill process

2015-03-25 Thread Eduardo Cusa
Hi Davies, I'm running 1.1.0.

Now I'm following this thread, which recommends using the batchsize parameter = 1:


http://apache-spark-user-list.1001560.n3.nabble.com/pySpark-memory-usage-td3022.html

If this does not work, I will install 1.2.1 or 1.3.
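
For reference, in PySpark the batchSize knob from that thread is passed to the SparkContext
constructor; a minimal sketch, reusing the app name from the earlier conf:

from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName("LoadS3")
# batchSize controls how many Python objects are pickled together per batch;
# batchSize=1 trades serialization speed for lower per-batch memory.
sc = SparkContext(conf=conf, batchSize=1)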

Regards






On Wed, Mar 25, 2015 at 3:39 PM, Davies Liu dav...@databricks.com wrote:

 What's the version of Spark you are running?

 There is a bug in the SQL Python API [1]; it's fixed in 1.2.1 and 1.3:

 [1] https://issues.apache.org/jira/browse/SPARK-6055

 On Wed, Mar 25, 2015 at 10:33 AM, Eduardo Cusa
 eduardo.c...@usmediaconsulting.com wrote:
  Hi guys, I'm running the following function with spark-submit and the OS is killing my process:
 
 
def getRdd(self, date, provider):
    path = 's3n://' + AWS_BUCKET + '/' + date + '/*.log.gz'
    log2 = self.sqlContext.jsonFile(path)
    log2.registerTempTable('log_test')
    log2.cache()

 You only visit the table once, cache does not help here.

    out = self.sqlContext.sql(
        "SELECT user, tax FROM log_test WHERE provider = '"
        + provider + "' AND country <> ''"
    ).map(lambda row: (row.user, row.tax))
    print out
    return map(lambda (x, y): (x, list(y)),
               sorted(out.groupByKey(2000).collect()))

 100 partitions (or less) will be enough for 2G dataset.
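
 Putting those two suggestions together, the revised getRdd might look like this sketch
 (cache() dropped, 100 partitions for the groupByKey; the SQL string quoting is reconstructed):

def getRdd(self, date, provider):
    path = 's3n://' + AWS_BUCKET + '/' + date + '/*.log.gz'
    log2 = self.sqlContext.jsonFile(path)
    log2.registerTempTable('log_test')
    # no cache(): the table is only read once

    out = self.sqlContext.sql(
        "SELECT user, tax FROM log_test WHERE provider = '"
        + provider + "' AND country <> ''"
    ).map(lambda row: (row.user, row.tax))

    # 100 partitions (or fewer) are enough for a ~2 GB dataset
    return map(lambda (x, y): (x, list(y)),
               sorted(out.groupByKey(100).collect()))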

 
 
  The input dataset has 57 zip files (2 GB)
 
  The same process with a smaller dataset completed successfully
 
  Any ideas for debugging are welcome.
 
  Regards
  Eduardo
 
 



Re: Play Scala Spark Example

2015-01-12 Thread Eduardo Cusa
The EC2 version is 1.1.0 and this is my build.sbt:


libraryDependencies ++= Seq(
  jdbc,
  anorm,
  cache,
  "org.apache.spark"  %% "spark-core"              % "1.1.0",
  "com.typesafe.akka" %% "akka-actor"              % "2.2.3",
  "com.typesafe.akka" %% "akka-slf4j"              % "2.2.3",
  "org.apache.spark"  %% "spark-streaming-twitter" % "1.1.0",
  "org.apache.spark"  %% "spark-sql"               % "1.1.0",
  "org.apache.spark"  %% "spark-mllib"             % "1.1.0"
)



On Sun, Jan 11, 2015 at 3:01 AM, Akhil Das ak...@sigmoidanalytics.com
wrote:

 What Spark version is running on the EC2 cluster? From the build file
 https://github.com/knoldus/Play-Spark-Scala/blob/master/build.sbt
 of your Play application, it seems that it uses Spark 1.0.1.

 Thanks
 Best Regards

 On Fri, Jan 9, 2015 at 7:17 PM, Eduardo Cusa 
 eduardo.c...@usmediaconsulting.com wrote:

 Hi guys, I'm running the following example:
 https://github.com/knoldus/Play-Spark-Scala on the same machine as the
 Spark master, and the Spark cluster was launched with the ec2 script.


 I'm stuck with these errors; any idea how to fix them?

 Regards
 Eduardo


 Calling the Play app prints the following exception:


 [*error*] a.r.EndpointWriter - AssociationError 
 [akka.tcp://sparkDriver@ip-10-28-236-122.ec2.internal:47481] - 
 [akka.tcp://driverPropsFetcher@ip-10-158-18-250.ec2.internal:52575]: Error 
 [Shut down address: 
 akka.tcp://driverPropsFetcher@ip-10-158-18-250.ec2.internal:52575] [
 akka.remote.ShutDownAssociation: Shut down address: 
 akka.tcp://driverPropsFetcher@ip-10-158-18-250.ec2.internal:52575
 Caused by: akka.remote.transport.Transport$InvalidAssociationException: The 
 remote system terminated the association because it is shutting down.




 The master receives the Spark application and generates the following stderr log page:


 15/01/09 13:31:23 INFO Remoting: Remoting started; listening on addresses 
 :[akka.tcp://sparkExecutor@ip-10-158-18-250.ec2.internal:37856]
 15/01/09 13:31:23 INFO Remoting: Remoting now listens on addresses: 
 [akka.tcp://sparkExecutor@ip-10-158-18-250.ec2.internal:37856]
 15/01/09 13:31:23 INFO util.Utils: Successfully started service 
 'sparkExecutor' on port 37856.
 15/01/09 13:31:23 INFO util.AkkaUtils: Connecting to MapOutputTracker: 
 akka.tcp://sparkDriver@ip-10-28-236-122.ec2.internal:47481/user/MapOutputTracker
 15/01/09 13:31:23 INFO util.AkkaUtils: Connecting to BlockManagerMaster: 
 akka.tcp://sparkDriver@ip-10-28-236-122.ec2.internal:47481/user/BlockManagerMaster
 15/01/09 13:31:23 INFO storage.DiskBlockManager: Created local directory at 
 /mnt/spark/spark-local-20150109133123-3805
 15/01/09 13:31:23 INFO storage.DiskBlockManager: Created local directory at 
 /mnt2/spark/spark-local-20150109133123-b05e
 15/01/09 13:31:23 INFO util.Utils: Successfully started service 'Connection 
 manager for block manager' on port 36936.
 15/01/09 13:31:23 INFO network.ConnectionManager: Bound socket to port 36936 
 with id = ConnectionManagerId(ip-10-158-18-250.ec2.internal,36936)
 15/01/09 13:31:23 INFO storage.MemoryStore: MemoryStore started with 
 capacity 265.4 MB
 15/01/09 13:31:23 INFO storage.BlockManagerMaster: Trying to register 
 BlockManager
 15/01/09 13:31:23 INFO storage.BlockManagerMaster: Registered BlockManager
 15/01/09 13:31:23 INFO util.AkkaUtils: Connecting to HeartbeatReceiver: 
 akka.tcp://sparkDriver@ip-10-28-236-122.ec2.internal:47481/user/HeartbeatReceiver
 15/01/09 13:31:54 ERROR executor.CoarseGrainedExecutorBackend: Driver 
 Disassociated [akka.tcp://sparkExecutor@ip-10-158-18-250.ec2.internal:57671] 
 - [akka.tcp://sparkDriver@ip-10-28-236-122.ec2.internal:47481] 
 disassociated! Shutting down.





Play Scala Spark Example

2015-01-09 Thread Eduardo Cusa
Hi guys, I'm running the following example:
https://github.com/knoldus/Play-Spark-Scala on the same machine as the
Spark master, and the Spark cluster was launched with the ec2 script.


I'm stuck with these errors; any idea how to fix them?

Regards
Eduardo


Calling the Play app prints the following exception:


[*error*] a.r.EndpointWriter - AssociationError
[akka.tcp://sparkDriver@ip-10-28-236-122.ec2.internal:47481] -
[akka.tcp://driverPropsFetcher@ip-10-158-18-250.ec2.internal:52575]:
Error [Shut down address:
akka.tcp://driverPropsFetcher@ip-10-158-18-250.ec2.internal:52575] [
akka.remote.ShutDownAssociation: Shut down address:
akka.tcp://driverPropsFetcher@ip-10-158-18-250.ec2.internal:52575
Caused by: akka.remote.transport.Transport$InvalidAssociationException:
The remote system terminated the association because it is shutting
down.




The master receives the Spark application and generates the following stderr log page:


15/01/09 13:31:23 INFO Remoting: Remoting started; listening on
addresses :[akka.tcp://sparkExecutor@ip-10-158-18-250.ec2.internal:37856]
15/01/09 13:31:23 INFO Remoting: Remoting now listens on addresses:
[akka.tcp://sparkExecutor@ip-10-158-18-250.ec2.internal:37856]
15/01/09 13:31:23 INFO util.Utils: Successfully started service
'sparkExecutor' on port 37856.
15/01/09 13:31:23 INFO util.AkkaUtils: Connecting to MapOutputTracker:
akka.tcp://sparkDriver@ip-10-28-236-122.ec2.internal:47481/user/MapOutputTracker
15/01/09 13:31:23 INFO util.AkkaUtils: Connecting to
BlockManagerMaster:
akka.tcp://sparkDriver@ip-10-28-236-122.ec2.internal:47481/user/BlockManagerMaster
15/01/09 13:31:23 INFO storage.DiskBlockManager: Created local
directory at /mnt/spark/spark-local-20150109133123-3805
15/01/09 13:31:23 INFO storage.DiskBlockManager: Created local
directory at /mnt2/spark/spark-local-20150109133123-b05e
15/01/09 13:31:23 INFO util.Utils: Successfully started service
'Connection manager for block manager' on port 36936.
15/01/09 13:31:23 INFO network.ConnectionManager: Bound socket to port
36936 with id =
ConnectionManagerId(ip-10-158-18-250.ec2.internal,36936)
15/01/09 13:31:23 INFO storage.MemoryStore: MemoryStore started with
capacity 265.4 MB
15/01/09 13:31:23 INFO storage.BlockManagerMaster: Trying to register
BlockManager
15/01/09 13:31:23 INFO storage.BlockManagerMaster: Registered BlockManager
15/01/09 13:31:23 INFO util.AkkaUtils: Connecting to
HeartbeatReceiver:
akka.tcp://sparkDriver@ip-10-28-236-122.ec2.internal:47481/user/HeartbeatReceiver
15/01/09 13:31:54 ERROR executor.CoarseGrainedExecutorBackend: Driver
Disassociated [akka.tcp://sparkExecutor@ip-10-158-18-250.ec2.internal:57671]
- [akka.tcp://sparkDriver@ip-10-28-236-122.ec2.internal:47481]
disassociated! Shutting down.


Re: EC2 VPC script

2014-12-29 Thread Eduardo Cusa
I'm running the master branch.

Finally I made it work by changing all occurrences of the *public_dns_name*
property to *private_ip_address* in the spark_ec2.py script.

My VPC instances always have a null value in the *public_dns_name* property.

Now my script only works for VPC instances.
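
The change amounts to something like the helper below (illustrative only, not the actual
spark_ec2.py code; boto instance objects expose both attributes):

def instance_address(instance):
    # VPC-only instances often report an empty public_dns_name,
    # so fall back to the private IP in that case.
    return instance.public_dns_name or instance.private_ip_address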

Regards
Eduardo

On Sat, Dec 20, 2014 at 7:53 PM, Nicholas Chammas 
nicholas.cham...@gmail.com wrote:

 What version of the script are you running? What did you see in the EC2
 web console when this happened?

 Sometimes instances just don't come up in a reasonable amount of time and
 you have to kill and restart the process.

 Does this always happen, or was it just once?

 Nick

 On Thu, Dec 18, 2014 at 9:42 AM, Eduardo Cusa 
 eduardo.c...@usmediaconsulting.com wrote:

 Hi guys.

 I ran the following command to launch a new cluster:

 ./spark-ec2 -k test -i test.pem -s 1  --vpc-id vpc-X --subnet-id
 subnet-X launch  vpc_spark

 The instances started OK, but the command never ends, with the following output:


 Setting up security groups...
 Searching for existing cluster vpc_spark...
 Spark AMI: ami-5bb18832
 Launching instances...
 Launched 1 slaves in us-east-1a, regid = r-e9d603c4
 Launched master in us-east-1a, regid = r-89d104a4
 Waiting for cluster to enter 'ssh-ready' state...


 Any ideas what happened?


 regards
 Eduardo






EC2 VPC script

2014-12-18 Thread Eduardo Cusa
Hi guys.

I ran the following command to launch a new cluster:

./spark-ec2 -k test -i test.pem -s 1  --vpc-id vpc-X --subnet-id
subnet-X launch  vpc_spark

The instances started OK, but the command never ends, with the following output:


Setting up security groups...
Searching for existing cluster vpc_spark...
Spark AMI: ami-5bb18832
Launching instances...
Launched 1 slaves in us-east-1a, regid = r-e9d603c4
Launched master in us-east-1a, regid = r-89d104a4
Waiting for cluster to enter 'ssh-ready' state...


Any ideas what happened?


regards
Eduardo



Java client connection

2014-11-12 Thread Eduardo Cusa
Hi guys, I'm starting to work with Spark from Java, and when I run the following code:


SparkConf conf = new SparkConf().setMaster("spark://10.0.2.20:7077")
                                .setAppName("SparkTest");
JavaSparkContext sc = new JavaSparkContext(conf);


I received the following error and the Java process exits:


14/11/11 17:58:12 WARN client.AppClient$ClientActor: Could not connect to
akka.tcp://sparkMaster@10.0.2.20:7077:
akka.remote.EndpointAssociationException: Association failed with
[akka.tcp://sparkMaster@10.0.2.20:7077]
14/11/11 17:58:18 WARN client.AppClient$ClientActor: Could not connect to
akka.tcp://sparkMaster@10.0.2.20:7077:
akka.remote.EndpointAssociationException: Association failed with
[akka.tcp://sparkMaster@10.0.2.20:7077]
14/11/11 17:58:20 WARN client.AppClient$ClientActor: Could not connect to
akka.tcp://sparkMaster@10.0.2.20:7077:
akka.remote.EndpointAssociationException: Association failed with
[akka.tcp://sparkMaster@10.0.2.20:7077]



My Spark master is running on 10.0.2.20.
From pyspark I can work properly.
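
For comparison, the pyspark side that does work is roughly the following (a sketch using
the same master URL):

from pyspark import SparkConf, SparkContext

conf = SparkConf().setMaster("spark://10.0.2.20:7077").setAppName("SparkTest")
sc = SparkContext(conf=conf)
print sc.parallelize(range(10)).count()   # simple smoke test against the cluster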


Regards
Eduardo