Re: Using data in RDD to specify HDFS directory to write to

2014-12-05 Thread Nathan Murthy
I'm experiencing the same problem when I try to run my app in a standalone Spark cluster. My use case, however, is closer to the problem documented in this thread: http://apache-spark-user-list.1001560.n3.nabble.com/Please-help-running-a-standalone-app-on-a-Spark-cluster-td1596.html. The

Re: Using data in RDD to specify HDFS directory to write to

2014-11-17 Thread jschindler
Yes, thank you for the suggestion. The error I found, shown below, was in the worker logs. AssociationError [akka.tcp://sparkwor...@cloudera01.local.company.com:7078] - [akka.tcp://sparkexecu...@cloudera01.local.company.com:33329]: Error [Association failed with
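[Editor's note: an Akka association failure in standalone mode is often just a symptom of the executor JVM dying, but it can also come from a hostname or port that one side cannot reach. A minimal, hedged sketch of pinning the driver's address explicitly; the hostname and port here are placeholders, not values from this thread:

    val conf = new org.apache.spark.SparkConf()
      .set("spark.driver.host", "driver-host.company.com") // placeholder; must resolve from every worker
      .set("spark.driver.port", "51000")                   // placeholder; a fixed port is easier to firewall
]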

Re: Using data in RDD to specify HDFS directory to write to

2014-11-16 Thread Akhil Das
Can you check in the worker logs what exactly is happening? Thanks. Best Regards

Re: Using data in RDD to specify HDFS directory to write to

2014-11-15 Thread jschindler
UPDATE: I have removed and added things systematically to the job and have figured out that constructing the SparkContext object is what is causing it to fail. The last run contained the code below. I apparently keep losing executors and I'm not sure why. Some of the
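[Editor's note: one common culprit here is building a second SparkContext inside the streaming job; in Spark 1.x, multiple active contexts in one JVM are unsupported and can destabilize the app. A minimal sketch of reusing the context the StreamingContext already owns; the names and batch interval are illustrative:

    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // Build one StreamingContext up front and reuse the SparkContext it
    // already owns, rather than constructing another one inside the job
    // (e.g. within foreachRDD).
    val ssc = new StreamingContext(conf, Seconds(10))
    val sc = ssc.sparkContext
]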

Re: Using data in RDD to specify HDFS directory to write to

2014-11-14 Thread jschindler
I reworked my app using your idea of throwing the data into a map. It looks like it should work, but I'm getting some strange errors and my job gets terminated. I get a WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered
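[Editor's note: that warning usually means the app is requesting more cores or memory than the standalone cluster has free, or that no workers have registered with the master. A hedged sketch of capping the app's resource requests via SparkConf; the master URL, app name, and values are placeholders:

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .setMaster("spark://master-host:7077")  // placeholder standalone master URL
      .setAppName("KafkaToHdfsApp")           // placeholder app name
      .set("spark.cores.max", "2")            // keep at or below the cluster's free cores
      .set("spark.executor.memory", "512m")   // keep within each worker's advertised memory
]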

Re: Using data in RDD to specify HDFS directory to write to

2014-11-13 Thread Akhil Das
Why not something like:

    lines.foreachRDD(rdd => {
      // Convert rdd (json) to map
      val mapper = new ObjectMapper() with ScalaObjectMapper
      mapper.registerModule(DefaultScalaModule)
      val myMap = mapper.readValue[Map[String, String]](x)
      val event =
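[Editor's note: a completed version of that parsing step might look like the following. This is a minimal sketch, not the poster's full code: the "event" field name comes from the original question, the x binding is assumed to be one JSON string from the RDD, and the ScalaObjectMapper import path is the one used by 2014-era jackson-module-scala:

    import com.fasterxml.jackson.databind.ObjectMapper
    import com.fasterxml.jackson.module.scala.DefaultScalaModule
    import com.fasterxml.jackson.module.scala.experimental.ScalaObjectMapper

    // Parse one JSON record into a Map and key it by its "event" field.
    def parseEvent(json: String): (String, String) = {
      val mapper = new ObjectMapper() with ScalaObjectMapper
      mapper.registerModule(DefaultScalaModule)
      val fields = mapper.readValue[Map[String, String]](json)
      (fields("event"), json)
    }
]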

Using data in RDD to specify HDFS directory to write to

2014-11-12 Thread jschindler
I am trying to figure out how to solve a problem. I would like to stream events from Kafka to my Spark Streaming app and write the contents of each RDD out to an HDFS directory. Each event that comes into the app via Kafka will be JSON and have an event field with the name of the
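[Editor's note: a hedged sketch of one way to do this, assuming a DStream[String] called lines, a JSON-parsing helper like the parseEvent sketch earlier in this thread, and a placeholder namenode URL. Collecting the distinct event names on the driver each batch is reasonable only when the number of event types is small:

    // In Spark 1.x, pair-RDD operations like keys/values/filter-by-key
    // need: import org.apache.spark.SparkContext._
    lines.foreachRDD { rdd =>
      // Key every JSON record by its event name: (event, json).
      val keyed = rdd.map(parseEvent).cache()
      // For each distinct event type in this batch, filter out its
      // records and write them under an event-specific HDFS directory.
      keyed.keys.distinct().collect().foreach { event =>
        keyed.filter { case (e, _) => e == event }
             .values
             .saveAsTextFile(s"hdfs://namenode:8020/events/$event/${System.currentTimeMillis}")
      }
      keyed.unpersist()
    }

The timestamp in the path keeps each batch's output in its own subdirectory, since saveAsTextFile refuses to overwrite an existing directory.]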