Re: Registering Metrics Source in Spark

2016-04-05 Thread Gideon
Hi, I don't have a specific solution to your problem, but I was having some problems writing my own metrics with Spark a few months back. I don't know if it helps, but you can try looking at this thread.
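
For anyone who finds this later, here is a minimal sketch of the direction that worked for me. Be warned that it leans on Spark internals: the Source trait is private[spark], so the class has to live under the org.apache.spark package tree, and the source name and gauge here ("MyApp", "recordsSeen") are made-up examples, not anything Spark defines:

    package org.apache.spark.metrics.source

    import java.util.concurrent.atomic.AtomicLong
    import com.codahale.metrics.{Gauge, MetricRegistry}
    import org.apache.spark.SparkEnv

    // A custom metrics Source exposing one gauge. Source is private[spark],
    // which is why this class sits inside the org.apache.spark package tree.
    class MySource extends Source {
      override val sourceName: String = "MyApp"
      override val metricRegistry: MetricRegistry = new MetricRegistry

      metricRegistry.register(MetricRegistry.name("recordsSeen"), new Gauge[Long] {
        override def getValue: Long = MySource.recordsSeen.get()
      })
    }

    object MySource {
      val recordsSeen = new AtomicLong(0)

      // Call once after the SparkContext is up; registers the source
      // with the metrics system of whichever JVM this runs in.
      def register(): Unit = SparkEnv.get.metricsSystem.registerSource(new MySource)
    }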

Re: Read from kafka after application is restarted

2016-02-23 Thread Gideon
Regarding the Spark Streaming receiver - can't you just use the Kafka direct stream with checkpoints? When you restart your application it will read from where it last stopped and continue from there. Regarding limiting the number of messages - you can do that by setting a per-partition rate limit.
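
A minimal sketch of both pieces, assuming the Spark 1.x direct Kafka API; the broker address, topic name, and checkpoint path are placeholders:

    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    val checkpointDir = "hdfs:///checkpoints/my-app"  // placeholder path

    def createContext(): StreamingContext = {
      val conf = new SparkConf()
        .setAppName("kafka-restart-example")
        // Cap how many messages each Kafka partition delivers per batch
        .set("spark.streaming.kafka.maxRatePerPartition", "1000")
      val ssc = new StreamingContext(conf, Seconds(10))
      ssc.checkpoint(checkpointDir)

      val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
      val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
        ssc, kafkaParams, Set("my-topic"))

      stream.map(_._2).print()
      ssc
    }

    // On restart this recovers the context (including Kafka offsets) from the
    // checkpoint instead of starting over, so the stream resumes where it stopped.
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()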

Converting CSV files to Avro

2016-01-17 Thread Gideon
Hi everyone, I'm writing a Scala program which uses Spark CSV to read CSV files from a directory. After reading the CSVs as DataFrames I need to convert them to Avro format, since I eventually need to convert that data to a GenericRecord.
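
In case it helps anyone, a sketch of one way to do this, assuming the spark-csv package and a hand-written Avro schema; the column names ("name", "age") and input path are made up for illustration:

    import org.apache.avro.Schema
    import org.apache.avro.generic.{GenericData, GenericRecord}
    import org.apache.spark.sql.SQLContext

    // Hypothetical two-column Avro schema matching the CSV header
    val schemaJson =
      """{"type":"record","name":"Person","fields":[
        |  {"name":"name","type":"string"},
        |  {"name":"age","type":"int"}]}""".stripMargin

    val sqlContext = new SQLContext(sc)  // assumes an existing SparkContext sc
    val df = sqlContext.read
      .format("com.databricks.spark.csv")
      .option("header", "true")
      .option("inferSchema", "true")
      .load("/data/csv/")  // placeholder directory

    // Convert each Row to an Avro GenericRecord. The Schema object may not be
    // serializable depending on the Avro version, so parse it per partition.
    val records = df.rdd.mapPartitions { rows =>
      val schema = new Schema.Parser().parse(schemaJson)
      rows.map { row =>
        val rec: GenericRecord = new GenericData.Record(schema)
        rec.put("name", row.getAs[String]("name"))
        rec.put("age", row.getAs[Int]("age"))
        rec
      }
    }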

Re: Streaming json records from kafka ... how can I process ... help please :)

2015-12-23 Thread Gideon
What you wrote is inaccurate. When you create a direct Kafka stream, what actually happens is that you create a DirectKafkaInputDStream. This DirectKafkaInputDStream extends DStream. Two of the functions a DStream has are map and print. When you map on your DirectKafkaInputDStream, what you're actually doing is building a new DStream that wraps it; nothing runs until an output operation like print fires each batch.
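
To make the distinction concrete, a tiny sketch (it assumes an existing StreamingContext ssc and uses a placeholder broker and topic):

    import kafka.serializer.StringDecoder
    import org.apache.spark.streaming.kafka.KafkaUtils

    // createDirectStream returns a DirectKafkaInputDStream under the hood
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, Map("metadata.broker.list" -> "broker1:9092"), Set("events"))

    // map is lazy: this just wraps the stream in a new DStream describing the step
    val values = stream.map { case (_, msg) => msg.toUpperCase }

    // print is an output operation; it forces the lineage to execute every batch
    values.print()
    ssc.start()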

Re: Spark Streaming Checkpoint help failed application

2015-11-11 Thread Gideon
Hi, I'm no expert, but: Short answer: yes, after restarting, your application will reread the failed messages. Longer answer: it seems like you're mixing several things together. Let me try and explain: - WAL is used to prevent your application from losing data, by making the executor first write the received data to a write-ahead log on fault-tolerant storage before it is acknowledged.
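
For reference, the relevant configuration for a receiver-based stream; the app name, host, port, and checkpoint path are placeholders:

    import org.apache.spark.SparkConf
    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf()
      .setAppName("wal-example")
      // Make receivers write incoming blocks to the write-ahead log first
      .set("spark.streaming.receiver.writeAheadLog.enable", "true")

    val ssc = new StreamingContext(conf, Seconds(5))
    // The WAL lives under the checkpoint directory, so this must be
    // fault-tolerant storage such as HDFS
    ssc.checkpoint("hdfs:///checkpoints/wal-example")

    // With the WAL enabled, in-memory replication is redundant; a serialized
    // storage level saves memory
    val lines = ssc.socketTextStream("host", 9999, StorageLevel.MEMORY_AND_DISK_SER)
    lines.count().print()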

Re: Spark Streaming : minimum cores for a Receiver

2015-11-07 Thread Gideon
I'm not a Spark expert, but: What Spark does is run receivers in the executors. These receivers are long-running tasks; each receiver occupies one core in your executor. If an executor has more cores than receivers, it can also process (at least some of) the data it is receiving. So, make sure you allocate enough cores to cover all receivers plus at least one for processing.
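
A small sketch of the arithmetic, with placeholder hosts and ports: two socket receivers pin two cores, so the application needs more than two cores in total or no processing ever happens:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // local[4]: 2 cores get pinned by the receivers below, 2 remain for
    // processing. local[2] would leave zero cores free and the app would stall.
    val conf = new SparkConf().setMaster("local[4]").setAppName("receiver-cores")
    val ssc = new StreamingContext(conf, Seconds(5))

    // Each receiver runs as a long-lived task occupying one core
    val streams = (1 to 2).map(i => ssc.socketTextStream("host", 9000 + i))
    val unified = ssc.union(streams)
    unified.count().print()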

Re: How do I get the executor ID from running Java code

2015-11-02 Thread Gideon
Looking at the post date I can only assume you've got your answer. Since I just encountered your post while trying to do the same thing, I decided it's worth answering for other people. In order to get the executor ID you can use: SparkEnv.get().executorId() I hope this helps someone.
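
For completeness, a sketch of how this looks from Scala when called inside a task, so it runs on the executor rather than the driver (it assumes an existing SparkContext sc):

    import org.apache.spark.SparkEnv

    // On the driver SparkEnv.get.executorId returns "driver"; inside a task
    // it returns the ID of the executor running that partition
    val ids = sc.parallelize(1 to 100, 4).mapPartitions { iter =>
      Iterator(SparkEnv.get.executorId -> iter.size)
    }.collect()
    ids.foreach { case (execId, count) => println(s"$execId processed $count records") }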