[Spark Streaming] java.lang.OutOfMemoryError: GC overhead limit exceeded

2014-09-08 Thread Yan Fang
Hi guys, My Spark Streaming application has this java.lang.OutOfMemoryError: GC overhead limit exceeded error in the Spark Streaming driver program. I have done the following to debug it: 1. increased the driver memory from 1GB to 2GB; the error came after 22 hrs. When the memory was 1GB, it

How do you debug with the logs?

2014-09-03 Thread Yan Fang
Hi guys, I am curious how you deal with the logs. I find it difficult to debug with the logs: we run Spark Streaming in our YARN cluster using client mode, so we have two logs: the YARN log and the local log (for the client). Whenever I have a problem, the log is too big to read with gedit and grep. (e.g. after

[Spark Streaming] How can we use consecutive data points as the features?

2014-08-15 Thread Yan Fang
Hi guys, We have a use case where we need to use consecutive data points to predict the status (yes, like using time series data to predict machine failure). Is there a straightforward way to do this in Spark Streaming? If all consecutive data points are in one batch, it's not complicated
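
A common way to assemble consecutive points is a sliding window over the DStream. A minimal sketch, assuming `readings` is a DStream[Double] of sensor values; the names and durations are illustrative:

    import org.apache.spark.streaming.Seconds

    // Group the last 10 seconds of data, recomputed every 2 seconds; both
    // durations must be multiples of the batch interval. Each windowed RDD
    // then holds several consecutive batches of points.
    val windowed = readings.window(Seconds(10), Seconds(2))

    windowed.foreachRDD { rdd =>
      val points = rdd.collect()   // the consecutive data points in this window
      // build a feature vector from `points` and feed it to the model here
    }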

Re: Is there a way to get previous/other keys' state in Spark Streaming?

2014-07-18 Thread Yan Fang
But this will be improved a lot when we have IndexedRDD https://github.com/apache/spark/pull/1297 in Spark, which allows faster single-value updates to a key-value RDD (within each partition, without processing the entire partition). Soon. TD On Thu, Jul 17, 2014 at 5:57 PM, Yan Fang yanfang

Is there a way to get previous/other keys' state in Spark Streaming?

2014-07-17 Thread Yan Fang
Hi guys, I am sure you have a similar use case and want to know how you deal with it. In our application, we want to check the previous state of some keys and compare it with their current state. AFAIK, Spark Streaming does not have key-value access. So currently what I am doing is storing the previous
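
For keeping per-key state across batches, updateStateByKey is the standard tool. A minimal sketch, assuming `pairs` is a DStream[(String, Int)] and the StreamingContext has a checkpoint directory set (required for stateful operations):

    import org.apache.spark.streaming.dstream.DStream

    // For each key, combine this batch's new values with the key's state
    // from the previous batch; compare `prevState` with the new values here.
    val updateFunc = (newValues: Seq[Int], prevState: Option[Int]) =>
      Some(prevState.getOrElse(0) + newValues.sum)

    val stateStream: DStream[(String, Int)] = pairs.updateStateByKey(updateFunc)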

Re: Is there a way to get previous/other keys' state in Spark Streaming?

2014-07-17 Thread Yan Fang
Hi TD, Thank you for the quick reply and for backing my approach. :) 1) The example is this: 1. In the first 2-second interval, after updateStateByKey, I get a few keys and their states, say, (a - 1, b - 2, c - 3) 2. In the following 2-second interval, I only receive c and d and their values. But

unserializable object in Spark Streaming context

2014-07-17 Thread Yan Fang
Hi guys, I need some help with this problem. In our use case, we need to continuously insert values into the database. So our approach is to create the JDBC object in the main method and then do the insert operation in the DStream foreachRDD operation. Is this approach reasonable? Then the

Re: unserializable object in Spark Streaming context

2014-07-17 Thread Yan Fang
you're instantiating the JDBC connection in the driver, and using it inside a closure that would be run in a remote executor. That means that the connection object would need to be serializable. If that sounds like what you're doing, it won't work. On Thu, Jul 17, 2014 at 1:37 PM, Yan Fang
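
The usual fix is to create the connection on the executor side, inside foreachRDD/foreachPartition, rather than in the driver. A minimal sketch; the JDBC URL, table, and record handling are illustrative:

    import java.sql.DriverManager

    dstream.foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        // The connection is created on the executor, so nothing JDBC-related
        // has to be serialized. One connection per partition amortizes the
        // setup cost over many records.
        val conn = DriverManager.getConnection("jdbc:mysql://host/db", "user", "pass")
        val stmt = conn.prepareStatement("INSERT INTO events (value) VALUES (?)")
        records.foreach { r =>
          stmt.setString(1, r.toString)
          stmt.executeUpdate()
        }
        stmt.close()
        conn.close()
      }
    }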

Re: unserializable object in Spark Streaming context

2014-07-17 Thread Yan Fang
Fang, Yan yanfang...@gmail.com +1 (206) 849-4108 On Thu, Jul 17, 2014 at 2:53 PM, Sean Owen so...@cloudera.com wrote: On Thu, Jul 17, 2014 at 10:39 PM, Yan Fang yanfang...@gmail.com wrote: Thank you for the help. If I use TD's approach, it works and there is no exception. The only drawback

Re: Is there a way to get previous/other keys' state in Spark Streaming?

2014-07-17 Thread Yan Fang
RDD by doing stateStream.foreachRDD(...). There you can do whatever operation on all the key-state pairs. TD On Thu, Jul 17, 2014 at 11:58 AM, Yan Fang yanfang...@gmail.com wrote: Hi TD, Thank you for the quick reply and for backing my approach. :) 1) The example is this: 1
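
A minimal sketch of TD's suggestion, assuming `stateStream` comes from updateStateByKey as above. Each batch exposes the full key-state RDD, so states of different keys can be compared; collect() is only safe if the total state is small:

    stateStream.foreachRDD { rdd =>
      val allStates = rdd.collect()   // all (key, state) pairs for this batch
      allStates.foreach { case (key, state) =>
        println(s"$key -> $state")    // compare against other keys' states as needed
      }
    }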

Re: How are the executors used in Spark Streaming in terms of receiver and driver program?

2014-07-12 Thread Yan Fang
Yan Fang yanfang...@gmail.com wrote: Hi Tathagata, Thank you. Is a task slot equivalent to a core? Or can one core actually run multiple tasks at the same time? Best, Fang, Yan yanfang...@gmail.com +1 (206) 849-4108 On Fri, Jul 11, 2014 at 1:45 PM, Tathagata Das

Re: How are the executors used in Spark Streaming in terms of receiver and driver program?

2014-07-11 Thread Yan Fang
I think the same executor can be used for the receiver and the processing tasks as well (I am not very sure about this part). On Fri, Jul 11, 2014 at 12:29 AM, Yan Fang yanfang...@gmail.com wrote: Hi all, I am working to improve the parallelism of the Spark Streaming application. But I have trouble

Re: How are the executors used in Spark Streaming in terms of receiver and driver program?

2014-07-11 Thread Yan Fang
on, thanks for chipping in! On Fri, Jul 11, 2014 at 11:16 AM, Yan Fang yanfang...@gmail.com wrote: Hi Praveen, Thank you for the answer. That's interesting, because if I only bring up one executor for Spark Streaming, it seems only the receiver is working; no other tasks are happening

How are the executors used in Spark Streaming in terms of receiver and driver program?

2014-07-10 Thread Yan Fang
Hi all, I am working to improve the parallelism of the Spark Streaming application. But I have trouble understanding how the executors are used and how the application is distributed. 1. In YARN, is one executor equal to one container? 2. I saw the statement that a streaming receiver runs on one
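
Regarding receiver parallelism: each receiver runs as a long-lived task that permanently occupies one task slot (core), so with a single core the receiver can starve the processing tasks. One common pattern is to run several receivers and union them. A minimal sketch; the source, host, and port are illustrative, and total cores must exceed the receiver count:

    val numReceivers = 3
    // Each call starts one receiver; union merges their streams into one DStream.
    val streams = (1 to numReceivers).map(_ => ssc.socketTextStream("host", 9999))
    val unified = ssc.union(streams)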

Spark Streaming - two questions about the streamingcontext

2014-07-09 Thread Yan Fang
I am using Spark Streaming and have the following two questions: 1. If more than one output operation is put in the same StreamingContext (basically, I mean, I put all the output operations in the same class), are they processed one by one in the order they appear in the class? Or are they

Re: Spark Streaming - two questions about the streamingcontext

2014-07-09 Thread Yan Fang
at a time. This *can* be parallelized using an undocumented config parameter, spark.streaming.concurrentJobs, which is by default set to 1. 2. Yes, the output operations (and the Spark jobs that are involved with them) get queued up. TD On Wed, Jul 9, 2014 at 11:22 AM, Yan Fang yanfang
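
A minimal sketch of setting that parameter; it is undocumented, so use it with care, and the value 2 is illustrative (it allows up to two output operations' jobs to run concurrently):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .setAppName("StreamingApp")
      .set("spark.streaming.concurrentJobs", "2")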

Spark Streaming - What does Spark Streaming checkpoint?

2014-07-09 Thread Yan Fang
Hi guys, I am a little confused by the checkpointing in Spark Streaming. It checkpoints the intermediate data for the stateful operations for sure. Does it also checkpoint the information of the StreamingContext? Because it seems we can recreate the SC from the checkpoint after a driver node failure
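
Recreating the StreamingContext from a checkpoint is done with StreamingContext.getOrCreate. A minimal sketch, assuming `conf` is a SparkConf; the checkpoint path and batch interval are illustrative:

    import org.apache.spark.streaming.{Seconds, StreamingContext}

    def createContext(): StreamingContext = {
      val ssc = new StreamingContext(conf, Seconds(2))
      ssc.checkpoint("hdfs:///checkpoints/app")   // illustrative path
      // define the DStream operations here
      ssc
    }

    // On a restart after driver failure, the context (including its DStream
    // graph) is rebuilt from the checkpoint instead of calling createContext().
    val ssc = StreamingContext.getOrCreate("hdfs:///checkpoints/app", createContext _)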

Issues in opening UI when running Spark Streaming in YARN

2014-07-07 Thread Yan Fang
Hi guys, Not sure if you have similar issues. I did not find relevant tickets in JIRA. When I deploy Spark Streaming to YARN, I have the following two issues: 1. The UI port is random. It is not the default 4040. I have to look at the container's log to check the UI port. Is this supposed to be this

Re: Issues in opening UI when running Spark Streaming in YARN

2014-07-07 Thread Yan Fang
2014-07-07 11:20 GMT-07:00 Yan Fang yanfang...@gmail.com: Hi guys, Not sure if you have similar issues. I did not find relevant tickets in JIRA. When I deploy Spark Streaming to YARN, I have the following two issues: 1. The UI port is random. It is not the default 4040. I have to look

Re: Issues in opening UI when running Spark Streaming in YARN

2014-07-07 Thread Yan Fang
not the executor containers. This is because the SparkUI belongs to the SparkContext, which only exists on the driver. Andrew 2014-07-07 11:20 GMT-07:00 Yan Fang yanfang...@gmail.com: Hi guys, Not sure if you have similar issues. I did not find relevant tickets in JIRA. When I deploy Spark Streaming
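
To get a known port instead of a random one, spark.ui.port can be set. A sketch; whether that port is actually free on the node hosting the driver container is up to the deployment:

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .setAppName("StreamingApp")
      // Pin the driver UI port; in YARN the UI lives in the driver container only.
      .set("spark.ui.port", "4040")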

Is the order of messages guaranteed in a DStream?

2014-07-07 Thread Yan Fang
I know the order in which DStreams are processed is guaranteed. I am wondering if the order of messages within one DStream is guaranteed. My gut feeling is yes, because RDDs are immutable. Some simple tests seem to confirm this, but I want to hear from an authority to persuade myself. Thank you. Best, Fang, Yan