Re: Pause Spark Streaming reading or sampling streaming data
Hi,

Yes, it's great that you wrote it yourself: it means you have more control. I have the feeling that the most efficient point to discard as much data as possible is your Spark input source. You could even modify your subscription protocol so that you don't receive the other 50 seconds of data at all. After you deliver data to the DStream you can filter it as much as you want, but you will still be subject to garbage collection and/or potential shuffles and HDD checkpoints.

On Thu, Aug 6, 2015 at 1:31 AM, Heath Guo heath...@fb.com wrote:

Hi Dimitris,

Thanks for your reply. Just wondering: are you asking about my streaming input source? I implemented a custom receiver and have been using that. Thanks.

From: Dimitris Kouzis - Loukas look...@gmail.com
Date: Wednesday, August 5, 2015 at 5:27 PM
To: Heath Guo heath...@fb.com
Cc: user@spark.apache.org
Subject: Re: Pause Spark Streaming reading or sampling streaming data

What driver do you use? Sounds like something you should do before the driver...

On Thu, Aug 6, 2015 at 12:50 AM, Heath Guo heath...@fb.com wrote:

Hi, I have a question about sampling Spark Streaming data, or getting part of the data. Every minute, I only want the data read in during the first 10 seconds, and I want to discard all data in the next 50 seconds. Is there any way to pause reading and discard data in that period? I'm doing this to sample from a huge stream of data, which saves processing time in the real-time program. Thanks!
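To illustrate the point about discarding at the input source, here is a minimal Java sketch of a gate that sits in front of whatever hands records to Spark. `GatedSink` and its parameter names are hypothetical and not part of Spark. The assumption is that, inside a custom receiver, the downstream consumer would wrap the receiver's `store(...)` call and that the timestamp is the wall-clock arrival time of the record.

```java
import java.util.function.Consumer;

/** Hypothetical helper (not a Spark class): forwards records only during the sampling window. */
final class GatedSink {
    private final long periodMillis;            // length of one cycle, e.g. 60_000
    private final long activeMillis;            // leading part of each cycle that is kept, e.g. 10_000
    private final Consumer<String> downstream;  // assumed to wrap the receiver's store(...)
    private long dropped = 0;

    GatedSink(long periodMillis, long activeMillis, Consumer<String> downstream) {
        if (periodMillis <= 0 || activeMillis < 0 || activeMillis > periodMillis) {
            throw new IllegalArgumentException(
                "need periodMillis > 0 and 0 <= activeMillis <= periodMillis");
        }
        this.periodMillis = periodMillis;
        this.activeMillis = activeMillis;
        this.downstream = downstream;
    }

    /** Passes the record on if it arrived in the active part of its cycle, otherwise drops it. */
    void offer(long arrivalMillis, String record) {
        if (Math.floorMod(arrivalMillis, periodMillis) < activeMillis) {
            downstream.accept(record);
        } else {
            dropped++; // never reaches the DStream, so no downstream GC, shuffle, or checkpoint cost
        }
    }

    /** Number of records discarded so far. */
    long droppedCount() {
        return dropped;
    }
}
```

With a period of 60,000 ms and an active part of 10,000 ms, cycles are aligned to the epoch, so the kept window is the first 10 seconds of each wall-clock minute. Note that this gate still receives every record over the network; it only keeps the discarded 50 seconds out of Spark.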
Re: Pause Spark Streaming reading or sampling streaming data
Re-reading your description, I guess you could make your input source connect for 10 seconds, pause for 50, and then reconnect.

On Thu, Aug 6, 2015 at 10:32 AM, Dimitris Kouzis - Loukas look...@gmail.com wrote:

Hi,

Yes, it's great that you wrote it yourself: it means you have more control. I have the feeling that the most efficient point to discard as much data as possible is your Spark input source. You could even modify your subscription protocol so that you don't receive the other 50 seconds of data at all. After you deliver data to the DStream you can filter it as much as you want, but you will still be subject to garbage collection and/or potential shuffles and HDD checkpoints.

On Thu, Aug 6, 2015 at 1:31 AM, Heath Guo heath...@fb.com wrote:

Hi Dimitris,

Thanks for your reply. Just wondering: are you asking about my streaming input source? I implemented a custom receiver and have been using that. Thanks.

From: Dimitris Kouzis - Loukas look...@gmail.com
Date: Wednesday, August 5, 2015 at 5:27 PM
To: Heath Guo heath...@fb.com
Cc: user@spark.apache.org
Subject: Re: Pause Spark Streaming reading or sampling streaming data

What driver do you use? Sounds like something you should do before the driver...

On Thu, Aug 6, 2015 at 12:50 AM, Heath Guo heath...@fb.com wrote:

Hi, I have a question about sampling Spark Streaming data, or getting part of the data. Every minute, I only want the data read in during the first 10 seconds, and I want to discard all data in the next 50 seconds. Is there any way to pause reading and discard data in that period? I'm doing this to sample from a huge stream of data, which saves processing time in the real-time program. Thanks!
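The connect/pause/reconnect idea can be sketched as a small schedule helper in Java. `SamplingWindow` is a hypothetical class, not a Spark API, and the values 60,000 ms and 10,000 ms are just the numbers from this thread. The assumption is that the read loop started from a custom receiver's `onStart()` would consult it: while `isStopped()` is false, read and `store(...)` records as long as the window is open, then close the connection and sleep for `millisUntilOpen(...)` before reconnecting.

```java
/** Hypothetical schedule helper (not a Spark class) for a connect-then-pause receiver loop. */
final class SamplingWindow {
    private final long periodMillis;  // full cycle length, e.g. 60_000
    private final long activeMillis;  // connected part at the start of each cycle, e.g. 10_000

    SamplingWindow(long periodMillis, long activeMillis) {
        if (periodMillis <= 0 || activeMillis < 0 || activeMillis > periodMillis) {
            throw new IllegalArgumentException(
                "need periodMillis > 0 and 0 <= activeMillis <= periodMillis");
        }
        this.periodMillis = periodMillis;
        this.activeMillis = activeMillis;
    }

    /** True while the given wall-clock time falls in the connected part of its cycle. */
    boolean isOpen(long epochMillis) {
        return Math.floorMod(epochMillis, periodMillis) < activeMillis;
    }

    /** Milliseconds to stay disconnected before the next cycle starts; 0 if the window is open now. */
    long millisUntilOpen(long epochMillis) {
        if (isOpen(epochMillis)) {
            return 0L;
        }
        return periodMillis - Math.floorMod(epochMillis, periodMillis);
    }
}
```

A receiver thread would typically pass `System.currentTimeMillis()` to both methods. Unlike filtering after the fact, this approach avoids receiving the other 50 seconds of data at all, provided the source tolerates repeated reconnects.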
Pause Spark Streaming reading or sampling streaming data
Hi, I have a question about sampling Spark Streaming data, or getting part of the data. Every minute, I only want the data read in during the first 10 seconds, and I want to discard all data in the next 50 seconds. Is there any way to pause reading and discard data in that period? I'm doing this to sample from a huge stream of data, which saves processing time in the real-time program. Thanks!
Re: Pause Spark Streaming reading or sampling streaming data
Hi Dimitris,

Thanks for your reply. Just wondering: are you asking about my streaming input source? I implemented a custom receiver and have been using that. Thanks.

From: Dimitris Kouzis - Loukas look...@gmail.com
Date: Wednesday, August 5, 2015 at 5:27 PM
To: Heath Guo heath...@fb.com
Cc: user@spark.apache.org
Subject: Re: Pause Spark Streaming reading or sampling streaming data

What driver do you use? Sounds like something you should do before the driver...

On Thu, Aug 6, 2015 at 12:50 AM, Heath Guo heath...@fb.com wrote:

Hi, I have a question about sampling Spark Streaming data, or getting part of the data. Every minute, I only want the data read in during the first 10 seconds, and I want to discard all data in the next 50 seconds. Is there any way to pause reading and discard data in that period? I'm doing this to sample from a huge stream of data, which saves processing time in the real-time program. Thanks!