spark 1.6 : RDD Partitions not distributed evenly to executors

2016-05-09 Thread prateek arora
Hi, my Spark Streaming application receives data from one Kafka topic (one partition) and the RDD has 30 partitions, but the scheduler schedules the tasks between executors running on the same host (where the Kafka topic partition was created) with NODE_LOCAL locality level. Below are the logs: 16/05/06

Re: Accessing Cassandra data from Spark Shell

2016-05-09 Thread Ted Yu
bq. Can you use HiveContext for Cassandra data? Most likely the above cannot be done. On Mon, May 9, 2016 at 9:08 PM, Cassa L wrote: > Hi, > Has anyone tried accessing Cassandra data using SparkShell? How do you do > it? Can you use HiveContext for Cassandra data? I'm using

Accessing Cassandra data from Spark Shell

2016-05-09 Thread Cassa L
Hi, Has anyone tried accessing Cassandra data using SparkShell? How do you do it? Can you use HiveContext for Cassandra data? I'm using community version of Cassandra-3.0 Thanks, LCassa
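One common way to do this (not from this thread, and assuming the DataStax spark-cassandra-connector package is added when launching spark-shell; the connector version, keyspace and table names below are placeholders) is to read Cassandra through the DataFrame API rather than HiveContext:

    // spark-shell --packages com.datastax.spark:spark-cassandra-connector_2.10:1.6.0 \
    //             --conf spark.cassandra.connection.host=127.0.0.1
    import org.apache.spark.sql.SQLContext

    val sqlContext = new SQLContext(sc)
    // "my_keyspace" and "my_table" are placeholder names
    val df = sqlContext.read
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "my_keyspace", "table" -> "my_table"))
      .load()
    df.show()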

Re: Spark build failure with com.oracle:ojdbc6:jar:11.2.0.1.0

2016-05-09 Thread Luciano Resende
The manual download is no longer required on the latest trunk code. On Monday, May 9, 2016, Andrew Lee wrote: > In fact, it does require ojdbc from Oracle which also requires a username > and password. This was added as part of the testing scope for > Oracle's docker. > > > I

best fit - Dataframe and spark sql use cases

2016-05-09 Thread Divya Gehlot
Hi, I would like to know the use cases where DataFrames are the best fit and the use cases where Spark SQL is the best fit, based on one's experience. Thanks, Divya

Re:Re: Re: Re: How big the spark stream window could be ?

2016-05-09 Thread 李明伟
Hi Mich, I added some more info (the spark-env.sh settings and top command output) in that thread. Can you help to check, please? Regards, Mingwei At 2016-05-09 23:45:19, "Mich Talebzadeh" wrote: I had a look at the thread. This is what you have which I gather

Re: Why I have memory leaking for such simple spark stream code?

2016-05-09 Thread kramer2...@126.com
To add more info: this is the setting in my spark-env.sh: [root@ES01 conf]# grep -v "#" spark-env.sh SPARK_EXECUTOR_CORES=1 SPARK_EXECUTOR_INSTANCES=1 SPARK_DAEMON_MEMORY=4G So I did not set the executor to use more memory here. Also, here is the top output: KiB Mem : 16268156 total, 161116

Re: DataFrame cannot find temporary table

2016-05-09 Thread Takeshi Yamamuro
Hi, what's `convertRDDToDF`? It seems you use a different `SQLContext` for table registration and querying. //maropu On Tue, May 10, 2016 at 2:46 AM, Mich Talebzadeh wrote: > Have you created sqlContext based on HiveContext? > > > val sc = new SparkContext(conf) >

Accumulator question

2016-05-09 Thread Abi
I am splitting an integer array into 2 partitions and using an accumulator to sum the array. The problems are: 1. I am not seeing the execution time become half that of a linear sum. 2. The second node (judging from timestamps) takes 3 times as long as the first node. This gives the impression it is
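For reference, a minimal sketch of the pattern described above (splitting into 2 partitions and summing with an accumulator), using the Spark 1.x accumulator API; the timing behaviour will still depend on how the two tasks are scheduled:

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("acc-sum"))
    val data = 1 to 1000000

    val acc = sc.accumulator(0L, "sum")        // Spark 1.x accumulator
    val rdd = sc.parallelize(data, 2)          // exactly 2 partitions
    rdd.foreach(x => acc += x)                 // summed on the executors as a side effect
    println(s"accumulator sum = ${acc.value}") // value is only reliable on the driver

    // For comparison, a "linear" sum without an accumulator:
    println(s"reduce sum = ${rdd.map(_.toLong).reduce(_ + _)}")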

Re: No of Spark context per jvm

2016-05-09 Thread Jacek Laskowski
Hi, I'd say "one per classloader". Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark http://bit.ly/mastering-apache-spark Follow me at https://twitter.com/jaceklaskowski On Mon, May 9, 2016 at 10:16 AM, praveen S wrote: > Hi, > >

Re: ERROR SparkContext: Error initializing SparkContext.

2016-05-09 Thread Jacek Laskowski
Hi Andrew, Could you use the following log configuration in conf/log4j.properties and start over? Share the logs when you finish. log4j.logger.org.apache.spark.deploy.yarn.Client=DEBUG log4j.logger.org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend=DEBUG You could also have a look at

Re: Spark build failure with com.oracle:ojdbc6:jar:11.2.0.1.0

2016-05-09 Thread Andrew Lee
In fact, it does require ojdbc from Oracle which also requires a username and password. This was added as part of the testing scope for Oracle's docker. I notice this PR and commit in branch-2.0 according to https://issues.apache.org/jira/browse/SPARK-12941. In the comment, I'm not sure what

Re: Spark-csv- partitionBy

2016-05-09 Thread Gourav Sengupta
Hi, it's supported; try to use coalesce(1) (the spelling is wrong) and after that do the partitions. Regards, Gourav On Mon, May 9, 2016 at 7:12 PM, Mail.com wrote: > Hi, > > I have to write tab delimited file and need to have one directory for each > unique value of a
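Since partitionBy is rejected by the csv data source in 1.x, one workaround sketch (not from this thread; the column name "country", the toy data and the output paths are placeholders, and it assumes the com.databricks:spark-csv package is on the classpath) is to write one directory per distinct value explicitly, combined with the coalesce(1) suggested above:

    import sqlContext.implicits._
    // toy DataFrame; in practice this is the data being exported
    val df = Seq(("us", 1), ("de", 2), ("us", 3)).toDF("country", "value")

    val keys = df.select("country").distinct().collect().map(_.getString(0))

    keys.foreach { k =>
      df.filter(df("country") === k)
        .coalesce(1)                          // one output file per directory
        .write
        .format("com.databricks.spark.csv")
        .option("delimiter", "\t")            // tab-delimited output
        .save(s"/output/country=$k")
    }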

Re: spark 2.0 issue with yarn?

2016-05-09 Thread Marcelo Vanzin
On Mon, May 9, 2016 at 3:34 PM, Matt Cheah wrote: > @Marcelo: Interesting - why would this manifest on the YARN-client side > though (as Spark is the client to YARN in this case)? Spark as a client > shouldn’t care about what auxiliary services are on the YARN cluster. The

Re: spark 2.0 issue with yarn?

2016-05-09 Thread Matt Cheah
@Marcelo: Interesting - why would this manifest on the YARN-client side though (as Spark is the client to YARN in this case)? Spark as a client shouldn’t care about what auxiliary services are on the YARN cluster. @Jesse: The change I wrote excludes all artifacts from the com.sun.jersey group. So

Re: spark 2.0 issue with yarn?

2016-05-09 Thread Marcelo Vanzin
Hi Jesse, On Mon, May 9, 2016 at 2:52 PM, Jesse F Chen wrote: > Sean - thanks. definitely related to SPARK-12154. > Is there a way to continue use Jersey 1 for existing working environment? The error you're getting is because of a third-party extension that tries to talk to

Re: spark 2.0 issue with yarn?

2016-05-09 Thread Sean Owen
The reason I ask how you're running is that YARN itself should have the classes that YARN needs, and they should be on your classpath. That's why it's not in Spark. There still could be a subtler problem. The best way to troubleshoot, if you want to act on it now, is to have a look at the exclusions

Re: spark 2.0 issue with yarn?

2016-05-09 Thread Jesse F Chen
Sean - thanks. Definitely related to SPARK-12154. Is there a way to continue using Jersey 1 for an existing working environment? Or, what's the best way to patch up an existing Jersey 1 environment to Jersey 2? This does break all of our Spark jobs running Spark 2.0 on YARN.

Re: Parse Json in Spark

2016-05-09 Thread KhajaAsmath Mohammed
Thanks Ewan. I did the same way you explained. Thanks for your response once again. On Mon, May 9, 2016 at 4:21 PM, Ewan Leith wrote: > The simplest way is probably to use the sc.binaryFiles or > sc.wholeTextFiles API to create an RDD containing the JSON files (maybe

RE: Parse Json in Spark

2016-05-09 Thread Ewan Leith
The simplest way is probably to use the sc.binaryFiles or sc.wholeTextFiles API to create an RDD containing the JSON files (maybe need a sc.wholeTextFiles(…).map(x => x._2) to drop off the filename column) then do a sqlContext.read.json(rddName) That way, you don’t need to worry about
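A short sketch of the approach described above (assuming the sc and sqlContext of spark-shell; the path and table name are placeholders):

    // Read whole files so multi-line JSON documents stay intact,
    // keep only the file contents (drop the filename), then let Spark infer the schema.
    val jsonRdd = sc.wholeTextFiles("/data/json/*.json").map(_._2)
    val df = sqlContext.read.json(jsonRdd)
    df.printSchema()
    df.registerTempTable("events")   // placeholder table name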

Re: spark 2.0 issue with yarn?

2016-05-09 Thread Sean Owen
Hm, this may be related to updating to Jersey 2, which happened 4 days ago: https://issues.apache.org/jira/browse/SPARK-12154 That is a Jersey 1 class that's missing. How are you building and running Spark? I think the theory was that Jersey 1 would still be supplied at runtime. We may have to

spark 2.0 issue with yarn?

2016-05-09 Thread Jesse F Chen
I had been running fine until builds around 05/07/2016. If I use the "--master yarn" option in builds after 05/07, I get the following error... it sounds like some jars are missing. I am using YARN 2.7.2 and Hive 1.2.1. Do I need to deploy something new related to YARN? bin/spark-sql

Spark-csv- partitionBy

2016-05-09 Thread Mail.com
Hi, I have to write a tab-delimited file and need to have one directory for each unique value of a column. I tried using spark-csv with partitionBy and it seems it is not supported. Is there any other option available for doing this? Regards, Pradeep

Re: DataFrame cannot find temporary table

2016-05-09 Thread Mich Talebzadeh
Have you created sqlContext based on HiveContext? val sc = new SparkContext(conf) // Create sqlContext based on HiveContext val sqlContext = new HiveContext(sc) import sqlContext.implicits._ df.registerTempTable("person") ... Dr Mich Talebzadeh LinkedIn *
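Putting the two replies in this thread together, the essential point is that the table has to be registered and queried through the same context instance; a minimal sketch:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    val sc = new SparkContext(new SparkConf().setAppName("temp-table"))
    val sqlContext = new HiveContext(sc)   // one context used for both steps
    import sqlContext.implicits._

    val df = sc.parallelize(Seq(("a", 1), ("b", 2))).toDF("name", "value")
    df.registerTempTable("person")

    // Query through the *same* sqlContext; a second SQLContext would not see "person".
    sqlContext.sql("SELECT name, value FROM person").show()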

DataFrame cannot find temporary table

2016-05-09 Thread KhajaAsmath Mohammed
Hi, I have created a dataframe with the code below and I was able to print the schema, but unfortunately I cannot pull the data from the temporary table. It always says that the table is not found. val df=convertRDDToDF(records, mapper, errorRecords, sparkContext); import sqlContext._

StreamingLinearRegression Java example

2016-05-09 Thread diplomatic Guru
Hello, I'm trying to find an example of using StreamingLinearRegression in Java, but couldn't find any. There are examples for Scala but not for Java. Has anyone got an example that I can take a look at? Thanks.

Re: How to use pyspark streaming module "slice"?

2016-05-09 Thread sethirot
Hi, have you managed to solve this? I just got stuck with this also. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-use-pyspark-streaming-module-slice-tp26813p26908.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Re: Re: How big the spark stream window could be ?

2016-05-09 Thread Mich Talebzadeh
I had a look at the thread. This is what you have, which I gather is a standalone box, in other words one worker node: bin/spark-submit --master spark://ES01:7077 --executor-memory 4G --num-executors 1 --total-executor-cores 1 ./latest5min.py 1>a.log 2>b.log But what I don't understand is why it is using

Re:Re: Re: How big the spark stream window could be ?

2016-05-09 Thread 李明伟
Thanks for all the information, guys. I wrote some code to do the test, not using a window, so only calculating data for each batch interval. I set the interval to 30 seconds and also reduced the size of the data to about 30,000 lines of CSV, which means my code should do its calculation on 30 000 lines of CSV in 30

what is "spark.history.retainedApplications" points to

2016-05-09 Thread neeraj_yadav
Hi, As per the Apache doc "http://spark.apache.org/docs/latest/monitoring.html", spark.history.retainedApplications points to "The number of application UIs to retain. If this cap is exceeded, then the oldest applications will be removed." But I see more than the configured number of apps in the UI. Is it

Re: How big the spark stream window could be ?

2016-05-09 Thread Mich Talebzadeh
I agree with Jorn et al on this. An alternative approach would be best, as a 24-hour operation sounds like a classic batch job more suitable for later reporting. This happens all the time in RDBMSs. As I understand it, within Spark the sliding interval can only be used after that window length

Re: Updating Values Inside Foreach Rdd loop

2016-05-09 Thread HARSH TAKKAR
Hi Please help. On Sat, 7 May 2016, 11:43 p.m. HARSH TAKKAR, wrote: > Hi Ted > > Following is my use case. > > I have a prediction algorithm where i need to update some records to > predict the target. > > For eg. > I have an eq. Y= mX +c > I need to change value of Xi
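The full use case is only partly visible here, but because RDDs are immutable, the usual pattern is to derive a new RDD with map rather than trying to update records inside foreach; a sketch with hypothetical names:

    // xs holds the feature values; m and c are the current model parameters.
    val xs = sc.parallelize(Seq(1.0, 2.0, 3.0))
    val m = 0.5
    val c = 1.0

    // "Changing the value of Xi" becomes building a new RDD from the old one.
    val predictions = xs.map(x => m * x + c)   // Y = mX + c applied per record
    predictions.collect().foreach(println)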

Re: Streaming application slows over time

2016-05-09 Thread Mich Talebzadeh
Hi Bryan, when this happens do you check the OS to see the amount of free memory and the CPU usage? It sounds like another application may be creeping in. OS tools like free and top may provide some clue. Also, you may see a number of skips in the Spark GUI that could not be processed. Dr Mich Talebzadeh

Why I have memory leaking for such simple spark stream code?

2016-05-09 Thread kramer2...@126.com
Hi, I wrote some Python code to do calculation on a Spark stream. The code works fine for about half an hour, then the memory usage of the executor becomes very high. I assign 4GB in the submit command but it is using 80% of my physical memory, which is 16GB. I can see this from the top command. In this

Help understanding an exception that produces multiple stack traces

2016-05-09 Thread James Casiraghi
Hi, Below is an error I got while using Spark 1.6.1 on AWS EMR 4.5. I am trying to understand what exactly this error is telling me. I see the exception, then what I am assuming is the plan being executed and the resulting stack trace, followed by two caused-by stack traces, and then a driver

Streaming application slows over time

2016-05-09 Thread Bryan Jeffrey
All, I am seeing an odd issue with my streaming application. I am running Spark 1.4.1, Scala 2.10. Our streaming application has a batch time of two minutes. The application runs well for a reasonable period of time (4-8 hours). It processes the same data in approximately the same amount of

No of Spark context per jvm

2016-05-09 Thread praveen S
Hi, As far as I know you can create one SparkContext per JVM, but I wanted to confirm whether it's one per JVM or one per classloader; as in, one SparkContext created per *.war, all deployed under one Tomcat instance. Regards, Praveen

Re: How big the spark stream window could be ?

2016-05-09 Thread firemonk9
I have not come across official docs in this regard; however, if you use a 24-hour window size, you will need memory big enough to fit the stream data for 24 hours. Usually memory is the limiting factor for the window size. Dhiraj Peechara -- View this message in context:

Re: partitioner aware subtract

2016-05-09 Thread Raghava Mutharaju
We tried that but couldn't figure out a way to efficiently filter it. Let's take two RDDs. rdd1: (1,2) (1,5) (2,3) (3,20) (3,16) rdd2: (1,2) (3,30) (3,16) (5,12) rdd1.leftOuterJoin(rdd2) and get rdd1.subtract(rdd2): (1,(2,Some(2))) (1,(5,Some(2))) (2,(3,None)) (3,(20,Some(30)))
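One way to express the filtering that the leftOuterJoin makes awkward (duplicate keys produce cross products) is cogroup, which avoids a re-shuffle when both RDDs already share the same hash partitioner; a sketch using the data above:

    import org.apache.spark.HashPartitioner

    val part = new HashPartitioner(4)
    val rdd1 = sc.parallelize(Seq((1,2),(1,5),(2,3),(3,20),(3,16))).partitionBy(part)
    val rdd2 = sc.parallelize(Seq((1,2),(3,30),(3,16),(5,12))).partitionBy(part)

    // Keep a (k, v) from rdd1 only if the same pair is absent from rdd2.
    val difference = rdd1.cogroup(rdd2).flatMap { case (k, (leftVals, rightVals)) =>
      val right = rightVals.toSet
      leftVals.collect { case v if !right.contains(v) => (k, v) }
    }
    difference.collect().foreach(println)   // (1,5), (2,3), (3,20)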

Re: How big the spark stream window could be ?

2016-05-09 Thread Jörn Franke
I do not recommend large windows. You can have small windows, store the data and then do the reports for one hour or one day on stored data. > On 09 May 2016, at 05:19, "kramer2...@126.com" wrote: > > We have some stream data need to be calculated and considering use spark
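A sketch of that pattern (paths are placeholders, and dstream stands for whatever input DStream the application builds): persist each small batch from the stream, then run the hourly or daily report as an ordinary batch job over the stored files:

    // In the streaming job: write each small batch out instead of holding 24h in memory.
    dstream.foreachRDD { (rdd, time) =>
      if (!rdd.isEmpty()) {
        rdd.saveAsTextFile(s"/data/stream/batch-${time.milliseconds}")
      }
    }

    // Later, as a separate batch job: report over everything stored for the period.
    val day = sc.textFile("/data/stream/batch-*")
    println(s"records stored over the period: ${day.count()}")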

Re: java.lang.NoClassDefFoundError: kafka/api/TopicMetadataRequest

2016-05-09 Thread Guillermo Ortiz
The class is in *kafka_2.10-0.8.2.1.jar*, not in spark-streaming-kafka_2.10-1.5.1.jar. I found the error... so embarrassing. I was executing in cluster mode and I only had the jars on the gateway. I guess that Spark chose a NodeManager to execute a container with the Driver and load the libraries,

ERROR SparkContext: Error initializing SparkContext.

2016-05-09 Thread Andrew Holway
Hi, I am having a hard time getting to the bottom of this problem. I'm really not sure where to start with it. Everything works fine in local mode. Cheers, Andrew [testing@instance-16826 ~]$ /opt/mapr/spark/spark-1.5.2/bin/spark-submit --num-executors 21 --executor-cores 5 --master yarn-client

Re: Is it a bug?

2016-05-09 Thread Zheng Wendell
You can move the definition of `t` upward. My example is still valid. On Mon, May 9, 2016 at 1:46 PM, Ted Yu wrote: > Using spark-shell, I was not allowed to define the map() without declaring > t first: > > scala> rdd = rdd.map(x => x*t) > :26: error: not found: value t >rdd =

Re: Re: How big the spark stream window could be ?

2016-05-09 Thread Mich Talebzadeh
Actually, it is interesting to understand how Spark allocates cache size within each worker node. If it is allocated dynamically, then a memory error won't occur until all cache memory is exhausted? Also, it really depends on the operation; for example, I would not use Spark for this purpose, to get

Re: java.lang.NoClassDefFoundError: kafka/api/TopicMetadataRequest

2016-05-09 Thread Mich Talebzadeh
That sounds specific to Kafka. Check this https://www.codatlas.com/github.com/apache/kafka/HEAD/core/src/main/scala/kafka/api/TopicMetadataRequest.scala I cannot see it in jar tvf spark-streaming-kafka_2.10-1.5.1.jar|grep TopicMetadataRequest Dr Mich Talebzadeh LinkedIn *

Re: java.lang.NoClassDefFoundError: kafka/api/TopicMetadataRequest

2016-05-09 Thread Guillermo Ortiz
I was looking at the log carefully, looking for other errors. Anyway, the complete exception is: 2016-05-09 13:20:25,646 [Driver] ERROR org.apache.spark.deploy.yarn.ApplicationMaster - User class threw exception: java.lang.NoClassDefFoundError: kafka/api/TopicMetadataRequest

Re: Is it a bug?

2016-05-09 Thread Ted Yu
Using spark-shell, I was not allowed to define the map() without declaring t first: scala> rdd = rdd.map(x => x*t) :26: error: not found: value t rdd = rdd.map(x => x*t) ^ On Mon, May 9, 2016 at 4:19 AM, Daniel Haviv < daniel.ha...@veracity-group.com> wrote: >

apache spark on gitter?

2016-05-09 Thread Paweł Szulc
Hi, I was wondering: why doesn't Spark have a Gitter channel? -- Regards, Paul Szulc twitter: @rabbitonweb blog: www.rabbitonweb.com

Re: Is it a bug?

2016-05-09 Thread Daniel Haviv
How come for the first() function it calculates an updated value, but for collect() it doesn't? On Sun, May 8, 2016 at 4:17 PM, Ted Yu wrote: > I don't think so. > RDD is immutable. > > > On May 8, 2016, at 2:14 AM, Sisyphuss wrote: > > > >
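The original example isn't fully quoted here, but the behaviour under discussion is the usual interaction between lazy transformations and driver-side variables: the closure is serialized anew each time an action triggers a job, so each action sees whatever the variable holds at that moment. A small sketch:

    var t = 2
    val rdd = sc.parallelize(1 to 5)
    val scaled = rdd.map(x => x * t)   // lazy: nothing runs yet, t is captured by the closure

    println(scaled.first())            // a job runs now with t = 2, so this prints 2

    t = 10                             // change the driver-side variable
    println(scaled.collect().mkString(","))   // a new job ships the closure again with t = 10:
                                              // 10,20,30,40,50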

Re: java.lang.NoClassDefFoundError: kafka/api/TopicMetadataRequest

2016-05-09 Thread Ted Yu
NoClassDefFoundError is different from saying that the class could not be loaded from the classpath. From my experience, there should be some other error before this one which would give you a better idea. You can also check whether another version of kafka is embedded in any of the jars listed below.

Re: Re: How big the spark stream window could be ?

2016-05-09 Thread Ashok Kumar
Great. So in the simplest of forms, let us assume that I have a standalone host that runs Spark and receives topics from a source, say Kafka. So basically I have one executor and one cache on the node, and if my streaming data is too much, I anticipate there will be no execution as I don't have enough memory.

Re: java.lang.NoClassDefFoundError: kafka/api/TopicMetadataRequest

2016-05-09 Thread Guillermo Ortiz
*jar tvf kafka_2.10-0.8.2.1.jar | grep TopicMetadataRequest * 1757 Thu Feb 26 14:30:34 CET 2015 kafka/api/TopicMetadataRequest$$anonfun$1.class 1712 Thu Feb 26 14:30:34 CET 2015 kafka/api/TopicMetadataRequest$$anonfun$readFrom$1.class 1437 Thu Feb 26 14:30:34 CET 2015

Re: Error Kafka/Spark. Ran out of messages before reaching ending offset

2016-05-09 Thread Guillermo Ortiz
I reinstalled Kafka and it works. I work with virtual machines and someone changed the host of one of the Kafkas without telling anybody. 2016-05-06 16:11 GMT+02:00 Cody Koeninger : > Yeah, so that means the driver talked to kafka and kafka told it the > highest available

java.lang.NoClassDefFoundError: kafka/api/TopicMetadataRequest

2016-05-09 Thread Guillermo Ortiz
I'm trying to execute a job with Spark and Kafka and I'm getting this error. I know that it's because the versions are not right, but I have been checking the jars which I import (in the Spark UI, spark.yarn.secondary.jars) and they are right, and the class exists inside *kafka_2.10-0.8.2.1.jar.*

Re: partitioner aware subtract

2016-05-09 Thread ayan guha
How about outer join? On 9 May 2016 13:18, "Raghava Mutharaju" wrote: > Hello All, > > We have two PairRDDs (rdd1, rdd2) which are hash partitioned on key > (number of partitions are same for both the RDDs). We would like to > subtract rdd2 from rdd1. > > The subtract

Re: Kafka 0.9 and spark-streaming-kafka_2.10

2016-05-09 Thread Mich Talebzadeh
our Kafka version kafka_2.11-0.9.0.1 works fine with Spark 1.6.1 Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw *

Reading Shuffle Data from highly loaded nodes

2016-05-09 Thread alvarobrandon
Hello everyone: I'm running an experiment in a Spark cluster where some of the machines are highly loaded with CPU-, memory- and network-consuming processes (let's call them straggler machines). Obviously the tasks on these machines take longer to execute than on other nodes of the cluster.

Kafka 0.9 and spark-streaming-kafka_2.10

2016-05-09 Thread Michel Hubert
Hi, I'm thinking of upgrading our Kafka cluster to 0.9. Will this be a problem for the Spark Streaming + Kafka Direct Approach Integration using artifact spark-streaming-kafka_2.10 (1.6.1)? groupId = org.apache.spark artifactId = spark-streaming-kafka_2.10 version = 1.6.1 Because the

Re: Re: How big the spark stream window could be ?

2016-05-09 Thread Saisai Shao
Please see the inline comments. On Mon, May 9, 2016 at 5:31 PM, Ashok Kumar wrote: > Thank you. > > So If I create spark streaming then > > >1. The streams will always need to be cached? It cannot be stored in >persistent storage > > You don't need to cache the

Re: Spark support for Complex Event Processing (CEP)

2016-05-09 Thread Esa Heikkinen
Sorry for the delay in answering. Yes, this is not pure "CEP", but quite close to it, with many similar "functionalities". My case is not so easy, because I don't want to compare against the original time schedule of the route. I want to compare how closely the (ITS) system has estimated the arrival time to the bus

Re: Re: How big the spark stream window could be ?

2016-05-09 Thread Ashok Kumar
Thank you. So if I create Spark streaming then: - The streams will always need to be cached? They cannot be stored in persistent storage. - The stream data cached will be distributed among all nodes of Spark, among the executors. - As I understand, each Spark worker node has one executor that

Re: Re: How big the spark stream window could be ?

2016-05-09 Thread Saisai Shao
No, each executor only stores part of the data in memory (it depends on how the partitions are distributed and how many receivers you have). For a WindowedDStream, it will obviously cache the data in memory; from my understanding you don't need to call cache() again. On Mon, May 9, 2016 at 5:06 PM,

Re: Re: How big the spark stream window could be ?

2016-05-09 Thread Ashok Kumar
Hi, so if I have 10 GB of streaming data coming in, does it require 10 GB of memory on each node? Also, in that case, why do we need to use dstream.cache()? Thanks On Monday, 9 May 2016, 9:58, Saisai Shao wrote: It depends on how you write the Spark application,

Re: Re: How big the spark stream window could be ?

2016-05-09 Thread Saisai Shao
It depends on how you write the Spark application; normally, if data is already on persistent storage, there's no need to put it into memory. The reason why Spark Streaming data has to be stored in memory is that a streaming source is not a persistent source, so you need to have a place to store the

Re:Re: How big the spark stream window could be ?

2016-05-09 Thread 李明伟
Thanks. What if I use batch calculation instead of stream computing? Do I still need that much memory? For example, if the 24-hour data set is 100 GB, do I also need 100 GB of RAM to do the one-time batch calculation? At 2016-05-09 15:14:47, "Saisai Shao" wrote:

Re: How big the spark stream window could be ?

2016-05-09 Thread Saisai Shao
What do you mean by swap space: the system swap space or Spark's block manager disk space? If you're referring to swap space, I think you should first think about the JVM heap size and YARN container size before running out of system memory. If you're referring to block manager disk space, the

Re: Re: How big the spark stream window could be ?

2016-05-09 Thread Mich Talebzadeh
Hi, Have you thought of other alternatives, like collecting data in a database (over a 24-hour period)? I mean, do you require reports at 5-minute intervals *after 24 hours of data collection* from t0, t0+5m, t0+10 min? You can only do so after collecting the data; then you can partition your table into 5

Re: How big the spark stream window could be ?

2016-05-09 Thread Mich Talebzadeh
That is a valid point, Shao. However, it will start using disk space as memory storage, akin to swap space. It will not crash, I believe; it will just be slow, and this assumes that you do not run out of disk space. Dr Mich Talebzadeh LinkedIn *

Re:Re: How big the spark stream window could be ?

2016-05-09 Thread 李明伟
Thanks Mich. I guess I did not make my question clear enough. I know the terms like interval and window, and I also know how to use them. The problem is that in my case, I need to set the window to cover data for 24 hours or 1 hour. I am not sure if it is a good way because the window is just too

Re: How big the spark stream window could be ?

2016-05-09 Thread Saisai Shao
For window-related operators, Spark Streaming will cache the data in memory within this window. In your case the window size is up to 24 hours, which means data has to be in the executors' memory for more than 1 day; this may introduce several problems when memory is not enough. On Mon, May 9,

Re: Sliding Average over Window in Spark Streaming

2016-05-09 Thread Mich Talebzadeh
In general, working out the minimum or maximum of, say, prices (I do not know your use case) is pretty straightforward. For example: val maxValue = price.reduceByWindow((x:Double,y:Double) => if(x > y) x else y,Seconds(windowLength), Seconds(slidingInterval)) maxValue.print() The average values are
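For the average itself, one option (a sketch in the same reduceByWindow style as the max example above, reusing the price stream and the windowLength and slidingInterval values from it) is to carry a (sum, count) pair through the window and divide at the end:

    // price: DStream[Double], as in the example above
    val sumAndCount = price
      .map(p => (p, 1L))                                  // (value, count = 1)
      .reduceByWindow(
        (a, b) => (a._1 + b._1, a._2 + b._2),             // combine partial sums and counts
        Seconds(windowLength), Seconds(slidingInterval))

    sumAndCount.map { case (sum, count) => sum / count }.print()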

Re: How big the spark stream window could be ?

2016-05-09 Thread Mich Talebzadeh
OK, terms for Spark Streaming: "Batch interval" is the basic interval at which the system will receive the data in batches. This is the interval set when creating a StreamingContext. For example, if you set the batch interval to 300 seconds, then any input DStream will generate RDDs of received
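To make the relationship concrete, a minimal sketch (the host, port and durations are placeholders): the batch interval is fixed when the StreamingContext is created, and any window and slide durations must be multiples of it:

    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val ssc = new StreamingContext(sc, Seconds(300))       // batch interval: 300 seconds
    val lines = ssc.socketTextStream("localhost", 9999)    // example input source

    // window length 1500s and slide 600s, both multiples of the 300s batch interval
    val windowed = lines.window(Seconds(1500), Seconds(600))
    windowed.count().print()

    ssc.start()
    ssc.awaitTermination()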